Using the Pathway Tools Omics Viewer
Note: The Pathway Tools Omics Viewer was formerly known as the Pathway Tools Expression Viewer. It has recently been extended to display many other types of experimental data, not just gene expression data, and has been renamed accordingly.
The Pathway Tools Omics Viewer uses the Metabolic Overview for an organism to illustrate the results of high-throughput experiments in a global metabolic pathway context. Genes (in the case of a gene expression experiment) and proteins (in the case of a proteomics experiment) that are involved in metabolism are mapped to reaction steps in the Metabolic Overview, and the range of data values in a given experimental dataset is mapped to a spectrum of colors. Reaction steps in the Metabolic Overview are colored according to the corresponding data value. Similarly, for metabolomics experiments, compound nodes are colored according to the data value for the corresponding compound. This facility enables the user to see at a glance which pathways are active or inactive under a given set of experimental conditions.
The Omics Viewer can be used for:
- Microarray Expression Data: Reaction lines (and protein icons, where present) are color-coded according to the relative or absolute expression level of the gene that codes for the enzyme that catalyzes that reaction step. The Omics Viewer allows a scientist to interpret the results of gene-expression experiments in a pathway context.
- Proteomics Data: Reaction lines (and protein icons, where present) are color-coded according to the concentration of the enzyme that catalyzes that reaction step.
- Metabolomics Data: Compound icons are color-coded according to the concentration of the compound.
- Reaction Flux Data: Reaction lines are color-coded according to reaction flux values.
- Other Experimental Data: Any experiment, high-throughput or otherwise, in which data values are assigned to genes, proteins, reactions or metabolites can be viewed in a pathway context using the Omics Viewer.
The Omics Viewer can show absolute data values (such as the concentration of a metabolite or protein, or the absolute expression level of a gene), or it can be used to compare two sets of experimental data by computing a ratio and mapping the ratios onto a color spectrum.
The superposition of multiple sets of experimental data on the metabolic overview can also be animated to show, for example, how gene expression levels of enzymes change with time over the course of an experiment.
Examples
Single gene expression experiment: | Sample datafile and brief description | Sample display |
Time series metabolomics animation: | Sample datafile and brief description | Sample display |
Note that if your browser permits popups from this site, the links to the sample displays will also pop up a new browser window or tab showing the genome overview. The sample displays are generated on request, so they may take several minutes to appear. To minimize generation time, the sample animation display shows only 6 (time points 10-15) of the 17 time points included in the sample data file.
Omics Dataset File Format
Experimental data is imported from a file provided by the user that is stored on the user's computer. Each line of the file contains data for a single gene, protein, reaction or metabolite, and is of the form:
<name-or-ID> <data-column1> ... <data-columnN>
Columns are separated by the tab character. Lines that start with # or ; are taken to be comment lines and are ignored by the program.
<name-or-ID> can be either a common name for an object (the BioCyc data typically includes extensive synonym lists, and every attempt is made to match a name to the appropriate target), or the BioCyc internal ID for the object. Gene IDs from sequencing projects (such as the E. coli B-numbers) are generally acceptable and unambiguous. For protein or reaction data, EC numbers may be used. You must specify whether the entities in the <name-or-ID> column are genes, proteins, reactions, compounds, or a mixture.
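As an illustration of the file format described above, the following minimal Python sketch reads tab-delimited lines and skips comment lines. It is not the Omics Viewer's own parser: the function name `parse_omics_lines` and the sample rows are hypothetical, and skipping blank lines and letting a later duplicate name overwrite an earlier one are assumptions the format description does not state.

```python
from typing import Dict, List


def parse_omics_lines(lines: List[str]) -> Dict[str, List[str]]:
    """Parse tab-delimited rows into {name_or_id: [raw data fields]}.

    Hypothetical helper for illustration only.
    """
    rows: Dict[str, List[str]] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        # Comment lines start with '#' or ';'. Skipping blank lines
        # is an assumption, not part of the documented format.
        if not line.strip() or line.startswith(("#", ";")):
            continue
        fields = line.split("\t")
        # First column is the name or ID; the rest are data columns,
        # kept as raw strings so malformed values can be reported later.
        rows[fields[0].strip()] = [f.strip() for f in fields[1:]]
    return rows


# Hypothetical sample data: two comment lines and two gene rows.
sample = [
    "# hypothetical example data",
    "; another comment",
    "b0001\t1.5\t3.0",
    "trpA\t0.25\t0.5",
]
parsed = parse_omics_lines(sample)
```

Keeping the data fields as strings at this stage leaves numeric conversion, and the handling of missing or malformed values, to a later step.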
The numbers in the data columns can represent either absolute or relative values. If the data values represent absolute numbers, you may choose to visualize either a single column of absolute data values (select "Absolute" and one data column), or the ratio of two data columns as relative data values (select "Relative" and two data columns). If the data values themselves represent relative numbers, then you need supply only a single column number, and select "Relative". An entry (a row of data for a gene or other object) may contain any number of data columns (for example, if you wish to compile measurements from several experiments or time points into a single file), but only those data columns specified will be visualized at a time -- all other columns will be ignored.
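The three cases above (one absolute column, a ratio of two absolute columns, or one column that is already relative) can be sketched in Python as follows. This is an illustration under assumptions, not the viewer's implementation: the function name `value_to_display` is hypothetical, data columns are assumed to be numbered from 1, and the first selected column is assumed to be the numerator of the ratio, which the text does not specify.

```python
from typing import List, Optional


def value_to_display(data: List[float], mode: str, col: int,
                     denom_col: Optional[int] = None) -> float:
    """Return the value to visualize for one row of data columns.

    Hypothetical helper. Columns are 1-based (assumption), and `col`
    is treated as the numerator when two columns are given (assumption).
    """
    if mode == "absolute":
        # A single column of absolute values is shown as-is.
        return data[col - 1]
    if mode == "relative":
        if denom_col is None:
            # The column already holds relative values.
            return data[col - 1]
        # Ratio of two absolute columns.
        return data[col - 1] / data[denom_col - 1]
    raise ValueError("mode must be 'absolute' or 'relative'")


row = [2.0, 8.0]  # hypothetical measurements for one gene
absolute_value = value_to_display(row, "absolute", 2)
ratio_value = value_to_display(row, "relative", 2, 1)
```

Any columns not selected are simply never read, matching the statement that all other columns are ignored.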
Color Scales
The color scale used depends on the type and, by default, the range of the data. Thus, a particular color may correspond to one gene expression level for one dataset, and a different gene expression level for another dataset, depending on the range of values or the supplied maximum cutoff value for each dataset. We use the spectrum from yellow/green to red, with yellow representing the lowest expression levels or ratios in the dataset, blue representing values in the middle, and red representing the highest values. Reactions for which no data was provided are drawn in black. The legend for mapping colors to data values is shown in the key, which is drawn to the right of the overview for a single experiment, or to the left for an animation.
A maximum cutoff value is chosen. By default, this is computed from the data. Alternatively, the user may supply a maximum cutoff value to use. Supplying the same maximum cutoff value for multiple experiments ensures that the same color scale is used for each one, so that the displays are directly comparable.
The minimum cutoff value is determined based on the maximum cutoff value and the other parameters. For absolute data values, we use a minimum cutoff value of zero. For relative data values that are not logs, we use the inverse of the maximum cutoff. For relative data values that are logs, we use the negative of the maximum cutoff. The color spectrum is then mapped evenly along a log scale between the maximum cutoff and the minimum cutoff.
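The rule for deriving the minimum cutoff can be written as a small Python function. This is a sketch of the three cases stated above only; the function and parameter names are hypothetical, and it does not attempt to reproduce how the viewer computes the default maximum cutoff or maps values to colors.

```python
def min_cutoff(max_cutoff: float, relative: bool, is_log: bool = False) -> float:
    """Derive the minimum cutoff from the maximum cutoff.

    Hypothetical helper mirroring the three documented cases.
    """
    if not relative:
        # Absolute data: the scale starts at zero.
        return 0.0
    if is_log:
        # Relative data given as logs: symmetric around zero.
        return -max_cutoff
    # Relative data given as plain ratios: the inverse of the maximum,
    # so the scale is symmetric around a ratio of 1.
    return 1.0 / max_cutoff


absolute_min = min_cutoff(8.0, relative=False)
ratio_min = min_cutoff(8.0, relative=True)
log_ratio_min = min_cutoff(3.0, relative=True, is_log=True)
```

With a maximum cutoff of 8, for example, plain ratios span 1/8 to 8, so an eightfold decrease and an eightfold increase sit at opposite ends of the scale.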
In many cases, several genes or proteins, each with their own expression level or concentration, will map to a single reaction. This is because the reaction might be catalyzed by an enzyme complex made up of several gene products, or the reaction might be catalyzed by several isozymes, each with its own gene or genes. Since a reaction can only be colored a single color, we must choose which data value to use. For absolute data values, we choose the maximum. For relative data values, we choose the value whose log has the greatest deviation from zero, under the assumption that the user is primarily interested in identifying the entities whose behavior differs most between the two datasets.
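The selection rule for a reaction with several data values can be sketched in Python as below. The function name `reaction_value` is hypothetical, plain ratios are assumed to be strictly positive so that their logs exist, and the tie-breaking behavior (the first value encountered wins) is a property of this sketch rather than documented behavior.

```python
import math
from typing import List


def reaction_value(values: List[float], relative: bool,
                   is_log: bool = False) -> float:
    """Pick the single value used to color a reaction.

    Hypothetical helper; `values` holds the data for every gene or
    protein mapped to the reaction.
    """
    if not relative:
        # Absolute data: use the maximum.
        return max(values)
    if is_log:
        # Values are already logs: largest deviation from zero.
        return max(values, key=abs)
    # Plain ratios (assumed > 0): largest |log(ratio)|.
    return max(values, key=lambda v: abs(math.log(v)))


absolute_pick = reaction_value([1.0, 5.0, 3.0], relative=False)
ratio_pick = reaction_value([2.0, 0.25, 1.5], relative=True)
log_pick = reaction_value([1.0, -2.5, 2.0], relative=True, is_log=True)
```

In the ratio example, 0.25 is chosen over 2.0 because a fourfold decrease deviates further from no change than a twofold increase.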
Omics Viewer Results
After you submit your dataset to the Pathway Tools, the Omics Viewer returns several results:
- The Overview Diagram, colorized with experimental data.
- The color key for the Overview.
- For single experiments, some basic statistics computed from the data file. The program counts and lists gene/protein/metabolite names that could not be resolved, or for which data was missing or malformed. Since, for example, not all genes will code for enzymes, and therefore not all will correspond to reactions in the Metabolic Overview, we compile separate statistics for only those that are represented in the Overview and for the dataset as a whole. The statistics that we compute and tabulate are: number of values, minimum, maximum and median values, and mean and standard deviation of the natural logs of the values. These statistics are not computed when generating animations.
- A histogram that shows the distribution of values in the dataset. This histogram is displayed directly beneath the color key. The data value range is divided into 50 intervals, using the same criteria that we use for assigning colors. The number of data values in each interval is shown on the histogram, colored appropriately. To the left of the vertical axis is the histogram for the entities that are represented in the overview. To the right of the axis is the histogram for all other entities in the dataset.
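The basic statistics listed above (number of values, minimum, maximum, median, and the mean and standard deviation of the natural logs) can be approximated with Python's standard library. This is only a sketch: the function name `summarize` is hypothetical, values are assumed to be strictly positive so that their logs exist, and the use of the sample standard deviation is an assumption, since the text does not say whether the viewer uses the sample or population form.

```python
import math
import statistics
from typing import Dict, List


def summarize(values: List[float]) -> Dict[str, float]:
    """Compute the summary statistics for a list of data values.

    Hypothetical helper; needs at least two strictly positive values.
    """
    logs = [math.log(v) for v in values]  # natural logs
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "log_mean": statistics.mean(logs),
        # Sample standard deviation (assumption; see the note above).
        "log_stdev": statistics.stdev(logs),
    }


# Hypothetical values whose natural logs are approximately 0, 1 and 2.
stats = summarize([1.0, math.e, math.e ** 2])
```

Running the function once on the entities represented in the Overview and once on the whole dataset would give the two sets of statistics the viewer reports.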
Animation Controls
A time series can be displayed as an animation by specifying multiple data column numbers. The result will be a Dynamic HTML page that initially plays the animation in a continuous loop, showing how the experimental values and histogram change with each experiment. Four buttons control the animation. They can be used to stop and restart the animation, and step through the individual time points.
- Stop the animation at the current time point
- Start playing the animation from the current time point
- Go back one time point
- Go forward one time point
Note that older browsers that do not support Dynamic HTML will not be able to run the animation.