Wormpath - User's Guide

Getting started with Wormpath
The purpose of Wormpath is to search a user-provided list of Caenorhabditis elegans genetic markers for genetic interactions and to find the genetic networks formed by these interactions. It is provided as a web service hosted by the Bioinformatics Core Facility at CECAD Cologne, Germany. Typically, the uploaded list originates from an RNA-Seq study or from a genome-wide microarray hybridization scan comparing cDNAs from two conditions, one with and one without treatment. The results of the search are provided to the user in an XML-based summary report for easy navigation and exploration.

To start a run of the Wormpath software, you need a list of C. elegans genes given as Wormbase identifiers or as microarray markers using Affymetrix IDs. These need to be arranged in the first column of a two-column table. The second column should contain a measure of evidence for each gene, e.g. a fold change or a p-value from a comparison of the expression of that gene between two conditions.

You can start the run by uploading this file on the start page and choosing the format in which the table is given, the method by which evidence for each gene is quantified (p-value or fold change), and the core options for the run, namely whether indirect interactions (see below) should be included in the search and how many iterations should be used. Finally, specify a level of significance to limit the resulting networks to those that are statistically significant. For a first try, it is probably best to set this to a value of 1. To submit your input and start the analysis, click the button and wait. The duration of the analysis depends on the size and complexity of your gene list. The logger in your browser window will keep you up to date on the progress. When the analysis finishes, click one of the two links to view the results or download them to your computer.

Input format
The input list to the Wormpath software needs to contain two columns: one for the genes to search and one with a level of evidence for each gene. For details, please refer to the sections below. Accepted file formats are Microsoft Excel spreadsheets (*.xls, not *.xlsx) and plain text files (ASCII). If a text file is provided, a tab character has to be used as the field delimiter. In both formats, the file must contain exactly two columns and no header line. In an Excel file, only the first worksheet should contain data. The file should contain no data apart from the two required columns, because additional content may confuse the software.
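As an illustration of the plain text format described above, the following minimal sketch writes a two-column, tab-delimited table without a header line. The function name write_wormpath_input and the evidence values are made up for this example; only the layout (gene in the first column, evidence in the second, tab as delimiter) is taken from the description above.

```python
import csv
import io

# Hypothetical example rows: (gene identifier, evidence value).
# The evidence values are invented for illustration only.
rows = [
    ("WBGene00123456", 2.5),
    ("F52B5.5", -1.8),
]


def write_wormpath_input(rows, handle):
    """Write exactly two tab-separated columns and no header line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for gene, evidence in rows:
        writer.writerow([gene, evidence])


# Write to an in-memory buffer here; for a real upload file, pass a handle
# opened with open("genes.txt", "w", newline="") instead.
buffer = io.StringIO()
write_wormpath_input(rows, buffer)
text = buffer.getvalue()
```

The resulting text can be saved with a .txt extension and uploaded as a plain text file.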

The first column may contain the genes as Affymetrix or Agilent IDs from the respective C. elegans gene expression microarray, as the corresponding Wormbase IDs formatted like WBGene00123456, or as the sequence names used as accession numbers for the sequence data of the respective gene in the Wormbase. Sequence names typically look like F52B5.5.
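Before uploading, it can help to check which kind of identifier each entry in the first column is. The sketch below is only a rough pre-check: the two regular expressions are assumptions derived from the two examples given above (WBGene00123456 and F52B5.5), not the rules Wormpath itself applies, and the function name classify_identifier is made up. Microarray probe IDs are not checked and simply fall into the "other" group.

```python
import re

# Assumed pattern: "WBGene" followed by eight digits, as in WBGene00123456.
WORMBASE_ID = re.compile(r"^WBGene\d{8}$")
# Assumed pattern: clone name, a dot, a number, optional isoform letter,
# as in F52B5.5. Real sequence names may be more varied.
SEQUENCE_NAME = re.compile(r"^[A-Z0-9]+\.\d+[a-z]?$")


def classify_identifier(identifier):
    """Return a rough guess of the identifier type."""
    if WORMBASE_ID.match(identifier):
        return "wormbase_id"
    if SEQUENCE_NAME.match(identifier):
        return "sequence_name"
    return "other"  # e.g. a microarray probe ID, or a malformed entry


kinds = [classify_identifier(x) for x in ("WBGene00123456", "F52B5.5", "WBGene123")]
```

Entries reported as "other" that were meant to be Wormbase IDs or sequence names are worth a second look before the run is started.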

Experimental evidence for each gene has to be given in the second column of the input file, either as the ratio of the expression levels of the two conditions being compared or as a fold change, in which the ratio is replaced by its negative reciprocal for downregulated genes. Alternatively, a p-value from a standard t-test may be used; note that a p-value carries no information on the direction of the regulation.
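The relation between the ratio and the fold change described above can be sketched as follows. The function name ratio_to_fold_change is made up for this example, and treating a ratio of exactly 1 as a fold change of 1 is an assumption.

```python
def ratio_to_fold_change(ratio):
    """Convert an expression ratio into a signed fold change.

    Ratios below 1 (downregulated genes) are replaced by their negative
    reciprocal; ratios of 1 or above are returned unchanged.
    """
    if ratio <= 0:
        raise ValueError("an expression ratio must be positive")
    if ratio >= 1:
        return ratio
    return -1.0 / ratio


fold_changes = [ratio_to_fold_change(r) for r in (2.0, 0.5, 0.25)]
```

For example, a ratio of 0.5 (expression halved) corresponds to a fold change of -2, and a ratio of 2 stays at 2.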

Predicted and indirect interactions
Interactions in the Wormbase are classified as "Genetic", "Suppressing", "Predicted" etc., according to the scientific evidence for that interaction. A review of how many interactions fall into each class shows that most interactions in the Wormbase are "Predicted", although the evidence for these is very weak. Depending on the length of the uploaded gene list, predicted interactions can clutter the analysis and its output to a greater or lesser degree. Predicted interactions are therefore excluded from the analysis, but they are still listed in the output wherever all interactions of a certain gene are shown.

If two genes in the gene list do not have a direct interaction in the Wormbase but both interact with a third gene that is not differentially regulated, we call this an indirect interaction. Indirect interactions matter because they can point to genes that were missed in the experiment but play an important role in the network of the others. An option lets you specify whether the (unregulated) genes that mediate such indirect interactions should be added to the graph and included in the search for networks.
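The definition of an indirect interaction can be sketched in a few lines. This is an illustration of the definition only, not Wormpath's implementation; the function name find_indirect_interactions and the gene names are made up, and the interaction list is assumed to have been cleared of predicted interactions beforehand.

```python
from itertools import combinations


def find_indirect_interactions(interactions, gene_list):
    """Map each indirectly linked pair of listed genes to its bridging genes.

    interactions: iterable of (gene, gene) pairs, treated as undirected.
    gene_list: the uploaded (differentially regulated) genes.
    """
    listed = set(gene_list)
    neighbors = {}
    direct = set()
    for a, b in interactions:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
        direct.add(frozenset((a, b)))

    indirect = {}
    for bridge, partners in neighbors.items():
        if bridge in listed:
            continue  # the bridging gene must not be in the uploaded list
        for a, b in combinations(sorted(partners & listed), 2):
            if frozenset((a, b)) not in direct:
                indirect.setdefault((a, b), set()).add(bridge)
    return indirect


# Hypothetical data: geneX is not in the uploaded list.
interactions = [("geneA", "geneX"), ("geneB", "geneX"), ("geneA", "geneC")]
result = find_indirect_interactions(interactions, ["geneA", "geneB", "geneC"])
```

Here geneA and geneB have no direct interaction, but both interact with the unlisted geneX, so the pair is reported as indirectly linked through geneX.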

Iteration depth
This is the core parameter for modifying the behaviour of the analysis algorithm. In the graph underlying the analysis, it sets the maximum number of steps taken when searching for the network surrounding a specific gene. That is, if the iteration depth is 3, networks surrounding a particular gene are formed by neighbors that are 1, 2 or at most 3 steps away. The software also searches for smaller networks because this can improve the visibility of the closer environment.

In addition, the whole analysis is always repeated for every smaller maximum iteration depth, e.g. for 3, 2 and 1 if you choose 4, because this gives you the chance to review the shorter and potentially handier result lists for these values.
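The effect of the iteration depth can be illustrated with a depth-limited breadth-first search. This is a sketch of the general idea under the assumption that a "step" is one edge in the interaction graph; it is not Wormpath's actual algorithm, and the function name neighborhood and the graph are made up.

```python
from collections import deque


def neighborhood(graph, start, max_depth):
    """Return {gene: steps from start} for all genes within max_depth steps."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        gene = queue.popleft()
        if distance[gene] == max_depth:
            continue  # do not expand beyond the iteration depth
        for neighbor in graph.get(gene, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[gene] + 1
                queue.append(neighbor)
    return distance


# Hypothetical chain of interactions: a - b - c - d - e
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}

# Repeat the search for every depth up to the chosen maximum of 3.
networks_by_depth = {depth: neighborhood(graph, "a", depth) for depth in range(1, 4)}
```

With a depth of 3, the network around gene a reaches as far as d but not e; the results for depths 1 and 2 are the correspondingly smaller networks.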

Analysis results
The results can either be accessed directly in your web browser or be downloaded as a gzipped archive. If your iteration depth was 3, you should first review the analysis results provided in the XML file summary3.xml. The files summary1.xml and summary2.xml contain the result lists for iteration depths of 1 and 2, respectively. In the summary3.xml report, you can access these by clicking the Show smaller networks link at the top of the page.

The report gives you a summary of the genetic networks contained in the uploaded list. You can show the interactions of a network by following the respective Show Interactions link. This also gives you the citation of the paper in which the interaction was established. Furthermore, you can show a graphical representation of the respective network by clicking the Show Graph link in the summary report. To reduce the number of neighbors in the figure, you can click the Exclude nodes link, which will show you a sub-network with a smaller number of genes. By following the links in the Show neighbours of ... box for any gene in that network, you can show a new network centered on this gene, containing all genes that are within a given number of steps. You can also show details on any interaction involved in the network by clicking the links in the Show all interactions of ... box. Throughout the summary report, links are provided that connect you directly to the corresponding information on the Wormbase and PubMed websites.

Statistical interpretation
For each network in the summary report, statistical evidence is given by a score and two p-values. The score simply represents the average number of papers describing the interactions of the respective network, whereas the score-based p-value reflects the probability of obtaining this or a larger score for a network of this size by chance alone. The list-based p-value reflects the probability that, within a larger environment of the network, the number of differentially expressed genes in the network would by chance alone be at least as large as the one observed.
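The score described above is a plain average and can be written down directly. The second function below shows one common way a "by chance alone" p-value can be estimated, namely by resampling; this is purely an assumption for illustration and not necessarily how Wormpath computes its score-based p-value. The function names, the use of the number of interactions as the network size, and all numbers are made up.

```python
import random
from statistics import mean


def network_score(paper_counts):
    """Average number of papers per interaction in a network."""
    return mean(paper_counts)


def empirical_p_value(observed_counts, background_counts, n_samples=1000, seed=0):
    """Estimate P(score >= observed) by drawing random interaction sets.

    Assumption: random networks are modelled by sampling the same number
    of interactions from a background list of paper counts.
    """
    rng = random.Random(seed)
    observed = network_score(observed_counts)
    size = len(observed_counts)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(background_counts, size)
        if network_score(sample) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_samples + 1)


score = network_score([1, 2, 3])
p_flat = empirical_p_value([1, 1], [1, 1, 1, 1])
p_high = empirical_p_value([5, 5], [1, 1, 1, 1, 1, 1, 5, 5])
```

A network whose interactions are no better documented than the background (p_flat) gets a p-value of 1, while a network made up of the best-documented interactions (p_high) gets a small one.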

The Significance level switch may be used to reduce the list of results reported to only those genetic networks which show significance at the given level. To report all the resulting networks, this switch can simply be left at a value of 1.

Modify graphical output
To edit the graphics available from the Show Graph reports by hand after the analysis has finished, you can click Open image as vector graphic and edit the SVG graphics with a tool suitable for vector graphics, e.g. the free software Inkscape, or CorelDRAW. You should set the tool of your choice as the default application for SVG files.

Furthermore, you can save any graph in the XGMML format using the Save in XGMML format links. This is an XML-based format accepted by most software packages dedicated to graph visualization and interpretation. For example, you can import the XGMML files into Cytoscape and use the fully developed functionality for graph layout in this software.
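Because XGMML is XML, the saved files can also be read with a few lines of code. The sketch below parses a small hand-written XGMML sample; the exact attributes in files saved by Wormpath are not documented here, so the sample content, the use of the label attribute for gene names, and the function name read_xgmml are assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace commonly used by XGMML documents.
XGMML_NS = {"x": "http://www.cs.rpi.edu/XGMML"}

# Hand-written sample for illustration, not actual Wormpath output.
SAMPLE = """<graph label="example network" xmlns="http://www.cs.rpi.edu/XGMML">
  <node id="1" label="geneA"/>
  <node id="2" label="geneB"/>
  <edge source="1" target="2" label="geneA - geneB"/>
</graph>
"""


def read_xgmml(text):
    """Return (sorted node labels, list of (source label, target label))."""
    root = ET.fromstring(text)
    labels = {
        node.get("id"): node.get("label")
        for node in root.findall("x:node", XGMML_NS)
    }
    edges = [
        (labels[edge.get("source")], labels[edge.get("target")])
        for edge in root.findall("x:edge", XGMML_NS)
    ]
    return sorted(labels.values()), edges


nodes, edges = read_xgmml(SAMPLE)
```

For a file on disk, the text can be obtained with open(path, encoding="utf-8").read() and passed to the same function.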