SAMNetWeb Tutorial
SAMNet uses a constrained optimization approach to identify relationships between high-throughput mRNA expression data and other measured changes (e.g. genetic hits, phosphoproteomics). The result is a compact network whose edges represent the protein interactions that best explain the changes in mRNA expression downstream of the upstream changes for each condition. DAVID automatically scans the resulting network to identify pathways that are enriched by the joint dataset.
INPUTS
SAMNetWeb is a flexible platform that can be run in different modes. The minimum input required to run the algorithm is a set of 'Source Weights' for each experimental condition (or 'commodity' in the flow formulation) and a set of 'Sink Weights' for each condition. In the EMT example highlighted in [1] and shown here, sink weights are mRNAs differentially expressed upon EMT induction and source weights are proteins with post-translational modifications in each condition. However, it is also possible to run SAMNetWeb without mRNA expression data, using protein changes as sink weights (e.g. when source weights are genetic hits). In this case, no TF-DNA interaction network is needed and the "Use sink weights directly" box must be checked.
Specific formatting of each file is described below:
It is recommended to use one of the pre-formatted published protein-protein interaction networks:
- HIPPIE: weighted human protein-protein interaction network
- IRefWeb: selected human protein-protein interactions
- InWeb: protein and gene interactions
If you would like to create your own protein-protein interaction network, you may create a weighted DiGraph using the NetworkX package, save it in Python Pickle format and upload it. We recommend using UNIPROT protein entry names.
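As an illustration of the custom interactome option, here is a minimal sketch that builds a weighted DiGraph with NetworkX and saves it with Python's pickle module. The interaction scores are invented, the file name is hypothetical, and storing the score under the "weight" edge attribute is an assumption about what the server reads.

```python
import os
import pickle
import tempfile

import networkx as nx

# Hypothetical interactions: (source protein, target protein, confidence score).
# Nodes use UNIPROT entry names; the scores are made up for illustration.
interactions = [
    ("EGFR_HUMAN", "GRB2_HUMAN", 0.95),
    ("GRB2_HUMAN", "SOS1_HUMAN", 0.90),
    ("SOS1_HUMAN", "RASH_HUMAN", 0.85),
]


def build_interactome(rows):
    """Build a directed graph with one weighted edge per interaction."""
    graph = nx.DiGraph()
    for source, target, score in rows:
        # Assumption: the weight is stored under the "weight" attribute.
        graph.add_edge(source, target, weight=score)
    return graph


graph = build_interactome(interactions)

# Save the graph in Python Pickle format (hypothetical file name).
pickle_path = os.path.join(tempfile.mkdtemp(), "my_interactome.pkl")
with open(pickle_path, "wb") as handle:
    pickle.dump(graph, handle)

# Reload the file as a sanity check before uploading it.
with open(pickle_path, "rb") as handle:
    reloaded = pickle.load(handle)
print(reloaded.number_of_nodes(), "nodes,", reloaded.number_of_edges(), "edges")
```

Pickle files are not always portable between Python or NetworkX versions, so it may be worth testing a small graph first.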
Each mRNA is associated with a commodity and a weight representing the importance of that mRNA in the network. While these values usually represent differential expression, they can also be significance or even correlation with a phenotype of interest. For details see Gosline et al. [1].
If using our provided TF-DNA interactions, genes must be in HUGO gene name format. A sample sink weights file is available here, where the weight is set as the correlation value between each mRNA's expression and the expression of its associated microRNA (commodity) across 849 breast cancer patients from [2].
It is also possible to assign weights directly to proteins without adding TF-DNA interactions. If so, these weights should be annotated with UNIPROT Entry Names and you must check the "Use sink weights directly" checkbox.
A template for the input format is available below:
Commodity1 mRNA1 Weight
Commodity1 mRNA2 Weight
Commodity2 mRNA1 Weight
Commodity2 mRNA3 Weight
...
...
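The template above can be produced from a list of values with a few lines of Python. This is a minimal sketch, assuming the three columns are separated by tabs (the tutorial states this explicitly only for the edge weights file); the gene names, weights and file name are hypothetical.

```python
import csv
import os
import tempfile


def write_sink_weights(path, rows):
    """Write (commodity, mRNA, weight) rows as one tab-separated line each."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for commodity, gene, weight in rows:
            writer.writerow([commodity, gene, weight])


# Hypothetical sink weights: HUGO gene names with made-up weights.
sink_rows = [
    ("Commodity1", "CDH1", 2.4),
    ("Commodity1", "VIM", 1.7),
    ("Commodity2", "CDH1", 0.9),
    ("Commodity2", "SNAI2", 3.1),
]

sink_path = os.path.join(tempfile.mkdtemp(), "sink_weights.txt")
write_sink_weights(sink_path, sink_rows)

# Print the file so the layout can be compared with the template.
with open(sink_path) as handle:
    sink_text = handle.read()
print(sink_text)
```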
The second input file requires that a set of proteins of interest from each experiment ('commodity') be annotated with a weight signifying each protein's importance in the experiment. For details see Gosline et al. [1]. Sample protein weights are provided here, where the weight is set as the TargetScan context+ score for the particular protein and its associated microRNA (commodity). SAMNet takes the absolute value of any negative numbers.
A template for the input format is available below:
Commodity1 Protein1 Weight
Commodity1 Protein2 Weight
Commodity2 Protein1 Weight
Commodity2 Protein3 Weight
...
...
Please note: The naming of the protein nodes in the protein weights input file must be consistent with the naming of the protein nodes in the interactome, which is UNIPROT Entry Names for the provided interactomes. If you provide HUGO/HGNC gene names, they will be converted automatically. If you upload an interactome with different identifiers, you can convert your lists to standard formats using DAVID or HUGO.
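When uploading a custom interactome, it can help to check the protein weights against the interactome node names before submitting. The sketch below is one possible local check, not part of SAMNetWeb: the function name, node set and weights are hypothetical, and the abs() call only previews the absolute-value step described above.

```python
def prepare_source_weights(rows, interactome_nodes):
    """Split rows into those whose protein is an interactome node and the rest.

    Kept rows carry the absolute value of their weight, previewing how
    negative scores would be treated.
    """
    kept, missing = [], []
    for commodity, protein, weight in rows:
        if protein in interactome_nodes:
            kept.append((commodity, protein, abs(weight)))
        else:
            missing.append((commodity, protein))
    return kept, missing


# Hypothetical custom interactome that uses UNIPROT entry names.
nodes = {"EGFR_HUMAN", "GRB2_HUMAN", "SOS1_HUMAN"}

# Hypothetical protein weights; "TP53" uses a different identifier type.
protein_rows = [
    ("Commodity1", "EGFR_HUMAN", -0.42),
    ("Commodity1", "TP53", -0.10),
    ("Commodity2", "GRB2_HUMAN", 0.30),
]

kept, missing = prepare_source_weights(protein_rows, nodes)
for commodity, protein in missing:
    print(f"{protein} ({commodity}) is not a node in the interactome")
```

Any protein reported as missing would need its identifier converted before upload.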
If the sink weights represent mRNA expression data, we suggest using a protein-DNA interaction network to connect these changes to the protein-interaction network. To that end, we have compiled a set of predicted protein-DNA interactions using TRANSFAC motif scanning of ENCODE-provided regions for various cell types. We recommend using one of these provided datasets:
- ENCODE DNase I cluster, 2kb window around each gene transcription start site
- MCF7 DNase I hypersensitive sites, 2kb window around each gene transcription start site
- A549 DNase I hypersensitive sites, 2kb window around each gene transcription start site
If you provide your own edge weights file, you can provide a NetworkX graph object in Python Pickle format or use the following format:
- Tab-delimited text file
- The first column is the protein (transcription factor) interacting with the mRNA in the second column. The third column is a weight assigned to the protein-mRNA interaction.
Tab-delimited text file template:
Protein1 mRNA1 Weight
Protein1 mRNA2 Weight
Protein2 mRNA1 Weight
Protein2 mRNA3 Weight
...
...
Please note: The naming of the protein nodes in the edge weights file must be consistent with the naming of the protein nodes in the protein weights input file. You can convert your lists to standard formats using DAVID or HUGO.
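A custom edge weights file can be checked locally against the three-column layout described above before it is uploaded. This is a minimal sketch of such a check; the function name and sample lines are hypothetical, and the rules (exactly three tab-separated fields, numeric third field) are a reading of the template, not a specification of what the server accepts.

```python
def validate_edge_lines(lines):
    """Return (line_number, problem) pairs for lines that break the layout."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.strip():
            continue  # blank lines are ignored in this sketch
        fields = text.split("\t")
        if len(fields) != 3:
            problems.append(
                (number, f"expected 3 tab-separated columns, found {len(fields)}")
            )
            continue
        try:
            float(fields[2])
        except ValueError:
            problems.append((number, f"weight is not a number: {fields[2]!r}"))
    return problems


# Hypothetical protein-mRNA lines: one valid, two malformed.
sample_lines = [
    "SNAI1_HUMAN\tCDH1\t0.8\n",
    "ZEB1_HUMAN CDH1 0.7\n",     # spaces instead of tabs
    "TWST1_HUMAN\tVIM\thigh\n",  # non-numeric weight
]

edge_problems = validate_edge_lines(sample_lines)
for number, problem in edge_problems:
    print(f"line {number}: {problem}")
```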
The SAMNet algorithm has one parameter, gamma, that controls the number of source weights included in the network. The default value of gamma is 14, but we recommend tuning the parameter to get the most distinct GO terms in the DAVID analysis performed on the network.
The SAMNet algorithm typically sets all edge capacities to 1. However, to make the network more compact, we have implemented a hierarchical reduction of the capacities of edges that are farther from the source. Specifically, when this flag is set, capacities are 1 x 10e(-1 x shortest distance to source) for each edge in the network.
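To make the capacity reduction concrete, here is a minimal sketch under two assumptions that the text does not confirm: the expression is read as 10 raised to the power of minus the distance, and an edge's distance is the number of edges on the shortest path from the source to the edge's tail node. The toy graph and node names are hypothetical.

```python
from collections import deque


def shortest_distances(edges, source):
    """Breadth-first search: edge count of the shortest path from source."""
    neighbours = {}
    for tail, head in edges:
        neighbours.setdefault(tail, []).append(head)
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return distance


def reduced_capacities(edges, source):
    """Assign each reachable edge a capacity of 10 ** (-distance of its tail)."""
    distance = shortest_distances(edges, source)
    # Assumption: edges whose tail cannot be reached from the source are skipped.
    return {
        (tail, head): 10.0 ** (-distance[tail])
        for tail, head in edges
        if tail in distance
    }


# Hypothetical toy network: S is the source, T the sink.
toy_edges = [("S", "A"), ("A", "B"), ("B", "T"), ("S", "B")]
capacities = reduced_capacities(toy_edges, "S")
for edge, capacity in capacities.items():
    print(edge, capacity)
```

Under this reading, edges leaving the source keep a capacity of 1 and each further step away from the source lowers the capacity tenfold.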
OUTPUT DETAILS
Press 'Submit Job' and your job will be submitted to a queue, where it will be run sequentially with other jobs. You will be directed to a unique URL where your output will be posted when it is ready. You may provide an e-mail address if you would like to receive a notification when the results are ready. The state of the queue can be viewed here.
SAMNetWeb run time scales with network size. Runs usually finish in 5-20 minutes, but can take up to an hour.
A web link is provided at the time of data submission, which you can bookmark and access at a later time. If you provide an email address, you will be notified when your job is complete.
The result page (sample shown here) includes a basic visualization of the optimal SAMNet interactome using the Cytoscape Web plug-in [3]. This visualization is provided to give users a quick look before they download output files.
All output from the SAMNet algorithm can be downloaded for further analysis. Tab-delimited files show which inputs were found in the interactome, including the Source (S1) and Sink (T1) nodes that were added to the original network. Cytoscape files (graph structure and node/edge attributes) are available for more detailed analysis:
- Edge flow values
- Edge types
- Node flow values
- Node types
MORE DETAILS
More detailed information about the SAMNet method and its application in identifying key mediators of the human epithelial-mesenchymal transition is available in ref [1].
CONTACT
For any questions or issues, please contact Sara Gosline at: sgosline _at_ mit _dot_ edu.
REFERENCES
[1] Gosline S.J.C., Spencer S.J., Ursu O., Fraenkel E. (2012) SAMNet: a network-based approach to integrate multi-dimensional high throughput datasets, Integrative Biology, 4: 1415-27.
[2] Data obtained from The Cancer Genome Atlas.
[3] Lopes C.T., Franz M., Kazi F., Donaldson S.L., Morris Q., Bader G.D. (2010) Cytoscape Web: an interactive web-based network browser, Bioinformatics, 26(18):2347-8.
[4] Jiao X., Sherman B.T., Huang D.W., Stephens R., Baseler M.W., Lane H.C., Lempicki R.A. (2012) DAVID-WS: A Stateful Web Service to Facilitate Gene/Protein List Analysis, Bioinformatics, 28(13):1805-06.