SAMNetWeb Tutorial


SAMNet uses a constrained optimization approach to identify relationships between high-throughput mRNA expression data and other measured changes (e.g. genetic hits, phosphoproteomics). The result is a compact network whose edges represent the protein interactions that best explain, for each condition, the changes in mRNA expression downstream of the upstream changes. The resulting network is then automatically analyzed with DAVID [4] to identify pathways enriched in the joint dataset.


INPUTS

SAMNetWeb is a flexible platform that can be run in different modes. The minimum input required to run the algorithm is a set of 'Source Weights' for each experimental condition (a 'commodity' in the flow formulation) and a set of 'Sink Weights' for each condition. In the EMT example highlighted in [1] and shown here, the sink weights are mRNAs differentially expressed upon EMT induction and the source weights are proteins with post-translational modifications in each condition.

However, SAMNetWeb can also be run without mRNA expression data, using protein changes as sink weights (e.g. when the source weights are genetic hits). In this case, no TF-DNA interaction network is needed and the "Use sink weights directly" box must be checked.

Specific formatting of each file is described below:

  • Protein-Protein Interactions

  • Sink Weights Input

    • Each mRNA is associated with a commodity and a weight representing the importance of that mRNA in the network. While these values usually represent differential expression, they can also represent significance or even correlation with a phenotype of interest. For details see Gosline et al. [1].

      If you use our provided TF-DNA interactions, genes must be given as HUGO gene names. A sample sink weights file is available here, in which each weight is the correlation between an mRNA's expression and the expression of its associated microRNA (commodity) across 849 breast cancer patients [2].

      It is also possible to assign weights directly to proteins without adding TF-DNA interactions. In that case, these weights should be annotated with UniProt entry names and you must check the "Use sink weights directly" checkbox. A template for the input format is shown below:

      File template:
        Commodity1    mRNA1    Weight
        Commodity1    mRNA2    Weight
        Commodity2    mRNA1    Weight
        Commodity2    mRNA3    Weight
        ...
        ...
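      As an illustration of how a correlation-based sink weights file like the sample could be produced, the sketch below correlates each mRNA with one microRNA across patients and emits lines in the template format. It is a minimal sketch, not the procedure used to build the sample file: the choice of Pearson correlation, the tab delimiter, the three-decimal rounding, the helper names, and the expression values are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumed correlation measure)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def sink_weight_lines(commodity, mirna_expr, mrna_expr):
    """Yield 'Commodity<TAB>mRNA<TAB>Weight' lines, one per mRNA, sorted by gene name.

    Hypothetical helper: each weight is the correlation between an mRNA's
    expression and the microRNA's expression across the same patients.
    """
    for gene, values in sorted(mrna_expr.items()):
        yield f"{commodity}\t{gene}\t{pearson(mirna_expr, values):.3f}"

# Hypothetical expression values for four patients (illustrative, not TCGA data).
mir = [1.0, 2.0, 3.0, 4.0]
genes = {"ZEB1": [4.0, 3.0, 2.0, 1.0], "CDH1": [1.0, 2.0, 3.0, 5.0]}

for line in sink_weight_lines("miR-200c", mir, genes):
    print(line)
```

      Each commodity (here a single microRNA) contributes one block of lines, so a multi-commodity file is simply the concatenation of these blocks.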

  • Source Weights Input
    • The second input file requires that a set of proteins of interest from each experiment ('commodity') be annotated with a weight signifying its importance in the experiment. For details see Gosline et al. [1]. Sample protein weights are provided here, in which each weight is the TargetScan context+ score for the protein and its associated microRNA (commodity). SAMNet uses the absolute value of any negative weight.
      A template for the input format is available below:

      File template:
        Commodity1    Protein1    Weight
        Commodity1    Protein2    Weight
        Commodity2    Protein1    Weight
        Commodity2    Protein3    Weight
        ...
        ...

      Please note: The names of the protein nodes in the protein weights input file must be consistent with the names of the protein nodes in the interactome, which are UniProt entry names for the provided interactomes. If you provide HUGO/HGNC gene names, they will be converted automatically. If you upload an interactome with different identifiers, you can convert your lists to standard formats using DAVID or HUGO.
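      The sketch below reads a source weights file, applies the absolute-value rule described above, and lists proteins that do not match any interactome node, which is a quick way to catch naming mismatches before submitting. It assumes tab-delimited columns, represents the interactome simply as a set of UniProt entry names, and uses hypothetical helper names, proteins, and weights.

```python
import csv
import io
from collections import defaultdict

def read_source_weights(handle):
    """Parse 'Commodity<TAB>Protein<TAB>Weight' rows into {commodity: {protein: weight}}."""
    weights = defaultdict(dict)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:  # skip blank or malformed rows
            continue
        commodity, protein, weight = row[0], row[1], float(row[2])
        # SAMNet uses the absolute value of negative weights, so do the same here.
        weights[commodity][protein] = abs(weight)
    return dict(weights)

def missing_from_interactome(weights, interactome_nodes):
    """For each commodity, list proteins whose names do not appear in the interactome."""
    return {
        commodity: sorted(p for p in proteins if p not in interactome_nodes)
        for commodity, proteins in weights.items()
    }

# Hypothetical file contents and interactome node set (illustrative only).
text = (
    "miR-200c\tZEB1_HUMAN\t-0.42\n"
    "miR-200c\tZEB2_HUMAN\t-0.31\n"
    "miR-21\tPTEN_HUMAN\t-0.55\n"
)
interactome = {"ZEB1_HUMAN", "PTEN_HUMAN", "CTNB1_HUMAN"}

w = read_source_weights(io.StringIO(text))
print(missing_from_interactome(w, interactome))
```

      For a real file, pass `open(path, newline="")` instead of the in-memory `io.StringIO` buffer.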


  • Protein-mRNA Edge Weights
    • If the sink weights represent mRNA expression data, we suggest using a protein-DNA interaction network to connect these changes to the protein-interaction network. For this purpose, we have compiled a set of predicted protein-DNA interactions by TRANSFAC motif scanning of ENCODE-provided regions for various cell types. We recommend using one of these provided datasets:

      • ENCODE DNase I cluster, 2 kb window around each gene transcription start site
      • MCF7 DNase I hypersensitive sites, 2 kb window around each gene transcription start site
      • A549 DNase I hypersensitive sites, 2 kb window around each gene transcription start site


      If you provide your own edge weights file, you can supply either a NetworkX graph object in Python pickle format or a file in the following format:
      • Tab-delimited text file
      • The first column is the protein (transcription factor) interacting with the mRNA in the second column. The third column is the weight of that protein-mRNA interaction.

      Tab-delimited text file template:
        Protein1    mRNA1    Weight
        Protein1    mRNA2    Weight
        Protein2    mRNA1    Weight
        Protein2    mRNA3    Weight
        ...
        ...

      Please note: The names of the protein nodes in the edge weights file must be consistent with the names of the protein nodes in the protein weights input file. You can convert your lists to standard formats using DAVID or HUGO.
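      If you prefer to upload a NetworkX graph rather than a text file, a minimal sketch of converting the tab-delimited format into a pickled graph follows. The directed TF-to-mRNA orientation, the `weight` edge attribute name, the output filename, and the example edges are assumptions, not documented SAMNetWeb requirements. The standard `pickle` module is used because `nx.write_gpickle` was removed in NetworkX 3.0.

```python
import io
import pickle

import networkx as nx

def read_edge_weights(handle):
    """Build a directed TF -> mRNA graph from 'Protein<TAB>mRNA<TAB>Weight' lines."""
    graph = nx.DiGraph()
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:  # skip blank or malformed lines
            continue
        protein, mrna, weight = fields[0], fields[1], float(fields[2])
        # Assumption: the weight is stored under the edge attribute name "weight".
        graph.add_edge(protein, mrna, weight=weight)
    return graph

# Hypothetical edges for illustration (not taken from the provided datasets).
text = (
    "STAT3_HUMAN\tVIM\t0.8\n"
    "STAT3_HUMAN\tSNAI2\t0.6\n"
    "SNAI1_HUMAN\tCDH1\t0.9\n"
)
g = read_edge_weights(io.StringIO(text))

# Hypothetical output filename for the pickled graph.
with open("tf_mrna_edges.pkl", "wb") as fh:
    pickle.dump(g, fh)
```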


  • Gamma Parameter
    • The SAMNet algorithm has one parameter, gamma, that controls the number of source weights included in the network. The default value of gamma is 14, but we recommend tuning the parameter to obtain the most distinct GO terms in the DAVID analysis performed on the network.

  • Hierarchical Capacities
    • The SAMNet algorithm typically sets all edge capacities to 1. However, to make the network more compact, we have implemented a hierarchical reduction of the capacities of edges that are farther from the source. Specifically, when this flag is set, each edge in the network has capacity 10^(-d), where d is the edge's shortest-path distance to the source.
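      The capacity rule above can be sketched in plain Python with a breadth-first search from the source node. This is an illustration, not SAMNet's implementation: it assumes a directed edge list, takes an edge's distance to be the hop distance from the source node (named `S1` here, after the source node in the output files) to the edge's tail, and skips edges unreachable from the source. An undirected interactome would need both directions listed.

```python
from collections import deque

def hop_distances(edges, source):
    """Breadth-first search: fewest hops from `source` to every reachable node."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adjacency.get(node, []):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def hierarchical_capacities(edges, source="S1"):
    """Capacity 10**(-d) per edge; assumption: d = hop distance from source to the tail node."""
    dist = hop_distances(edges, source)
    # Edges whose tail is unreachable from the source are left out.
    return {(u, v): 10.0 ** (-dist[u]) for u, v in edges if u in dist}

# Hypothetical toy network: S1 -> A -> B -> {C, T1}, plus a disconnected edge X -> Y.
edges = [("S1", "A"), ("A", "B"), ("B", "C"), ("B", "T1"), ("X", "Y")]
caps = hierarchical_capacities(edges)
print(caps)
```

      Under these assumptions, edges leaving the source keep a capacity of 1, and each additional hop divides the capacity by 10.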


    OUTPUT DETAILS


    Press 'Submit Job' to submit your job to a queue, where it will be run sequentially with other jobs. You will be directed to a unique URL where your output will be posted when it is ready; you can bookmark this link and return to it later. If you provide an e-mail address, you will also be notified when your results are ready. The state of the queue can be viewed here.


    SAMNetWeb run times scale with network size: most jobs finish in 5-20 minutes, but some can take up to an hour.

  • Visualization
    • The result page (sample shown here) includes a basic visualization of the optimal SAMNet interactome using the Cytoscape Web plug-in [3]. This visualization is provided to give users a quick look before they download output files.

  • Downloads

      All output from the SAMNet algorithm can be downloaded for further analysis. Tab-delimited files show which inputs were found in the interactome, including the Source (S1) and Sink (T1) nodes that were added to the original network. Cytoscape files (graph structure and node/edge attributes) are available for more detailed analysis:

      • Edge flow values
      • Edge types
      • Node flow values
      • Node types
      Lastly, significant DAVID enrichment terms are available for download, providing a unified functional enrichment of the commodities across the distinct experimental platforms.


    MORE DETAILS

      More detailed information about the SAMNet method and its application to identifying key mediators of the human epithelial-mesenchymal transition is available in ref [1].

    CONTACT

      For any questions or issues, please contact Sara Gosline at: sgosline _at_ mit _dot_ edu.


    REFERENCES

    1. Gosline S.J.C., Spencer S.J., Ursu O., Fraenkel E. (2012) SAMNet: a network-based approach to integrate multi-dimensional high throughput datasets, Integrative Biology, 4:1415-27.
    2. Data obtained from The Cancer Genome Atlas.
    3. Lopes C.T., Franz M., Kazi F., Donaldson S.L., Morris Q., Bader G.D. (2010) Cytoscape Web: an interactive web-based network browser, Bioinformatics, 26(18):2347-8.
    4. Jiao X., Sherman B.T., Huang D.W., Stephens R., Baseler M.W., Lane H.C., Lempicki R.A. (2012) DAVID-WS: A Stateful Web Service to Facilitate Gene/Protein List Analysis, Bioinformatics, 28(13):1805-6.

    This website is free and open to all users, and no login is required.