Pathway analysis

In bioinformatics research, pathway analysis software is used to identify related proteins within a pathway or to build a pathway de novo from the proteins of interest. This is helpful when studying the differential expression of a gene in a disease, or when analyzing any omics dataset with a large number of proteins. Examining gene expression changes at the level of a pathway can help reveal their biological causes. In molecular biology, a pathway is an artificial, simplified model of a process within a cell or tissue. A typical pathway model starts with an extracellular signaling molecule that activates a specific protein, which then triggers a chain of protein-protein or protein-small molecule interactions.[1] Pathway analysis helps to understand or interpret omics data in the light of canonical prior knowledge structured in the form of pathway diagrams. It makes it possible to find distinct cellular processes, diseases or signaling pathways that are statistically associated with a selection of genes differentially expressed between two samples.[2] Pathway analysis is often, but erroneously, used as a synonym for network analysis (functional enrichment analysis and gene set analysis).[3]

Uses

The data for pathway analysis come from high-throughput biology, including high-throughput sequencing data and microarray data. Before pathway analysis can be done, the omics data should be normalized, and genes should be ranked by differential expression, usually with the help of Student's t-test, ANOVA or other statistics. In general, any statistically ranked list of genes can be analyzed by pathway analysis. For example, the functional activity of proteins can often be inferred using network enrichment analysis of the genes differentially expressed in an experiment; such functional activity scores can then be used in pathway analysis to find the pathways responsible for the observed differential expression. When no ranking is available, a simple list of genes can be analyzed. It is also possible to integrate multiple microarray datasets from different research groups by meta-analysis and cross-platform normalization.[4] Using pathway analysis software, researchers can determine which gene groups, such as pathways, cell processes or diseases, are enriched with genes that are over- or under-expressed in the experimental data. They can also infer associated upstream and downstream regulators, proteins, small molecules, drugs, etc.[5] For example, pathway analysis of several independent microarray experiments (meta-analysis) helped to discover potential biomarkers in a single pathway that is important for the fast-to-slow fiber type switch in Duchenne muscular dystrophy.[6] In another study, meta-analysis identified two biomarkers in the blood of patients with Parkinson's disease that could be useful for monitoring the disease.[7]
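As a minimal sketch of the ranking step, the Python code below scores each gene with a two-sample Student's t statistic (pooled variance) and sorts genes from most up-regulated to most down-regulated in the case samples. It assumes the data are already normalized and stored as a dictionary mapping gene identifiers to per-sample values; the gene names, values and function names are hypothetical. Real analyses usually also compute p-values, correct for multiple testing and often use moderated variance estimates.

```python
import math
from statistics import mean, variance


def student_t(case, control):
    """Two-sample Student's t statistic with pooled (equal) variance."""
    n_case, n_control = len(case), len(control)
    # statistics.variance uses the n - 1 (sample) denominator.
    pooled = ((n_case - 1) * variance(case) + (n_control - 1) * variance(control)) / (
        n_case + n_control - 2
    )
    # A gene with zero variance in both groups would raise ZeroDivisionError here.
    standard_error = math.sqrt(pooled * (1 / n_case + 1 / n_control))
    return (mean(case) - mean(control)) / standard_error


def rank_genes(expression, case_cols, control_cols):
    """Return (gene, t) pairs sorted from most up- to most down-regulated in the cases."""
    scores = {
        gene: student_t([values[i] for i in case_cols], [values[i] for i in control_cols])
        for gene, values in expression.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical normalized log2 expression values: three case samples, then three controls.
expression = {
    "GENE_A": [5.1, 5.3, 5.2, 3.0, 3.1, 2.9],
    "GENE_B": [4.0, 4.1, 3.9, 4.0, 4.2, 3.8],
    "GENE_C": [2.0, 2.2, 2.1, 4.0, 4.1, 3.9],
}
for gene, t in rank_genes(expression, case_cols=[0, 1, 2], control_cols=[3, 4, 5]):
    print(f"{gene}\t{t:.2f}")
```

The resulting ranked list (or the genes above some threshold) is the input that the ORA, FCS and PT methods described below start from.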

Pathway databases

Pathway analysis requires a knowledge base containing pathway collections and interaction networks. The content, structure and functionality of pathway collections usually vary between sources. Examples of pathway collections include KEGG,[8] WikiPathways and Reactome.[9] There are also commercial pathway collections, such as Pathway Studio pathways[10] and IPA pathways.[11]

Methods and software

Pathway analysis software can generally be divided into web-based applications, desktop programs and programming packages. Programming packages are mostly written in R and Python and are shared openly through the Bioconductor[12] and GitHub[13] projects. Methods of pathway analysis evolve fast, so their classification is still debated.[14][15] According to Khatri et al.,[16] there are three main groups of methods: over-representation analysis (ORA), functional class scoring (FCS) and pathway topology (PT).

Over-Representation Analysis or Enrichment Analysis (ORA)

This method measures the fraction of genes in a pathway, or in any other gene group (Gene Ontology (GO) groups, protein families, pathways), that are differentially expressed. The aim of ORA is to obtain a list of the most relevant pathways, ordered by p-value. The basic hypothesis of ORA is that relevant pathways can be identified by the number of genes differentially expressed in the experiment that they contain. The statistical significance of the overlap between the genes of a pathway and the list of differentially expressed genes is determined by statistical tests such as Fisher's exact test or the hypergeometric test; the overlap can also be quantified with similarity measures such as the Jaccard index.
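The ORA computation can be sketched in a few lines of Python. This is a minimal illustration, assuming that the gene universe is the set of all measured genes and that each pathway is a plain set of gene identifiers; the pathway names, genes and function names are hypothetical. The upper-tail hypergeometric probability computed here equals the one-sided Fisher's exact test p-value for the 2×2 overlap table, and the Jaccard index is reported alongside as a similarity measure rather than a test. No multiple-testing correction is applied.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population size, successes in population, draws)."""
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, draws)


def over_representation(de_genes, pathways, universe):
    """Rank pathways by the one-sided hypergeometric p-value of their overlap with the DE list."""
    de = set(de_genes) & universe
    results = []
    for name, members in pathways.items():
        members = set(members) & universe
        overlap = len(de & members)
        p_value = hypergeom_upper_tail(overlap, len(universe), len(members), len(de))
        union = de | members
        jaccard = overlap / len(union) if union else 0.0
        results.append((name, overlap, p_value, jaccard))
    return sorted(results, key=lambda row: row[2])


# Hypothetical example: 20 measured genes, 5 of them differentially expressed.
universe = {f"G{i}" for i in range(1, 21)}
de_genes = {"G1", "G2", "G3", "G4", "G5"}
pathways = {
    "PATHWAY_1": {"G1", "G2", "G3", "G4", "G6"},
    "PATHWAY_2": {"G10", "G11", "G12", "G13", "G14"},
}
for name, overlap, p_value, jaccard in over_representation(de_genes, pathways, universe):
    print(f"{name}\toverlap={overlap}\tp={p_value:.4g}\tJaccard={jaccard:.2f}")
```

Here PATHWAY_1 contains 4 of the 5 listed genes and gets p = 76/15504 (about 0.0049), while PATHWAY_2 has no overlap and gets p = 1. Note that the result depends on the fixed cut-off that produced `de_genes`, which is the limitation FCS methods try to remove.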

Functional Class Scoring (FCS)

This method analyzes the expression changes of all genes in the experiment, not only those selected as differentially expressed by a statistical significance cut-off. FCS thereby removes the cut-off threshold limitation of ORA. The aim of FCS is to compute enrichment scores for the differentially expressed genes (see gene set enrichment), using pathways as the gene sets. One of the first and most popular methods using the FCS approach is Gene Set Enrichment Analysis (GSEA).[17]
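The core FCS idea can be illustrated with the running-sum enrichment score used by GSEA, as described by Subramanian et al.: walk down the full ranked gene list, step up at genes in the set (weighted by |score| to the power p, with p = 1 by default) and down at genes outside it, and report the maximum deviation from zero. This Python sketch covers only that score; the gene names and values are hypothetical, and GSEA's score normalization and permutation-based significance testing are not implemented.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """GSEA-style running-sum enrichment score.

    ranked: (gene, score) pairs sorted from highest to lowest score; every measured
    gene is included, so no significance cut-off is applied.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    # Hits are weighted by |score| ** weight (weight=0 gives an unweighted walk).
    hit_norm = sum(abs(score) ** weight for (_, score), hit in zip(ranked, hits) if hit)
    if n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must overlap the ranking but not cover all of it")
    running, best = 0.0, 0.0
    for (_, score), hit in zip(ranked, hits):
        if hit:
            running += abs(score) ** weight / hit_norm
        else:
            running -= 1.0 / n_miss
        # The score is the running sum's maximum deviation from zero, keeping its sign.
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list of gene-level statistics (e.g. t statistics).
ranked = [("G1", 3.0), ("G2", 2.0), ("G3", 1.0), ("G4", -1.0), ("G5", -2.0), ("G6", -3.0)]
print(enrichment_score(ranked, {"G1", "G2"}))  # set concentrated at the top
print(enrichment_score(ranked, {"G5", "G6"}))  # set concentrated at the bottom
```

A set whose genes sit at the top of the ranking scores 1.0 and one at the bottom scores -1.0 in this example; sets spread evenly through the list stay near zero.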

Pathway Topology (PT)

Pathway topology methods are essentially the same as FCS, except that PT computes gene-level statistics using pathway topology integrated from different databases.[18] The critical difference is that, by leveraging information about the role, position and direction of interactions from the pathway database, PT can re-score the significance of a pathway when its linkages change, whereas FCS always gives the same score.[19] Examples of PT approaches include Signaling Pathway Impact Analysis (SPIA),[20][21] EnrichNet,[22] GGEA[23] and TopoGSA.[24]
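To illustrate why a PT score reacts to changed linkages while a topology-blind score does not, the Python sketch below computes a simplified version of SPIA's perturbation factor as described by Tarca et al.: each gene's perturbation is its own fold change plus the perturbations of its upstream genes, signed by the interaction type and divided by the number of downstream genes of each upstream gene. It assumes an acyclic pathway with edges labeled +1 (activation) or -1 (inhibition); the genes, fold changes and function names are hypothetical, and SPIA's bootstrap significance test and its combination with an ORA p-value are omitted.

```python
def perturbation_factors(delta_e, edges):
    """SPIA-style perturbation factors for an acyclic pathway.

    delta_e: gene -> measured log fold change.
    edges: (upstream, downstream, beta) with beta = +1 for activation, -1 for inhibition.
    PF(g) = delta_e[g] + sum over upstream u of beta * PF(u) / n_downstream(u).
    """
    n_downstream = {gene: 0 for gene in delta_e}
    for upstream, _, _ in edges:
        n_downstream[upstream] += 1
    pf = dict(delta_e)
    # On a directed acyclic graph, len(delta_e) update sweeps are enough for PF to settle.
    for _ in range(len(delta_e)):
        updated = dict(delta_e)
        for upstream, downstream, beta in edges:
            updated[downstream] += beta * pf[upstream] / n_downstream[upstream]
        pf = updated
    return pf


def net_accumulation(delta_e, edges):
    """Total accumulated perturbation: sum of PF(g) - delta_e[g] over the pathway."""
    pf = perturbation_factors(delta_e, edges)
    return sum(pf[gene] - delta_e[gene] for gene in delta_e)


# Hypothetical three-gene cascade A -> B -> C in which only A changes expression.
delta_e = {"A": 2.0, "B": 0.0, "C": 0.0}
activating = [("A", "B", +1), ("B", "C", +1)]
inhibiting = [("A", "B", +1), ("B", "C", -1)]

# A topology-blind summary such as the mean fold change is identical for both wirings...
print(sum(delta_e.values()) / len(delta_e))
# ...while the topology-aware accumulation changes when one link is flipped.
print(net_accumulation(delta_e, activating), net_accumulation(delta_e, inhibiting))
```

With both links activating, the perturbation propagates to B and C and accumulates to 4.0; flipping the B to C link to inhibition cancels it to 0.0, although the measured fold changes are unchanged.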

Notable companies

Several companies license software that performs a number of analytic methods on gene sets. Most free software solutions provide only links to online pathway collections, whereas commercial ones have their own collections. The best choice of software depends on the user's skills, its cost, and the time one can spend on pathway analysis.[25] Ingenuity, for example, charges a fee for the use of its software, while some software, such as STRING or Cytoscape, is open source. However, Ingenuity maintains a knowledge base against which gene expression data can be compared.[26] Pathway Studio[27] is commercial software that allows users to search for biologically relevant facts, analyze experiments and create pathways. Pathway Studio Viewer[28] is a free resource from the same company for exploring the Pathway Studio interactive pathway collection and database. Only two commercial applications are known to offer pathway topology (PT)-based analyses: PathwayGuide from Advaita Corporation and MetaCore from Thomson Reuters.[29] Advaita uses the peer-reviewed Signaling Pathway Impact Analysis (SPIA) method,[30][31] while the MetaCore method is unpublished.[32]

Limits

Missing annotations on cell types and conditions

Many current methods for pathway analysis depend on existing databases. The data they contain, however, are not always completely annotated. Many gene interactions in databases are relatively speculative, and even those based on experimental evidence are often drawn from a specific cell type or disease. In addition, most canonical pathways are built using knowledge obtained from a limited number of experiments with narrow cell models. Results of pathway analysis of omics data obtained from different tissues should therefore be interpreted with caution.[33]

References

  1. Berg, J. M.; Tymoczko, J. L.; Stryer, L. (2002). Biochemistry (5th ed.). New York: W. H. Freeman.
  2. García-Campos, Miguel Angel; Espinal-Enríquez, Jesús; Hernández-Lemus, Enrique (2015). "Pathway analysis: State of the art". Frontiers in Physiology. 6. doi:10.3389/fphys.2015.00383.
  3. GSEA
  4. Walsh, Christopher; Hu, Pingzhao; Batt, Jane; Santos, Claudia (2015). "Microarray Meta-Analysis and Cross-Platform Normalization: Integrative Genomics for Robust Biomarker Discovery". Microarrays. 4 (3): 389–406. doi:10.3390/microarrays4030389.
  5. Subramanian, Aravind; Tamayo, Pablo; Mootha, Vamsi K.; Mukherjee, Sayan; Ebert, Benjamin L.; Gillette, Michael A.; Paulovich, Amanda; et al. (2005). "Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles". Proceedings of the National Academy of Sciences of the United States of America. 102 (43): 15545–50. doi:10.1073/pnas.0506580102. PMC 1239896. PMID 16199517.
  6. Kotelnikova, Ekaterina; Shkrob, Maria A.; Pyatnitskiy, Mikhail A.; Ferlini, Alessandra; Daraselia, Nikolai (2012). "Novel Approach to Meta-Analysis of Microarray Datasets Reveals Muscle Remodeling-Related Drug Targets and Biomarkers in Duchenne Muscular Dystrophy". PLoS Computational Biology. 8 (2): e1002365. doi:10.1371/journal.pcbi.1002365.
  7. Santiago, Jose A.; Potashkin, Judith A. (2015). "Network-Based Metaanalysis Identifies HNF4A and PTBP1 as Longitudinally Dynamic Biomarkers for Parkinson's Disease". Proceedings of the National Academy of Sciences of the United States of America. 112 (7): 2257–62. doi:10.1073/pnas.1423573112. PMC 4343174.
  8. Ogata, H.; Goto, S.; Sato, K.; Fujibuchi, W.; Bono, H.; Kanehisa, M. (1999). "KEGG: Kyoto Encyclopedia of Genes and Genomes". Nucleic Acids Research. 27 (1): 29–34. doi:10.1093/nar/27.1.29. PMC 148090. PMID 9847135.
  9. Vastrik, Imre; D'Eustachio, Peter; Schmidt, Esther; Joshi-Tope, Geeta; Gopinath, Gopal; Croft, David; de Bono, Bernard; et al. (2007). "Reactome: A Knowledge Base of Biologic Pathways and Processes". Genome Biology. 8 (3): R39. doi:10.1186/gb-2007-8-3-r39.
  10. Pathway Studio Pathways
  11. Pathway Central
  12. Gentleman, R. C.; Carey, V. J.; Bates, D. M.; Bolstad, B.; Dettling, M.; Dudoit, S.; et al. (2004). "Bioconductor: open software development for computational biology and bioinformatics". Genome Biology. 5 (10): R80. doi:10.1186/gb-2004-5-10-r80. PMC 545600. PMID 15461798.
  13. Dabbish, L.; Stuart, C.; Tsay, J.; Herbsleb, J. (2012). "Social coding in GitHub: transparency and collaboration in an open software repository". Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work. New York, NY: ACM. pp. 1277–1286.
  14. Khatri, Purvesh; Sirota, Marina; Butte, Atul J. (2012). "Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges". PLoS Computational Biology. 8 (2): e1002375. doi:10.1371/journal.pcbi.1002375.
  15. Henderson-Maclennan, N. K.; Papp, J. C.; Talbot, C. C.; McCabe, E. R. B.; Presson, A. P. (2010). "Pathway analysis software: annotation errors and solutions". Molecular Genetics and Metabolism. 101 (2–3): 134–40.
  16. Khatri, Purvesh; Sirota, Marina; Butte, Atul J. (2012). "Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges". PLoS Computational Biology. 8 (2): e1002375. doi:10.1371/journal.pcbi.1002375.
  17. Subramanian, Aravind; Tamayo, Pablo; Mootha, Vamsi K.; Mukherjee, Sayan; Ebert, Benjamin L.; Gillette, Michael A.; Paulovich, Amanda; et al. (2005). "Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles". Proceedings of the National Academy of Sciences of the United States of America. 102 (43): 15545–50. doi:10.1073/pnas.0506580102. PMC 1239896. PMID 16199517.
  18. Emmert-Streib, F.; Dehmer, M. (2011). "Networks for systems biology: conceptual connection of data and function". IET Systems Biology. 5: 185–207. doi:10.1049/iet-syb.2010.0025.
  19. Khatri, Purvesh; Sirota, Marina; Butte, Atul J.; Ouzounis, Christos A. (23 February 2012). "Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges". PLoS Computational Biology. 8 (2): e1002375. doi:10.1371/journal.pcbi.1002375. PMC 3285573. PMID 22383865.
  20. Draghici, S.; Khatri, P.; Tarca, A. L.; Amin, K.; Done, A.; Voichita, C.; Georgescu, C.; Romero, R. (4 September 2007). "A systems biology approach for pathway level analysis". Genome Research. 17 (10): 1537–1545. doi:10.1101/gr.6202607. PMC 1987343.
  21. Tarca, A. L.; Draghici, S.; Khatri, P.; Hassan, S. S.; Mittal, P.; Kim, J.-s.; Kim, C. J.; Kusanovic, J. P.; Romero, R. (5 November 2008). "A novel signaling pathway impact analysis". Bioinformatics. 25 (1): 75–82. doi:10.1093/bioinformatics/btn577. PMC 2732297.
  22. Glaab, E.; Baudot, A.; Krasnogor, N.; Schneider, R. S.; Valencia, A. (15 September 2012). "EnrichNet: Network-based gene set enrichment analysis". Bioinformatics. 28 (18): i451–i457. doi:10.1093/bioinformatics/bts389. PMC 3436816. PMID 22962466.
  23. Geistlinger, L.; Csaba, G.; Küffner, R.; Mulder, N.; Zimmer, R. (2011). "From sets to graphs: Towards a realistic enrichment analysis of transcriptomic systems". Bioinformatics. 27: 1–9. doi:10.1093/bioinformatics/btr228. PMC 3117393. PMID 21685094.
  24. Glaab, E.; Baudot, A.; Krasnogor, N.; Valencia, A. (2010). "TopoGSA: Network topological gene set analysis". Bioinformatics. 26 (18): 1271–1272. doi:10.1093/bioinformatics/btq131. PMC 2859135. PMID 20335277.
  25. García-Campos, Miguel Angel; Espinal-Enríquez, Jesús; Hernández-Lemus, Enrique (2015). "Pathway analysis: State of the art". Frontiers in Physiology. 6. doi:10.3389/fphys.2015.00383.
  26. "Ingenuity IPA - Integrate and Understand Complex 'omics Data". Ingenuity. Retrieved 8 April 2015. http://www.ingenuity.com/products/ipa#/?tab=features
  27. Pathway Studio
  28. Pathway Studio Viewer
  29. Mitrea, Cristina; Taghavi, Zeinab; Bokanizad, Behzad; Hanoudi, Samer; Tagett, Rebecca; Donato, Michele; Voichiţa, Călin; Drăghici, Sorin (2013). "Methods and approaches in the topology-based analysis of biological pathways". Frontiers in Physiology. 4. doi:10.3389/fphys.2013.00278.
  30. Draghici, S.; Khatri, P.; Tarca, A. L.; Amin, K.; Done, A.; Voichita, C.; Georgescu, C.; Romero, R. (4 September 2007). "A systems biology approach for pathway level analysis". Genome Research. 17 (10): 1537–1545. doi:10.1101/gr.6202607. PMC 1987343.
  31. Tarca, A. L.; Draghici, S.; Khatri, P.; Hassan, S. S.; Mittal, P.; Kim, J.-s.; Kim, C. J.; Kusanovic, J. P.; Romero, R. (5 November 2008). "A novel signaling pathway impact analysis". Bioinformatics. 25 (1): 75–82. doi:10.1093/bioinformatics/btn577. PMC 2732297.
  32. Mitrea, Cristina; Taghavi, Zeinab; Bokanizad, Behzad; Hanoudi, Samer; Tagett, Rebecca; Donato, Michele; Voichiţa, Călin; Drăghici, Sorin (2013). "Methods and approaches in the topology-based analysis of biological pathways". Frontiers in Physiology. 4. doi:10.3389/fphys.2013.00278.
  33. Henderson-Maclennan, Nicole K.; Papp, Jeanette C.; Talbot, C. Conover; McCabe, Edward R. B.; Presson, Angela P. (2010). "Pathway Analysis Software: Annotation Errors and Solutions". Molecular Genetics and Metabolism. 101 (2–3): 134–40.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.