GeneNetwork

GeneNetwork is a combined database and open-source bioinformatics data analysis software resource for systems genetics.[1] This resource is used to study gene regulatory networks that link DNA sequence variants to corresponding differences in gene and protein expression and to differences in traits such as health and disease risk. Data sets in GeneNetwork are typically made up of large collections of genotypes (e.g., SNPs) and phenotypes that are obtained from groups of related individuals, including human families, experimental crosses of strains of mice and rats, and organisms as diverse as Drosophila melanogaster, Arabidopsis thaliana, and barley.[2] The inclusion of genotypes for all individuals makes it practical to carry out web-based gene mapping to discover those regions of the genome that contribute to differences in gene expression, cell function, anatomy, physiology, and behavior among individuals.

GeneNetwork
Developer(s): GeneNetwork Development Team, University of Tennessee
Stable release: 2.0 / 2 June 2017
Repository: github.com/genenetwork/genenetwork2
Written in: JavaScript, HTML, Python, CSS, CoffeeScript, PHP
License: Affero General Public License
Website: gn2.genenetwork.org

History

Development of GeneNetwork started at the University of Tennessee Health Science Center in Memphis, USA, in 2000-2001 as a web-based extension of the Portable Dictionary of the Mouse Genome (1994)[3] and Kenneth F. Manly's Map Manager QT. It was initially named WebQTL.[4] Gene mapping data sets were incorporated for several mouse recombinant inbred strains. By early 2003, the first large Affymetrix gene expression data sets (whole mouse brain mRNA and hematopoietic stem cells) had been incorporated and the system was renamed GeneNetwork.[5][6] GeneNetwork is now developed by an international group of developers and has mirror and development sites in Europe, Asia, and Australia. Production services are hosted on systems at the University of Tennessee Health Science Center and on the Amazon Elastic Compute Cloud.

A new version of GeneNetwork (version 2, GN2) was released in 2016.[7] This version uses the same database as GN1, but has much more modular and maintainable open-source code (available on GitHub). GN2 also has significant new features including support for:

Organization and use

GeneNetwork consists of two major components:

  • Massive collections of genetic, genomic, and phenotype data for large families
  • Sophisticated statistical analysis and gene mapping software that enable analysis of regulatory networks and genotype-to-phenotype relations

Four levels of data are usually obtained for each family or population:

  1. DNA sequences and genotypes
  2. Molecular expression data often generated using arrays, RNA-seq, proteomic, metabolomic and metagenomic methods (molecular phenotypes)
  3. Standard phenotypes of the type that are part of a typical medical record (e.g., blood chemistry, body weight)
  4. Annotation files and metadata

The combined data types are housed together in a relational database and on file servers, and are conceptually organized and grouped by species, cohort, and family. The system is implemented as a LAMP stack. Code and a simplified version of the MySQL database are available on SourceForge and GitHub.

GeneNetwork is primarily used by researchers, but has also been adopted successfully for undergraduate and graduate courses in genetics, bioinformatics, physiology, and psychology.[10] Researchers and students typically retrieve sets of genotypes and phenotypes from one or more families and use built-in statistical and mapping functions to explore relations among variables and to assemble networks of associations. Key steps include the analysis of these factors:

  1. The range of variation of traits
  2. Covariation among traits (scatterplots and correlations, principal component analysis)
  3. Architecture of larger networks of traits
  4. Quantitative trait locus mapping and causal models of the linkage between sequence differences and phenotype differences
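
The first two steps above can be illustrated with a minimal Python sketch. It is not GeneNetwork's own code: the trait values are invented, and the helper names (`pearson`, `ranks`, `spearman`) are hypothetical. It summarizes the range and spread of one trait, then measures its covariation with a second trait using both Pearson and rank-order (Spearman) correlation.

```python
import math
import statistics


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented values for six individuals (assumption, for illustration only).
body_weight = [21.5, 24.1, 19.8, 27.3, 23.0, 25.6]
expression = [8.1, 8.9, 7.7, 9.6, 8.4, 9.9]

# Step 1: range and spread of a single trait.
trait_range = (min(body_weight), max(body_weight))
trait_sd = statistics.stdev(body_weight)

# Step 2: covariation between two traits.
r = pearson(body_weight, expression)
rho = spearman(body_weight, expression)
```

The rank-order version is less sensitive to outliers than the Pearson version, which is one reason a tool might offer both.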

Data sources

Traits and molecular expression data sets are submitted by researchers directly or are extracted from repositories such as the National Center for Biotechnology Information Gene Expression Omnibus. Data cover a variety of cells and tissues, from single cell populations of the immune system and specific tissues (retina, prefrontal cortex) to entire systems (whole brain, lung, muscle, heart, fat, kidney, flower, whole plant embryos). A typical data set covers hundreds of fully genotyped individuals and may also include technical and biological replicates. Genotypes and phenotypes are usually taken from peer-reviewed papers. GeneNetwork includes annotation files for several RNA profiling platforms (Affymetrix, Illumina, and Agilent). RNA-seq and quantitative proteomic, metabolomic, epigenetic, and metagenomic data are also available for several species, including mouse and human.

Tools and features

Tools on the site cover a wide range of functions: simple graphical displays of variation in gene expression or other phenotypes, scatter plots of pairs of traits (with Pearson or rank-order correlations), construction of both simple and complex network graphs, analysis of principal components and synthetic traits, and QTL mapping using marker regression, interval mapping, and pair scans for epistatic interactions. Most functions work with up to 100 traits and several functions work with an entire transcriptome.
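
As an illustration of the simplest of these mapping methods, the following Python sketch performs single-marker regression and reports a LOD score for each marker. It is a sketch under stated assumptions, not GeneNetwork's implementation: the marker names, the 0/1 genotype coding, the trait values, and the function names `marker_regression_lod` and `scan` are all hypothetical. The LOD score uses the standard regression form, (n/2) log10(RSS0/RSS1), where RSS0 is the residual sum of squares without the marker and RSS1 is the residual sum of squares with it.

```python
import math


def marker_regression_lod(genotypes, phenotypes):
    """LOD score from regressing a trait on one marker.

    genotypes: 0/1 codes, e.g. which parental allele each strain carries
    phenotypes: trait values listed in the same strain order
    """
    n = len(phenotypes)
    mean_g = sum(genotypes) / n
    mean_p = sum(phenotypes) / n
    sgg = sum((g - mean_g) ** 2 for g in genotypes)
    sgp = sum((g - mean_g) * (p - mean_p)
              for g, p in zip(genotypes, phenotypes))
    # Residual sum of squares of the null model (trait mean only).
    rss0 = sum((p - mean_p) ** 2 for p in phenotypes)
    if sgg == 0 or rss0 == 0:
        return 0.0  # marker does not vary, or trait is constant
    # Residual sum of squares after fitting the marker effect.
    rss1 = rss0 - sgp ** 2 / sgg
    if rss1 <= 0:
        return math.inf  # marker explains the trait perfectly
    return (n / 2) * math.log10(rss0 / rss1)


def scan(markers, phenotypes):
    """Apply marker regression to every marker; returns {name: LOD}."""
    return {name: marker_regression_lod(g, phenotypes)
            for name, g in markers.items()}


# Invented data for eight strains (assumption, for illustration only).
phenotypes = [10.2, 11.8, 9.9, 12.1, 10.4, 12.3, 9.7, 11.6]
markers = {
    "m1": [0, 1, 0, 1, 0, 1, 0, 1],
    "m2": [0, 0, 1, 1, 0, 0, 1, 1],
    "m3": [1, 1, 1, 1, 0, 0, 0, 0],
}

lod = scan(markers, phenotypes)
best_marker = max(lod, key=lod.get)
```

With these invented data, `m1` gives the highest score because its two genotype classes have clearly different trait means. Interval mapping and pair scans extend the same idea to positions between markers and to pairs of loci, respectively, and are not covered by this sketch.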

The database can be browsed and searched at the main search page. An online tutorial is available. Users can also download the primary data sets as text files or Excel files or, in the case of network graphs, as SBML. As of 2017, GN2 is available as a beta release.

Code

GeneNetwork is an open-source project released under the Affero General Public License (AGPLv3). The majority of the code is written in Python, but the project also includes modules and other code written in C, R, and JavaScript. The GN1 code is mainly Python 2.4. GN2 is mainly written in Python 2.7 (in a Flask framework with Jinja2 HTML templates), with conversion to Python 3.X planned over the next few years. GN2 calls many statistical procedures written in the R programming language. The original source code from 2010, along with a compact database, is available on SourceForge. While GN1 is still actively maintained on GitHub, as of 2017 work is focused on GN2.

References

  1. Morahan, G; Williams, RW (2007). "Systems genetics: the next generation in genetics research?". Novartis Foundation symposium. 281: 181–8, discussion 188–91, 208–9. doi:10.1002/9780470062128.ch15. PMID 17534074.
  2. Druka, A; Druka, I; Centeno, AG; Li, H; Sun, Z; Thomas, WT; Bonar, N; Steffenson, BJ; Ullrich, SE; Kleinhofs, Andris; Wise, Roger P; Close, Timothy J; Potokina, Elena; Luo, Zewei; Wagner, Carola; Schweizer, Günther F; Marshall, David F; Kearsey, Michael J; Williams, Robert W; Waugh, Robbie (2008). "Towards systems genetic analyses in barley: Integration of phenotypic, expression and genotype data into GeneNetwork". BMC Genetics. 9: 73. doi:10.1186/1471-2156-9-73. PMC 2630324. PMID 19017390.
  3. Williams, RW (1994). "The Portable Dictionary of the Mouse Genome: a personal database for gene mapping and molecular biology". Mammalian Genome. 5 (6): 372–5. doi:10.1007/bf00356557. PMID 8043953.
  4. Chesler, EJ; Lu, L; Wang, J; Williams, RW; Manly, KF (2004). "WebQTL: rapid exploratory analysis of gene expression and genetic networks for brain and behavior". Nature Neuroscience. 7 (5): 485–6. doi:10.1038/nn0504-485. PMID 15114364.
  5. Chesler, EJ; Lu, L; Shou, S; Qu, Y; Gu, J; Wang, J; Hsu, HC; Mountz, JD; et al. (2005). "Complex trait analysis of gene expression uncovers polygenic and pleiotropic networks that modulate nervous system function". Nature Genetics. 37 (3): 233–42. doi:10.1038/ng1518. PMID 15711545.
  6. Bystrykh, L; Weersing, E; Dontje, B; Sutton, S; Pletcher, MT; Wiltshire, T; Su, AI; Vellenga, E; et al. (2005). "Uncovering regulatory pathways that affect hematopoietic stem cell function using 'genetical genomics'". Nature Genetics. 37 (3): 225–32. doi:10.1038/ng1497. PMID 15711547.
  7. Sloan, Z (2016). "GeneNetwork: framework for web-based genetics". The Journal of Open Source Software. 1 (2): 25. doi:10.21105/joss.00025.
  8. Zhou, X (2014). "Efficient multivariate linear mixed model algorithms for genome-wide association studies". Nature Methods. 11 (2): 407–9. doi:10.1038/nmeth.2848. PMC 4211878. PMID 24531419.
  9. Arends, D (2016). "Correlation Trait Loci (CTL) mapping: phenotype network inference subject to genotype". The Journal of Open Source Software. 1 (6): 87. doi:10.21105/joss.00087.
  10. Grisham, W; Schottler, NA; Valli-Marill, J; Beck, L; Beatty, J (2010). "Teaching bioinformatics and neuroinformatics by using free web-based tools". CBE: Life Sciences Education. 9 (2): 98–107. doi:10.1187/cbe.09-11-0079. PMC 2879386. PMID 20516355.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.