List of biological databases

Biological databases are stores of biological information.[1] The journal Nucleic Acids Research regularly publishes special issues on biological databases and has a list of such databases. The 2018 issue has a list of about 180 such databases and updates to previously described databases.[2]

Meta databases

Meta databases are databases of databases that collect data about data to generate new data. They are capable of merging information from different sources and making it available in a new and more convenient form, or with an emphasis on a particular disease or organism.

ConsensusPathDB: a molecular functional interaction database, integrating information from 12 other databases
Entrez (National Center for Biotechnology Information)
Neuroscience Information Framework (University of California, San Diego): integrates hundreds of neuroscience relevant resources; many are listed below
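
The merging step that a meta database performs can be sketched in a few lines. This is only a toy illustration, not how any of the resources above is implemented: the source names, identifiers and annotations are hypothetical, and real integration also has to reconcile differing identifier schemes.

```python
from collections import defaultdict


def merge_sources(sources):
    """Merge per-source records into one view keyed by a shared identifier.

    `sources` maps a source name to a list of (identifier, annotation) pairs.
    The result maps each identifier to {source name: [annotations]}, so the
    provenance of every annotation is preserved.
    """
    merged = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources.items():
        for identifier, annotation in records:
            merged[identifier][source_name].append(annotation)
    # Convert the nested defaultdicts to plain dicts for a clean result.
    return {ident: dict(by_source) for ident, by_source in merged.items()}


# Hypothetical toy data standing in for two independent databases.
sources = {
    "source_a": [("GENE1", "interacts with GENE2"), ("GENE3", "kinase")],
    "source_b": [("GENE1", "associated with disease X")],
}
merged = merge_sources(sources)
```

Keeping the source name alongside each annotation lets a user of the merged view trace every statement back to the database it came from.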

Model organism databases

Model organism databases provide in-depth biological data for intensively studied organisms.

PomBase: the knowledgebase for the fission yeast Schizosaccharomyces pombe[3]

Nucleic acid databases

DNA databases

Primary databases
International Nucleotide Sequence Database (INSD) consists of the following databases.

DDBJ (Japan), GenBank (USA) and European Nucleotide Archive (Europe) are repositories for nucleotide sequence data from all organisms. All three accept nucleotide sequence submissions, and then exchange new and updated data on a daily basis to achieve optimal synchronisation between them. These three databases are primary databases, as they house original sequence data. They collaborate with Sequence Read Archive (SRA), which archives raw reads from high-throughput sequencing instruments.
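
The effect of the daily exchange can be illustrated with a toy model. The sketch below assumes each archive stores records keyed by accession with an integer version, and that the newest version wins; the archive names, accessions and record layout are hypothetical, and the actual exchange mechanism used by the three databases is not described here.

```python
def synchronise(archives):
    """Return a copy of every archive holding the newest version of each record.

    `archives` maps an archive name to {accession: (version, sequence)}.
    """
    latest = {}
    for records in archives.values():
        for accession, (version, sequence) in records.items():
            # Keep a record only if it is new or newer than the one seen so far.
            if accession not in latest or version > latest[accession][0]:
                latest[accession] = (version, sequence)
    # Every archive receives its own copy of the combined, up-to-date set.
    return {name: dict(latest) for name in archives}


# Hypothetical toy data: one archive holds an update, another a new submission.
archives = {
    "archive_a": {"ACC0001": (1, "ATGC")},
    "archive_b": {"ACC0001": (2, "ATGCA"), "ACC0002": (1, "GGCC")},
    "archive_c": {},
}
synced = synchronise(archives)
```

After the call, all three toy archives hold the same two records, which is the property the exchange is meant to achieve.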

Secondary databases

23andMe's database
HapMap
OMIM (Online Mendelian Inheritance in Man): inherited diseases
RefSeq
1000 Genomes Project: launched in January 2008. The genomes of more than a thousand anonymous participants from a number of different ethnic groups were analyzed and made publicly available.
EggNOG Database: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. It provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation.[4][5]

Gene expression databases (mostly microarray data)

Genome databases

These databases collect genome sequences, annotate and analyze them, and provide public access. Some add curation of experimental literature to improve computed annotations. These databases may hold many species genomes, or a single model organism genome.

ArrayExpress:[6] archive of data from high-throughput functional genomics experiments (EMBL)
Bioinformatic Harvester
Ensembl: provides automatic annotation databases for human, mouse, other vertebrate and eukaryote genomes
Ensembl Genomes: provides genome-scale data for bacteria, protists, fungi, plants and invertebrate metazoa, through a unified set of interactive and programmatic interfaces (using the Ensembl software platform)
FlyBase: genome of the model organism Drosophila melanogaster
Gene Disease Database
Gene Expression Omnibus (GEO[7]): a public functional genomics data repository from the U.S. National Center for Biotechnology Information (NCBI), which supports array- and sequence-based data. Tools for querying and downloading gene expression profiles are provided.
Human Protein Atlas (HPA[8]): a public database with expression profiles of human protein-coding genes at both the mRNA and protein level in tissues, cells, subcellular compartments, and cancer tumors.
Legume Information System (LIS): genomic database for the legume family[9]
Personal Genome Project: human genomes of 100,000 volunteers from around the world
RGD (Rat Genome Database): genomic and phenotype data for Rattus norvegicus
Saccharomyces Genome Database:[10] genome of the yeast model organism
SNPedia
SoyBase:[11] USDA soybean genetics and genomics database
UCSC Malaria Genome Browser: genome of malaria-causing species (Plasmodium falciparum and others)
WormBase: genome of the model organism Caenorhabditis elegans, and WormBase ParaSite for parasitic species
Xenbase: genomes of the model organisms Xenopus tropicalis and Xenopus laevis
Zebrafish Information Network: genome of this fish model organism

Phenotype databases

PHI-base: pathogen-host interaction database. It links gene information to phenotypic information from microbial pathogens on their hosts. Information is manually curated from peer-reviewed literature.
RGD (Rat Genome Database): genomic and phenotype data for Rattus norvegicus
PomBase database: manually curated phenotypic data for the yeast Schizosaccharomyces pombe

RNA databases

miRBase: the microRNA database
Rfam: a database of RNA families

Amino acid / protein databases

Protein sequence databases

Database of Interacting Proteins (Univ. of California)
DisProt: database of experimental evidence of disorder in proteins (Indiana University School of Medicine, Temple University, University of Padua)
InterPro: classifies proteins into families and predicts the presence of domains and sites
MobiDB: database of intrinsic protein disorder annotation (University of Padua)
neXtProt: a human protein-centric knowledge resource
Pfam: protein families database of alignments and HMMs (Sanger Institute)
PRINTS: a compendium of protein fingerprints (Manchester University)
PROSITE: database of protein families and domains
Protein Information Resource (Georgetown University Medical Center [GUMC])
SUPERFAMILY: library of HMMs representing superfamilies and database of (superfamily and family) annotations for all completely sequenced organisms
Swiss-Prot: protein knowledgebase (Swiss Institute of Bioinformatics)
NCBI: protein sequence and knowledgebase (National Center for Biotechnology Information)

Protein structure databases

Protein Data Bank (PDB), comprising:
- Protein Data Bank in Europe (PDBe)[12]
- Protein Data Bank Japan (PDBj)[13]
- Research Collaboratory for Structural Bioinformatics (RCSB)[14]
Structural Classification of Proteins (SCOP)

For more protein structure databases, see also Protein structure database.

Protein model databases

ModBase: database of comparative protein structure models (Sali Lab, UCSF)
Similarity Matrix of Proteins (SIMAP): database of protein similarities computed using FASTA
Swiss-model: server and repository for protein structure models
AAindex: database of amino acid indices, amino acid mutation matrices, and pair-wise contact potentials

Protein-protein and other molecular interactions

BioGRID: general repository for interaction datasets (Samuel Lunenfeld Research Institute)
RNA-binding protein database

Protein expression databases

Human Protein Atlas: aims at mapping all the human proteins in cells, tissues and organs

Signal transduction pathway databases

NCI-Nature Pathway Interaction Database
Netpath: curated resource of signal transduction pathways in humans
Reactome: navigable map of human biological pathways, ranging from metabolic processes to hormonal signalling (Ontario Institute for Cancer Research, European Bioinformatics Institute, NYU Langone Medical Center, Cold Spring Harbor Laboratory)
WikiPathways

Metabolic pathway and protein function databases

BioCyc Database Collection: includes EcoCyc and MetaCyc
BRENDA: the comprehensive enzyme information system, including FRENDA, AMENDA, DRENDA, and KENDA
HMDB: contains detailed information about small molecule metabolites found in the human body
KEGG PATHWAY Database (Univ. of Kyoto)
MANET database (University of Illinois)
Reactome: navigable map of human biological pathways, ranging from metabolic processes to hormonal signalling (Ontario Institute for Cancer Research, European Bioinformatics Institute, NYU Langone Medical Center, Cold Spring Harbor Laboratory)
SABIO-RK: database for biochemical reactions and their kinetic properties
WikiPathways

Additional databases

Exosomal databases

ExoCarta
Extracellular RNA Atlas: a repository of small RNA-seq and qPCR-derived exRNA profiles from human and mouse biofluids

Mathematical model databases

Biomodels Database: published mathematical models describing biological processes

Taxonomic databases

BacDive: bacterial metadatabase that provides strain-linked information about bacterial and archaeal biodiversity, including taxonomy information
EzTaxon-e: database for the identification of prokaryotes based on 16S ribosomal RNA gene sequences

Radiologic databases

Antimicrobial resistance databases

AMRFinderPlus
Antimicrobial Drug Database (AMDD)
ARDB
ARGminer
BacMet
Beta-Lactamase Database (BLAD)
Beta-Lactamase Database (BLDB)
CBMAR
The Comprehensive Antibiotic Resistance Database
EARS-Net
FARME
INTEGRALL
LacED
MEGARes
MUBII-TB-DB
Mustard Database
MvirDB
PathoPhenoDB
PATRIC database
RAC: Repository of Antibiotic resistance Cassettes
ResFinder
TBDReaMDB
u-CARE
VFDB

Wiki-style databases

Specialized databases

Barcode of Life Data Systems: database of DNA barcodes
The Cancer Genome Atlas (TCGA): provides data from hundreds of cancer samples obtained using high-throughput techniques such as gene expression profiling, copy number variation profiling, SNP genotyping, genome-wide DNA methylation profiling, microRNA profiling, and exon sequencing of at least 1,200 genes
Cellosaurus: a knowledge resource on cell lines
CTD (Comparative Toxicogenomics Database): describes chemical-gene-disease interactions
DiProDB: a database to collect and analyse thermodynamic, structural and other dinucleotide properties
Dryad: repository of data underlying scientific publications in the basic and applied biosciences
Edinburgh Mouse Atlas
EPD Eukaryotic Promoter Database
FINDbase (the Frequency of INherited Disorders database)
GigaDB: repository of large-scale datasets underlying scientific publications in biological and biomedical research
HGNC (HUGO Gene Nomenclature Committee): a resource for approved human gene nomenclature
International Human Epigenome Consortium:[15] integrates epigenomic reference data from well-known national endeavors such as the Canadian CEEHRC,[16] European Blueprint,[17] European Genome-phenome Archive (EGA[18]), US ENCODE and NIH Roadmap, German DEEP,[19] Japanese CREST,[20] Korean KNIH, Singapore's GIS and China's EpiHK[21]
MethBase: database of DNA methylation data visualized on the UCSC Genome Browser
Minimotif Miner: database of short contiguous functional peptide motifs
Oncogenomic databases: a compilation of databases that serve for cancer research
PubMed: references and abstracts on life sciences and biomedical topics
RIKEN integrated database of mammals
TDR Targets: a chemogenomics database focused on drug discovery in tropical diseases
TRANSFAC: a database about eukaryotic transcription factors, their genomic binding sites and DNA-binding profiles
JASPAR: a database of manually curated, non-redundant transcription factor binding profiles.
MetOSite: a database about methionine sulfoxidation sites and their functional roles in proteins[22]
Healthcare Cost and Utilization Project (HCUP): the largest collection of hospital care data in the United States, including hundreds of millions of inpatient, outpatient, and emergency records

References

[1] Wren JD, Bateman A (October 2008). "Databases, data tombs and dust in the wind". Bioinformatics. 24 (19): 2127–8. doi:10.1093/bioinformatics/btn464. PMID 18819940.
[2] "Volume 46 Issue D1 | Nucleic Acids Research | Oxford Academic". academic.oup.com. Retrieved 2018-09-04.
[3] Lock, A; Rutherford, K; Harris, MA; Hayles, J; Oliver, SG; Bähler, J; Wood, V (13 October 2018). "PomBase 2018: user-driven reimplementation of the fission yeast database provides rapid and intuitive access to diverse, interconnected information". Nucleic Acids Research. 47 (D1): D821–D827. doi:10.1093/nar/gky961. PMC 6324063. PMID 30321395.
[4] Powell, Sean; Forslund, Kristoffer; Szklarczyk, Damian; Trachana, Kalliopi; Roth, Alexander; Huerta-Cepas, Jaime; Gabaldón, Toni; Rattei, Thomas; Creevey, Chris; Kuhn, Michael; Jensen, Lars J. (2013-12-01). "eggNOG v4.0: nested orthology inference across 3686 organisms". Nucleic Acids Research. 42 (D1): D231–D239. doi:10.1093/nar/gkt1253. ISSN 0305-1048.
[5] Huerta-Cepas, Jaime; Szklarczyk, Damian; Heller, Davide; Hernández-Plaza, Ana; Forslund, Sofia K; Cook, Helen; Mende, Daniel R; Letunic, Ivica; Rattei, Thomas; Jensen, Lars J; von Mering, Christian (2018-11-12). "eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses". Nucleic Acids Research. 47 (D1): D309–D314. doi:10.1093/nar/gky1085. ISSN 0305-1048.
[6] ArrayExpress
[7] GEO
[8] "The Human Protein Atlas". www.proteinatlas.org. Retrieved 2019-05-27.
[9] Dash S, Campbell JD, Cannon EK, Cleary AM, Huang W, Kalberer SR, Karingula V, Rice AG, Singh J, Umale PE, Weeks NT, Wilkey AP, Farmer AD, Cannon SB (January 2016). "Legume information system (LegumeInfo.org): a key component of a set of federated data resources for the legume family". Nucleic Acids Research. 44 (D1): D1181-8. doi:10.1093/nar/gkv1159. PMC 4702835. PMID 26546515.
[10] "Saccharomyces Genome Database | SGD". www.yeastgenome.org. Retrieved 2018-09-04.
[11] Grant, David; Nelson, Rex T.; Cannon, Steven B.; Shoemaker, Randy C. (2010). "SoyBase, the USDA-ARS soybean genetics and genomics database". Nucleic Acids Research. 38 (Suppl 1) (Database issue): D843–D846. doi:10.1093/nar/gkp798. PMC 2808871. PMID 20008513.
[12] Mir S, Alhroub Y, Anyango S, Armstrong DR, Berrisford JM, Clark AR, Conroy MJ, Dana JM, Deshpande M, Gupta D, Gutmanas A, Haslam P, Mak L, Mukhopadhyay A, Nadzirin N, Paysan-Lafosse T, Sehnal D, Sen S, Smart OS, Varadi M, Kleywegt GJ, Velankar S (January 2018). "PDBe: towards reusable data delivery infrastructure at protein data bank in Europe". Nucleic Acids Research. 46 (D1): D486–D492. doi:10.1093/nar/gkx1070. PMC 5753225. PMID 29126160.
[13] Kinjo AR, Bekker GJ, Suzuki H, Tsuchiya Y, Kawabata T, Ikegawa Y, Nakamura H (January 2017). "Protein Data Bank Japan (PDBj): updated user interfaces, resource description framework, analysis tools for large structures". Nucleic Acids Research. 45 (D1): D282–D288. doi:10.1093/nar/gkw962. PMC 5210648. PMID 27789697.
[14] Rose PW, Prlić A, Altunkaya A, Bi C, Bradley AR, Christie CH, et al. (January 2017). "The RCSB protein data bank: integrative view of protein, gene and 3D structural information". Nucleic Acids Research. 45 (D1): D271–D281. doi:10.1093/nar/gkw1000. PMC 5210513. PMID 27794042.
[15] (IHEC) data portal
[16] CEEHRC
[17] Blueprint
[18] EGA
[19] DEEP
[20] CREST
[21] "Sharing epigenomes globally". Nature Methods. 15 (3): 151. 2018. doi:10.1038/nmeth.4630. ISSN 1548-7105.
[22] Valverde, Héctor; Cantón, Francisco R.; Aledo, Juan Carlos (2019). "MetOSite: an integrated resource for the study of methionine residues sulfoxidation". Bioinformatics. 35 (22): 4849–4850. doi:10.1093/bioinformatics/btz462. PMC 6853639. PMID 31197322.

External links

Nucleic Acids Research Molecular Biology Database Collection – over 1,600 databases

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
