TopHat (bioinformatics)

TopHat is a bioinformatics sequence analysis tool for fast, high-throughput alignment of shotgun cDNA sequencing reads generated by transcriptomics technologies (e.g. RNA-Seq). It first aligns the reads to a reference genome using Bowtie and then analyzes the mapping results to discover RNA splice sites de novo. TopHat aligns RNA-Seq reads to mammalian-sized genomes.^[1]

TopHat is free, open-source software, available at http://tophat.cbcb.umd.edu.^[2]

History

TopHat was originally developed in 2009 by Cole Trapnell, Lior Pachter and Steven Salzberg at the Mathematics Department of UC Berkeley and the Center for Bioinformatics and Computational Biology at the University of Maryland, College Park.^[2] Trapnell later moved to the Genome Sciences Department at the University of Washington. TopHat is a collaborative effort between Cole Trapnell at the University of Washington and Daehwan Kim and Steven Salzberg at the Center for Computational Biology at Johns Hopkins University. In 2013 they also released TopHat2, which accurately aligns transcriptomes in the presence of insertions, deletions and gene fusions.^[3]

Uses

TopHat is a read-mapping algorithm that aligns the reads from an RNA-Seq experiment to a reference genome. It is useful because it does not need to rely on known splice sites.^[2] TopHat can be used as part of the Tuxedo pipeline and is frequently used with Bowtie.
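As a rough illustration of how such an alignment run might be set up, the following Python sketch assembles a TopHat command line without executing it. The `-o` (output directory) and `-p` (thread count) options and the positional order (Bowtie index prefix, then comma-separated read files) follow TopHat's documented usage; the function name `build_tophat_command`, the index prefix and all file names are hypothetical placeholders.

```python
from typing import List, Optional, Sequence


def build_tophat_command(
    index_prefix: str,
    reads: Sequence[str],
    mate_reads: Optional[Sequence[str]] = None,
    output_dir: str = "tophat_out",
    threads: int = 1,
) -> List[str]:
    """Assemble an argument list for a TopHat run (does not execute it)."""
    if not reads:
        raise ValueError("at least one read file is required")
    # Options come first, then the Bowtie index prefix, then the read files.
    cmd = ["tophat", "-o", output_dir, "-p", str(threads), index_prefix]
    # Multiple files for the same mate are passed as one comma-separated argument.
    cmd.append(",".join(reads))
    if mate_reads:
        cmd.append(",".join(mate_reads))
    return cmd


if __name__ == "__main__":
    # Hypothetical paired-end sample; the file names are placeholders.
    cmd = build_tophat_command(
        "genome", ["sample_1.fq"], ["sample_2.fq"], "sample_thout", 4
    )
    print(" ".join(cmd))
    # To actually run it (requires TopHat and Bowtie to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if it is later passed to `subprocess.run`.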

Advantages and disadvantages

Advantages

When TopHat was first released, it was faster than previous systems, mapping more than 2.2 million reads per CPU hour. That speed allowed a user to process an entire RNA-Seq experiment in less than a day, even on a standard desktop computer.^[2] TopHat first uses Bowtie to align the reads, and then performs further analysis of the reads that span exon-exon junctions. As a result, using TopHat on RNA-Seq data yields more reads aligned against the reference genome.^[4]
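The "less than a day" claim can be checked with simple arithmetic. The sketch below uses the 2.2 million reads per CPU hour figure quoted above; the experiment size of 40 million reads is an assumed value chosen only for illustration, not a number from the cited paper.

```python
READS_PER_CPU_HOUR = 2_200_000  # throughput figure quoted in the text


def cpu_hours(num_reads: int, reads_per_cpu_hour: int = READS_PER_CPU_HOUR) -> float:
    """Estimate the CPU hours needed to map num_reads at a given throughput."""
    if num_reads < 0:
        raise ValueError("num_reads must be non-negative")
    return num_reads / reads_per_cpu_hour


if __name__ == "__main__":
    # Assumed experiment size (hypothetical): 40 million reads.
    print(round(cpu_hours(40_000_000), 1))  # 18.2
```

Because the quoted throughput is a lower bound ("more than 2.2 million"), the result is an upper estimate of the time on a single CPU.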

Another advantage of TopHat is that it does not need to rely on known splice sites when aligning reads to a reference genome.^[2]
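The idea of finding a splice junction without prior annotation can be illustrated with a toy example. The sketch below is not TopHat's algorithm (which builds on Bowtie alignments and handles mismatches and large genomes); it uses exact string matching on a tiny made-up sequence and assumes, as a simplification, that introns start with GT and end with AG. The function name and sequences are hypothetical.

```python
from typing import Optional, Tuple


def find_spliced_alignment(
    genome: str, read: str, min_anchor: int = 3
) -> Optional[Tuple[int, int, int]]:
    """Find a split of `read` into two exact matches separated by a GT...AG gap.

    Returns (read_start, donor, acceptor), where genome[donor:acceptor] is the
    inferred intron, or None if no such split exists.
    """
    for k in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:k], read[k:]
        start = genome.find(left)
        while start != -1:
            donor = start + k
            # Require a gap of at least 4 bases so GT and AG cannot overlap.
            acceptor = genome.find(right, donor + 4)
            while acceptor != -1:
                intron = genome[donor:acceptor]
                if intron.startswith("GT") and intron.endswith("AG"):
                    return start, donor, acceptor
                acceptor = genome.find(right, acceptor + 1)
            start = genome.find(left, start + 1)
    return None


if __name__ == "__main__":
    exon1, intron, exon2 = "ACCTGACCA", "GTAAGTTTTCCAG", "TTGCATCGA"
    genome = exon1 + intron + exon2
    read = exon1[-5:] + exon2[:5]  # read spanning the exon-exon junction

    print(genome.find(read))                      # -1: no contiguous match
    print(find_spliced_alignment(genome, read))   # (4, 9, 22)
```

The read cannot be placed contiguously on the genome, but splitting it recovers the intron boundaries (positions 9 to 22) with no list of known splice sites as input.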

Disadvantages

TopHat is in a low-maintenance, low-support stage. It has been superseded by HISAT2, which is more efficient and accurate and provides the same core functionality (spliced alignment of RNA-Seq reads).^[1]

Newer protocols built on tools such as Cufflinks, STAR, and limma are now more efficient than those based on TopHat.

[Figure: an example RNA-Seq workflow pipeline using STAR and limma. This particular pipeline is more efficient than one using TopHat.]

References

1. "TopHat". ccb.jhu.edu. Retrieved 2018-04-17.
2. Trapnell C, Pachter L, Salzberg SL (May 2009). "TopHat: discovering splice junctions with RNA-Seq". Bioinformatics. 25 (9): 1105–11. doi:10.1093/bioinformatics/btp120. PMC 2672628. PMID 19289445.
3. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL (April 2013). "TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions". Genome Biology. 14 (4): R36. doi:10.1186/gb-2013-14-4-r36. PMC 4053844. PMID 23618408.
4. "Bowtie & Tophat". www.biostars.org. Retrieved 2018-04-24.

External links

TopHat page on Center for Computational Biology at JHU

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.

