TopHat

Authors:
Cole Trapnell; Lior Pachter; Steven L. Salzberg
Venue:
Bioinformatics
Year:
2009

Citing 0
Cited 22

SpliceIT: A hybrid method for splice signal identification based on probabilistic and biological inference
Journal of Biomedical Informatics

Estimation of alternative splicing isoform frequencies from RNA-Seq data
WABI'10 Proceedings of the 10th international conference on Algorithms in bioinformatics

Isolasso: a lasso regression approach to RNA-seq based transcriptome assembly
RECOMB'11 Proceedings of the 15th Annual international conference on Research in computational molecular biology

T-IDBA: a de novo iterative de bruijn graph assembler for transcriptome
RECOMB'11 Proceedings of the 15th Annual international conference on Research in computational molecular biology

Rapid parallel genome indexing with MapReduce
Proceedings of the second international workshop on MapReduce and its applications

Optimizing bioinformatics workflows for data analysis using cloud management techniques
Proceedings of the 6th workshop on Workflows in support of large-scale science

Inference of isoforms from short sequence reads
RECOMB'10 Proceedings of the 14th Annual international conference on Research in Computational Molecular Biology

Unified view of backward backtracking in short read mapping
Algorithms and Applications

TrueSight: self-training algorithm for splice junction detection using RNA-seq
RECOMB'12 Proceedings of the 16th Annual international conference on Research in Computational Molecular Biology

A Cloud Infrastructure for Optimization of a Massive Parallel Sequencing Workflow
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)

POPE: pipeline of parentally-biased expression
ISBRA'12 Proceedings of the 8th international conference on Bioinformatics Research and Applications

An integer programming approach to novel transcript reconstruction from paired-end RNA-Seq reads
Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Adaptive resource configuration for Cloud infrastructure management
Future Generation Computer Systems

CLIIQ: accurate comparative detection and quantification of expressed isoforms in a population
WABI'12 Proceedings of the 12th international conference on Algorithms in Bioinformatics

Comparing DNA sequence collections by direct comparison of compressed text indexes
WABI'12 Proceedings of the 12th international conference on Algorithms in Bioinformatics

A dynamic pipeline for RNA sequencing on multicore processors
Proceedings of the 20th European MPI Users' Group Meeting

SpliceGrapherXT: From Splice Graphs to Transcripts Using RNA-Seq
Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics

Transforming Genomes Using MOD Files with Applications
Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics

Read Annotation Pipeline for High-Throughput Sequencing Data
Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics

Systematic Assessment of RNA-Seq Quantification Tools Using Simulated Sequence Data
Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics

Managing and Optimizing Bioinformatics Workflows for Data Analysis in Clouds
Journal of Grid Computing

Genome-Guided Transcriptome Assembly in the Age of Next-Generation Sequencing
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Quantified Score

Hi-index	3.84

Abstract

Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or ‘reads’, can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites.

Results: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20 000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development.

Availability: TopHat is free, open-source software available from http://tophat.cbcb.umd.edu

Contact: cole@cs.umd.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
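
The throughput figure quoted in the abstract can be turned into a rough runtime estimate. The sketch below is only back-of-the-envelope arithmetic: the rate of 2.2 million reads per CPU hour comes from the abstract, while the function names and the example read count are hypothetical and are not taken from the paper or from the TopHat software.

```python
# Throughput quoted in the abstract: "nearly 2.2 million reads per CPU hour".
READS_PER_CPU_HOUR = 2_200_000


def cpu_hours(num_reads, reads_per_cpu_hour=READS_PER_CPU_HOUR):
    """Estimate the CPU hours needed to map num_reads at a constant rate."""
    if num_reads < 0:
        raise ValueError("num_reads must be non-negative")
    return num_reads / reads_per_cpu_hour


def max_reads_within(hours, reads_per_cpu_hour=READS_PER_CPU_HOUR):
    """Largest read count that can be mapped within the given CPU hours."""
    return int(hours * reads_per_cpu_hour)


if __name__ == "__main__":
    # Hypothetical experiment size, chosen only to illustrate the arithmetic.
    example_reads = 22_000_000
    print(f"{example_reads:,} reads: about {cpu_hours(example_reads):.1f} CPU hours")
    print(f"Reads mappable in 24 CPU hours: {max_reads_within(24):,}")
```

Under the assumption of a constant mapping rate on a single CPU, any experiment of up to roughly 52.8 million reads would finish within 24 CPU hours, which is consistent with the "less than a day" claim.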