Engineering a software tool for gene structure prediction in higher organisms
Information and Software Technology
FANGS: high speed sequence mapping for next generation sequencers
Proceedings of the 2010 ACM Symposium on Applied Computing
Approaching process mining with sequence clustering: experiments and findings
BPM'07 Proceedings of the 5th international conference on Business process management
Minimum factorization agreement of spliced ESTs
WABI'09 Proceedings of the 9th international conference on Algorithms in bioinformatics
Identification of true EST alignments for recognising transcribed regions
International Journal of Data Mining and Bioinformatics
Comparative gene prediction based on gene structure conservation
PRIB'06 Proceedings of the 2006 international conference on Pattern Recognition in Bioinformatics
ICIC'11 Proceedings of the 7th international conference on Intelligent Computing: bio-inspired computing and applications
PATMAP: polyadenylation site identification from next-generation sequencing data
HAIS'12 Proceedings of the 7th international conference on Hybrid Artificial Intelligent Systems - Volume Part I
Acceleration of the long read mapping on a PC-FPGA architecture (abstract only)
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Motivation: We introduce GMAP, a standalone program for mapping and aligning cDNA sequences to a genome. The program maps and aligns a single sequence with minimal startup time and memory requirements, and provides fast batch processing of large sequence sets. The program generates accurate gene structures, even in the presence of substantial polymorphisms and sequence errors, without using probabilistic splice site models. The methodology underlying the program includes a minimal sampling strategy for genomic mapping, oligomer chaining for approximate alignment, sandwich dynamic programming (DP) for splice site detection, and microexon identification with statistical significance testing.
Results: On a set of human messenger RNAs with random mutations at rates of 1% and 3%, GMAP identified all splice sites accurately in over 99.3% of the sequences, an error rate one-tenth that of existing programs. On a large set of human expressed sequence tags, GMAP provided higher-quality alignments more often than BLAT did. On a set of Arabidopsis cDNAs, GMAP performed comparably with GeneSeqer. In these experiments, GMAP demonstrated a several-fold increase in speed over existing programs.
Availability: Source code for GMAP and associated programs is available at http://www.gene.com/share/gmap
Contact: twu@gene.com
Supplementary information: http://www.gene.com/share/gmap
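The abstract names oligomer chaining as the approximate-alignment step without describing it. The Python sketch below illustrates the general idea only and is not GMAP's implementation: it indexes every k-mer of a toy genome, looks up the k-mers of a cDNA, and chains colinear hits with a simple quadratic dynamic program. The function names (`build_index`, `find_hits`, `chain_hits`), the `max_intron` parameter, the value of `k`, and the toy sequences are all assumptions made for illustration.

```python
from collections import defaultdict


def build_index(genome, k):
    """Map every k-mer in the genome to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def find_hits(cdna, index, k):
    """Return sorted (cdna_pos, genome_pos) pairs for every exact k-mer match."""
    hits = []
    for q in range(len(cdna) - k + 1):
        for g in index.get(cdna[q:q + k], ()):
            hits.append((q, g))
    return sorted(hits)


def chain_hits(hits, max_intron=1000):
    """Longest colinear chain of hits, found by O(n^2) dynamic programming.

    A hit may follow another if it advances in the cDNA and advances at
    least as far in the genome, with at most max_intron extra genomic bases
    (hypothetical limit standing in for an intron-length bound).
    """
    if not hits:
        return []
    n = len(hits)
    score = [1] * n   # best chain length ending at each hit
    prev = [-1] * n   # back-pointer to the previous hit in that chain
    for j in range(n):
        qj, gj = hits[j]
        for i in range(j):
            qi, gi = hits[i]
            dq, dg = qj - qi, gj - gi
            if dq > 0 and dq <= dg <= dq + max_intron and score[i] + 1 > score[j]:
                score[j] = score[i] + 1
                prev[j] = i
    best = max(range(n), key=score.__getitem__)
    chain = []
    while best != -1:
        chain.append(hits[best])
        best = prev[best]
    return chain[::-1]


# Toy example (made-up sequences): two exons separated by a 16-base intron.
exon1, intron, exon2 = "ACGTTGCA", "GTAAGTTTTTTTTCAG", "CCGATAGG"
genome = "TT" + exon1 + intron + exon2 + "AA"
cdna = exon1 + exon2

k = 4
chain = chain_hits(find_hits(cdna, build_index(genome, k), k))
# Each distinct (genome_pos - cdna_pos) offset corresponds to one exon block.
offsets = sorted({g - q for q, g in chain})
```

In this toy case the chain falls on two diagonals whose offsets differ by the intron length, which is the kind of coarse exon layout a chaining step would pass on to a more careful splice-site search. A production mapper would sample oligomers rather than index every position and would score chains more carefully than by hit count.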