Motivation: In 2001 and 2002, we published two papers (Bioinformatics, 17, 282--283; Bioinformatics, 18, 77--82) describing an ultrafast protein sequence clustering program called cd-hit. This program can efficiently cluster a huge protein database with millions of sequences. However, the underlying algorithm is not limited to protein sequence clustering. Here we present several new programs that use the same algorithm: cd-hit-2d, cd-hit-est and cd-hit-est-2d. Cd-hit-2d compares two protein datasets and reports similar matches between them; cd-hit-est clusters a DNA/RNA sequence database; and cd-hit-est-2d compares two nucleotide datasets. All of these programs can handle huge datasets with millions of sequences and can be hundreds of times faster than methods based on popular sequence comparison and database search tools such as BLAST.
Availability: http://cd-hit.org
Contact: liwz@sdsc.edu
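The abstract does not spell out the underlying algorithm, so the following is only a toy sketch of the two modes it mentions: clustering one dataset and comparing two datasets. It assumes a greedy incremental scheme (longest sequence first, each sequence joins the first representative it matches) and substitutes a crude shared k-mer fraction for real sequence identity. All function names, the threshold, and the word length k are hypothetical; this is not the cd-hit implementation.

```python
from typing import List, Set, Tuple


def kmer_set(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Fraction of the shorter sequence's k-mers that also occur in the other.

    Assumption: a rough stand-in for alignment-based sequence identity.
    """
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    shorter = ka if len(a) <= len(b) else kb
    if not shorter:
        return 0.0
    return len(ka & kb) / len(shorter)


def greedy_cluster(seqs: List[str], threshold: float = 0.9,
                   k: int = 3) -> List[List[str]]:
    """Cluster one dataset; the first member of each cluster is its representative."""
    clusters: List[List[str]] = []
    for s in sorted(seqs, key=len, reverse=True):  # longest sequences first
        for cluster in clusters:
            if similarity(s, cluster[0], k) >= threshold:
                cluster.append(s)  # joins the first matching representative
                break
        else:
            clusters.append([s])  # no match: s becomes a new representative
    return clusters


def compare_two_sets(db1: List[str], db2: List[str], threshold: float = 0.9,
                     k: int = 3) -> List[Tuple[str, str]]:
    """For each sequence in db2, report the first db1 sequence it matches."""
    matches: List[Tuple[str, str]] = []
    for query in db2:
        for ref in db1:
            if similarity(query, ref, k) >= threshold:
                matches.append((query, ref))
                break
    return matches


if __name__ == "__main__":
    s1 = "MKTAYIAKQRQISFVKSHFSRQ"
    s2 = "MKTAYIAKQRQISFVKSHFSR"   # s1 without its last residue
    s3 = "GGGGGGGGGGWWWWWWWWWW"    # unrelated to s1 and s2
    print(greedy_cluster([s2, s3, s1]))
    print(compare_two_sets([s1], [s2, s3]))
```

The same functions apply unchanged to nucleotide strings, which loosely mirrors how the abstract describes protein and DNA/RNA variants sharing one algorithm. The all-against-representatives loop here is quadratic in the worst case and makes no attempt at the speed the abstract reports.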