SPRINT: A Scalable Parallel Classifier for Data Mining
VLDB '96 Proceedings of the 22nd International Conference on Very Large Data Bases
On effective classification of strings with wavelets
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
An important problem for computer scientists as well as geneticists is classifying items into common groups. This paper focuses on classifying DNA sequences as either introns or exons. Insights from this classification can reduce the laboratory time needed to distinguish introns from exons. A classification tree based on the SPRINT algorithm was trained and tested on sequences from the Drosophila melanogaster and Caenorhabditis elegans genomes. A large test sample error rate of 15% was observed for Drosophila melanogaster, whereas the rate for Caenorhabditis elegans was only 1.6%.
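As an illustration of the kind of split evaluation a SPRINT-style classification tree performs, the following minimal sketch scans a pre-sorted attribute list once and scores each candidate split point with the gini index. It is not the paper's implementation: the abstract does not say which sequence features were used, so the single numeric feature here (a GC fraction per sequence), the toy records, and the function names `gini` and `best_split` are all hypothetical.

```python
from collections import Counter


def gini(counts, total):
    """Gini index of a class histogram: 1 - sum of squared class fractions."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split(attribute_list):
    """Scan a list of (value, label) pairs sorted by value.

    Returns (split_point, weighted_gini) for the best binary split of the
    form `value <= split_point`, or (None, inf) if no split is possible.
    """
    n = len(attribute_list)
    below = Counter()                                      # classes already scanned
    above = Counter(label for _, label in attribute_list)  # classes still ahead
    best = (None, float("inf"))
    for i in range(n - 1):
        value, label = attribute_list[i]
        below[label] += 1
        above[label] -= 1
        next_value = attribute_list[i + 1][0]
        if value == next_value:
            continue  # no split point between two equal values
        n_below = i + 1
        n_above = n - n_below
        score = (n_below / n) * gini(below, n_below) \
            + (n_above / n) * gini(above, n_above)
        if score < best[1]:
            # Candidate split point: midpoint between consecutive values.
            best = ((value + next_value) / 2, score)
    return best


# Hypothetical toy data: (GC fraction, class). Not taken from the paper.
records = [
    (0.47, "exon"), (0.31, "intron"), (0.55, "exon"),
    (0.35, "intron"), (0.52, "exon"), (0.38, "intron"),
]
attribute_list = sorted(records)  # sorted once, then scanned once
split_point, score = best_split(attribute_list)
print(split_point, score)
```

On this toy data the two classes separate cleanly, so the chosen split point falls between 0.38 and 0.47 with a weighted gini of zero. A full tree would repeat this scan for every attribute at every node and partition the lists accordingly.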