Scientific data classification is the task of determining whether an unlabeled scientific object belongs to an existing class. It is an important operation in the management of scientific databases. In this paper we present a case study in scientific data classification: a tool for classifying DNA sequences. The tool works by generating and matching gapped fingerprints of DNA sequences. Experimental results from applying the tool to a set of Alu sequences demonstrate its good performance. Although the reported research focuses on DNA classification, our techniques should generalize to any domain (e.g., multimedia) where data are naturally represented as sequences.
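To make the idea of generating and matching gapped fingerprints concrete, the following is a minimal sketch and not the tool described above. It assumes that a gapped fingerprint is a pair of short segments separated by a bounded number of arbitrary characters, that a class is summarized by the fingerprints shared by enough of its member sequences, and that an unlabeled sequence is accepted when it contains a sufficient fraction of them. The function names, the parameters `seg_len`, `max_gap`, `min_support`, and `threshold`, and the toy sequences are all hypothetical.

```python
from collections import Counter


def gapped_fingerprints(seq, seg_len=3, max_gap=2):
    """Collect (left, right) segment pairs that occur in seq with at most
    max_gap arbitrary characters between them (assumed fingerprint form)."""
    prints = set()
    n = len(seq)
    for i in range(n - 2 * seg_len + 1):
        left = seq[i:i + seg_len]
        for gap in range(max_gap + 1):
            j = i + seg_len + gap          # start of the right segment
            if j + seg_len > n:
                break                      # right segment would run off the end
            prints.add((left, seq[j:j + seg_len]))
    return prints


def class_fingerprints(sequences, min_support, seg_len=3, max_gap=2):
    """Keep fingerprints found in at least min_support member sequences."""
    counts = Counter()
    for s in sequences:
        # Each sequence contributes a fingerprint at most once (set semantics).
        counts.update(gapped_fingerprints(s, seg_len, max_gap))
    return {fp for fp, c in counts.items() if c >= min_support}


def match_score(seq, class_prints, seg_len=3, max_gap=2):
    """Fraction of the class fingerprints that also occur in seq."""
    if not class_prints:
        return 0.0
    found = gapped_fingerprints(seq, seg_len, max_gap)
    return len(found & class_prints) / len(class_prints)


def classify(seq, class_prints, threshold=0.5, seg_len=3, max_gap=2):
    """Accept seq as a class member if enough class fingerprints match."""
    return match_score(seq, class_prints, seg_len, max_gap) >= threshold


if __name__ == "__main__":
    # Toy training sequences, invented for illustration only.
    members = ["ACGTAC", "ACGTAC"]
    prints = class_fingerprints(members, min_support=2, seg_len=2, max_gap=1)
    print(classify("ACGTAC", prints, seg_len=2, max_gap=1))
    print(classify("TTTTTT", prints, seg_len=2, max_gap=1))
```

In this sketch the gap is a "don't care" region, so two sequences can share a fingerprint even when the characters between the two segments differ. Raising `min_support` keeps only fingerprints that are widespread within the class, while `threshold` controls how strict the membership decision is.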