Experimental Investigation of Three Machine Learning Algorithms for ITS Dataset

Authors:
J. L. Yearwood;B. H. Kang;A. V. Kelarev
Affiliations:
School of Information Technology and Mathematical Sciences, University of Ballarat, Ballarat, Victoria, Australia 3353;School of Computing and Information Systems, University of Tasmania, Tasmania, Australia 7001;School of Information Technology and Mathematical Sciences, University of Ballarat, Ballarat, Victoria, Australia 3353
Venue:
FGIT '09 Proceedings of the 1st International Conference on Future Generation Information Technology
Year:
2009


Abstract

This article experimentally investigates how well three machine learning algorithms, applied to an ITS dataset, agree with classes previously published in the biological literature. The ITS dataset consists of nuclear ribosomal DNA sequences, for which rather sophisticated alignment scores have to be used as a measure of distance. These scores do not form a Minkowski metric, and the sequences cannot be regarded as points in a finite-dimensional space, so novel machine learning approaches are needed to analyse datasets of this sort. The paper introduces a k-committees classifier and compares it with the discrete k-means and Nearest Neighbour classifiers. All three algorithms turn out to be efficient and can be used to automate future biologically significant classifications for datasets of this kind. A simplified version of a synthetic dataset on which the k-committees classifier outperforms the k-means and Nearest Neighbour classifiers is also presented.
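
As a rough illustration of the setting the abstract describes, the sketch below classifies strings using only a pairwise dissimilarity function and never treats them as coordinate vectors. Everything in it is an assumption made for illustration. `toy_distance` is a simple mismatch count standing in for the alignment scores. The function names are hypothetical. The committee selection rule (pick the k class members that minimise the total distance from each member to its closest representative) is only one plausible reading of "k-committees", not the authors' published definition. The toy strings are invented here and are not the paper's data.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Dist = Callable[[str, str], float]


def toy_distance(a: str, b: str) -> float:
    """Stand-in dissimilarity (assumption): position-wise mismatches plus
    the length difference. It is NOT an alignment score."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def nearest_neighbour(train: List[Tuple[str, str]], query: str, dist: Dist) -> str:
    """Label of the training sequence closest to the query."""
    return min(train, key=lambda item: dist(item[0], query))[1]


def medoid(members: Sequence[str], dist: Dist) -> str:
    """Discrete 'centroid': the class member with the smallest total
    distance to all members, so no averaging of sequences is needed."""
    return min(members, key=lambda c: sum(dist(c, m) for m in members))


def committee(members: Sequence[str], k: int, dist: Dist) -> Tuple[str, ...]:
    """Assumed committee rule: the k members minimising the summed distance
    from every member to its nearest committee member (brute force)."""
    k = min(k, len(members))
    return min(
        combinations(members, k),
        key=lambda com: sum(min(dist(c, m) for c in com) for m in members),
    )


def classify(reps: Dict[str, Sequence[str]], query: str, dist: Dist) -> str:
    """Assign the query to the class owning the nearest representative."""
    return min(reps, key=lambda label: min(dist(r, query) for r in reps[label]))


# Invented toy data: class "A" consists of two distant groups of strings.
train = [("AAAA", "A"), ("AAAT", "A"), ("CCCC", "A"), ("CCCG", "A"),
         ("AACC", "B"), ("AACG", "B")]
by_class: Dict[str, List[str]] = {}
for seq, label in train:
    by_class.setdefault(label, []).append(seq)

medoid_reps = {c: [medoid(m, toy_distance)] for c, m in by_class.items()}
committee_reps = {c: committee(m, 2, toy_distance) for c, m in by_class.items()}
```

On this toy data, one representative per class cannot cover both groups in class "A", so `classify(medoid_reps, "CCCT", toy_distance)` picks "B". Both `classify(committee_reps, "CCCT", toy_distance)` and `nearest_neighbour(train, "CCCT", toy_distance)` pick "A". This shows only why several representatives per class can help. It does not reproduce the synthetic dataset or the results reported in the paper.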