Identifying character non-independence in phylogenetic data using data mining techniques

Authors:
Anne M. Maglia;Jennifer L. Leopold;Venkat Ram Ghatti
Affiliations:
University of Missouri-Rolla, Rolla, MO;University of Missouri-Rolla, Rolla, MO;University of Missouri-Rolla, Rolla, MO
Venue:
APBC '04 Proceedings of the second conference on Asia-Pacific bioinformatics - Volume 29
Year:
2004

Abstract

Undiscovered relationships in a data set may confound analyses, particularly those that assume the data are independent. Such problems arise when the characters used in a phylogenetic analysis are not independent of one another. A key assumption of phylogenetic inference methods such as maximum likelihood and parsimony is that each character serves as an independent hypothesis of evolution. When this assumption is violated, the resulting phylogeny may not reflect true evolutionary history. It is therefore essential to identify character non-independence before conducting phylogenetic analyses. To identify dependencies between phylogenetic characters, we applied three data mining techniques: 1) Bayesian networks, 2) decision tree induction, and 3) rule induction from coverings. We briefly discuss the main idea behind each strategy, show how each technique performs on a small sample data set, and apply each method to an existing phylogenetic data set. We then discuss the interestingness of each method's results and show that, although each method has its own strengths and weaknesses, rule induction from coverings currently offers the most useful approach for detecting dependencies among phylogenetic characters.
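To make the idea of a covering concrete, here is a minimal sketch in rough-set terms, assuming a character matrix with one row per taxon and one column per character. A set P of other characters is taken to determine a character d when any two taxa that share the same states on every character in P also share the same state on d; a covering of d is a minimal such P. This is an illustration under those assumptions, not the authors' implementation. The helper names `determines` and `minimal_coverings`, the `max_size` cap, and the toy matrix are all hypothetical. The brute-force subset search is exponential in the number of characters, which is why `max_size` lets the caller limit subset size.

```python
from itertools import combinations

# Hypothetical helpers illustrating covering-based dependency detection
# in the rough-set sense; not the authors' implementation.


def determines(rows, attrs, target):
    """Return True if `target` depends on `attrs`: any two taxa (rows) that
    share the same states on every character in `attrs` also share the
    same state on `target`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        # Record the first target state seen for this combination; a
        # different state later means `attrs` does not determine `target`.
        if seen.setdefault(key, row[target]) != row[target]:
            return False
    return True


def minimal_coverings(rows, candidates, target, max_size=None):
    """Return every minimal non-empty subset of `candidates` (of at most
    `max_size` characters) on which `target` depends."""
    if max_size is None:
        max_size = len(candidates)
    found = []
    # Search subsets in order of increasing size so smaller coverings
    # are found before any of their supersets.
    for k in range(1, max_size + 1):
        for subset in combinations(candidates, k):
            # A superset of a covering already found cannot be minimal.
            if any(set(c) <= set(subset) for c in found):
                continue
            if determines(rows, subset, target):
                found.append(subset)
    return found


# Toy character matrix (rows = taxa, columns = characters 0-3), invented
# for illustration: character 3 copies character 0, and character 2 is
# the XOR of characters 0 and 1.
rows = [
    (0, 0, 0, 0),
    (0, 1, 1, 0),
    (1, 0, 1, 1),
    (1, 1, 0, 1),
    (0, 0, 0, 0),
    (1, 0, 1, 1),
]

print(minimal_coverings(rows, [0, 1, 2], 3))  # [(0,), (1, 2)]
print(minimal_coverings(rows, [0, 1, 3], 2))  # [(0, 1), (1, 3)]
```

Each covering names characters that jointly fix the state of the target, so a small covering (here, character 0 alone for character 3) flags a candidate non-independence to inspect before analysis. With only a handful of taxa, such dependencies can also arise by chance, so they are best treated as hypotheses to check rather than as proof.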