P-tree classification of yeast gene deletion data

Authors:
Amal Perera;Anne Denton;Pratap Kotala;William Jockheck;Willy Valdivia Granda;William Perrizo
Affiliations:
North Dakota State University, IACC 258;North Dakota State University, IACC 258;North Dakota State University, IACC 258;North Dakota State University, IACC 258;North Dakota State University, Fargo, ND;North Dakota State University, Fargo, ND
Venue:
ACM SIGKDD Explorations Newsletter
Year:
2002

Abstract

Genomics data has many properties that make it different from "typical" relational data. The presence of multi-valued attributes as well as the large number of null values led us to a P-tree-based bit-vector representation in which matching 1-values were counted to evaluate similarity between genes. Quantitative information such as the number of interactions was also included in the classifier. Interaction information allowed us to extend the known properties of one protein with information on its interacting neighbors. Different feature attributes were weighted independently. Relevance of different attributes was systematically evaluated through optimization of weights using a genetic algorithm. The area under the ROC curve (AROC) for the classified list was used as the fitness function for the genetic algorithm.
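
The core ideas in the abstract (counting matching 1-values per attribute, weighting attributes independently, and scoring a ranked list by AROC) can be illustrated with a minimal sketch. This is not the authors' implementation and does not use P-trees: plain Python integers stand in for the bit vectors, and the attribute names, data layout, and the `weighted_similarity`, `aroc`, and `fitness` helpers are all hypothetical. Ranking genes against a single query gene is likewise a simplifying assumption.

```python
from typing import Dict, Sequence

# Hypothetical layout: each gene maps an attribute name to an int used as a
# bit vector, where bit i is 1 if the gene has the i-th value of that attribute.
Gene = Dict[str, int]


def weighted_similarity(a: Gene, b: Gene, weights: Dict[str, float]) -> float:
    """Sum over attributes of weight * count of positions where both genes have a 1."""
    total = 0.0
    for attr, w in weights.items():
        # A missing (null) attribute is treated as an all-zero vector.
        shared = a.get(attr, 0) & b.get(attr, 0)
        total += w * bin(shared).count("1")
    return total


def aroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly, with ties counted as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fitness(weights: Dict[str, float], query: Gene,
            genes: Sequence[Gene], labels: Sequence[int]) -> float:
    """Fitness a genetic algorithm could maximize over candidate weight sets."""
    scores = [weighted_similarity(query, g, weights) for g in genes]
    return aroc(scores, labels)


# Made-up toy data for illustration only.
query = {"function": 0b1011, "localization": 0b0110}
genes = [
    {"function": 0b1001, "localization": 0b0100},
    {"function": 0b0100},
    {"function": 0b0011, "localization": 0b0110},
    {"localization": 0b0010},
]
labels = [1, 0, 1, 0]

print(fitness({"function": 1.0, "localization": 0.5}, query, genes, labels))
print(fitness({"function": 0.0, "localization": 1.0}, query, genes, labels))
```

A genetic algorithm would treat each weight dictionary as an individual and keep those with higher `fitness`; on the toy data above, the first weight set ranks both positive genes ahead of both negatives, while the second produces a tie and therefore a lower AROC.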