In this work, a novel data representation for learning from gene expression data is introduced, aimed at emphasizing gene-gene interactions in learning. In this representation, each sample is described by the differences between the expression values of gene pairs rather than by the expression values themselves. Besides its better sensitivity to gene interactions, an important benefit of this representation is that it allows external knowledge to be incorporated in the form of the semantic similarity of each gene pair, which is also studied. In this context, two common learning algorithms, plain k-NN classification and Random Forest, are compared with two techniques based on distance function learning, namely learning from equivalence constraints and the intrinsic Random Forest similarity, on a set of genetic benchmark datasets. The most discriminative gene pairs are selected and the new representation is evaluated on the benchmark data, where it is shown to increase classification accuracy for genetic datasets. Exploiting the gene-pair representation and the Gene Ontology (GO), the semantic similarity of each gene pair is calculated and used to pre-select pairs with a high similarity value. This GO-based feature selection approach is compared with common feature selection and is shown to often increase classification accuracy.
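The pipeline summarized above (gene-pair differences, GO-based pre-selection of pairs, selection of discriminative pairs, k-NN classification) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The GO similarity values are hypothetical placeholders rather than scores computed from the ontology. The ranking score (absolute difference of class means divided by the sum of class standard deviations) is a simple stand-in for whatever discriminative criterion the paper uses. Labels are assumed to be binary (0/1). The helper names `all_pairs`, `pair_features`, `go_preselect`, `select_pairs`, and `knn_predict` are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev

def all_pairs(n_genes):
    """Enumerate every unordered gene pair (i, j) with i < j."""
    return list(combinations(range(n_genes), 2))

def pair_features(sample, pairs):
    """Gene-pair representation: replace raw expression values by the
    differences sample[i] - sample[j] for each selected pair."""
    return [sample[i] - sample[j] for i, j in pairs]

def go_preselect(pairs, go_similarity, threshold):
    """Keep pairs whose GO semantic similarity is at least `threshold`.
    `go_similarity` is a hypothetical precomputed {pair: score} mapping;
    pairs missing from it are treated as having similarity 0."""
    return [p for p in pairs if go_similarity.get(p, 0.0) >= threshold]

def discriminative_score(values, labels):
    """Stand-in two-class score (labels 0/1 assumed, both classes present):
    |mean_0 - mean_1| / (std_0 + std_1), plus a tiny epsilon to avoid
    division by zero. Not the paper's criterion."""
    a = [v for v, lab in zip(values, labels) if lab == 0]
    b = [v for v, lab in zip(values, labels) if lab == 1]
    return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-9)

def select_pairs(X, y, pairs, k):
    """Rank candidate pairs by the score of their difference feature; keep the top k."""
    scored = []
    for i, j in pairs:
        diffs = [s[i] - s[j] for s in X]
        scored.append((discriminative_score(diffs, y), (i, j)))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [p for _, p in scored[:k]]

def knn_predict(train_feats, train_labels, query, k=1):
    """Plain k-NN with squared Euclidean distance and a majority vote."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(f, query)), lab)
        for f, lab in zip(train_feats, train_labels)
    )
    votes = [lab for _, lab in dists[:k]]
    return max(set(votes), key=votes.count)

if __name__ == "__main__":
    # Toy data: 4 samples x 3 genes, two classes (values invented for illustration).
    X = [[5.0, 1.0, 3.0], [6.0, 1.5, 3.2], [1.0, 5.0, 3.1], [1.5, 6.0, 2.9]]
    y = [0, 0, 1, 1]
    go_sim = {(0, 1): 0.8, (0, 2): 0.2, (1, 2): 0.6}  # hypothetical GO similarities

    candidates = go_preselect(all_pairs(3), go_sim, threshold=0.5)
    selected = select_pairs(X, y, candidates, k=1)
    train = [pair_features(s, selected) for s in X]
    query = pair_features([5.5, 1.2, 3.0], selected)
    print(selected, knn_predict(train, y, query, k=1))  # [(0, 1)] 0
```

In this toy run, the GO threshold discards pair (0, 2) before any label information is used. Of the remaining candidates, pair (0, 1) separates the two classes best, so the classifier works on a single difference feature.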