A framework for ontology-driven subspace clustering

Authors:
Jinze Liu;Wei Wang;Jiong Yang
Affiliations:
University of North Carolina at Chapel Hill, Chapel Hill, NC;University of North Carolina at Chapel Hill, Chapel Hill, NC;University of Illinois, Urbana-Champaign, IL
Venue:
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2004

Citing 9
Cited 10

Automatic subspace clustering of high dimensional data for data mining applications

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Fast algorithms for projected clustering

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Entropy-based subspace clustering for mining numerical data

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Finding generalized projected clusters in high dimensional spaces

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Co-clustering documents and words using bipartite spectral graph partitioning

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Clustering by pattern similarity in large data sets

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Discovering local structure in gene expression data: the order-preserving submatrix problem

Proceedings of the sixth annual international conference on Computational biology
Biclustering of Expression Data

Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology
Semantic Compression and Pattern Extraction with Fascicles

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases

Comparing Subspace Clusterings

IEEE Transactions on Knowledge and Data Engineering
Hierarchical, Parameter-Free Community Discovery

ECML PKDD '08 Proceedings of the European conference on Machine Learning and Knowledge Discovery in Databases - Part II
Onto-clust-A methodology for combining clustering analysis and ontological methods for identifying groups of comorbidities for developmental disorders

Journal of Biomedical Informatics
Fuzzy c-means clustering with prior biological knowledge

Journal of Biomedical Informatics
Identification of Regulatory Modules in Time Series Gene Expression Data Using a Linear Time Biclustering Algorithm

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Data clustering: 50 years beyond K-means

Pattern Recognition Letters
Mining the “Voice of the Customer” for Business Prioritization

ACM Transactions on Intelligent Systems and Technology (TIST)
Clustering large collection of biomedical literature based on ontology-enriched bipartite graph representation and mutual refinement strategy

PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
MicroClAn: Microarray clustering analysis

Journal of Parallel and Distributed Computing
Comparative meta-analysis between human and mouse cancer microarray data reveals critical pathways

International Journal of Data Mining and Bioinformatics

Abstract

Traditional clustering is a descriptive task that seeks to identify homogeneous groups of objects based on the values of their attributes. While domain knowledge is always the best way to justify clustering, few clustering algorithms have ever taken domain knowledge into consideration. In this paper, the domain knowledge is represented by a hierarchical ontology. We develop a framework that directly incorporates domain knowledge into the clustering process, yielding a set of clusters with strong ontology implications. During the clustering process, ontology information is used to efficiently prune the exponential search space of subspace clustering algorithms. Meanwhile, the algorithm generates an automatic interpretation of the clustering result by mapping the naturally hierarchically organized subspace clusters with significant categorical enrichment onto the ontology hierarchy. Our experiments on a set of gene expression data using the Gene Ontology demonstrate that our ontology-driven pruning technique significantly improves clustering performance with minimal degradation of cluster quality. Meanwhile, many hierarchical organizations of gene clusters corresponding to sub-hierarchies in the Gene Ontology were also successfully captured.
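
The abstract refers to "significant categorical enrichment" without naming the statistical test used. As an illustration only, one common way to score enrichment of an ontology category within a cluster is a one-sided hypergeometric tail probability. The sketch below assumes that test; the function name, parameter names, and example numbers are hypothetical and are not taken from the paper.

```python
from math import comb


def enrichment_p_value(population, annotated, cluster_size, hits):
    """One-sided hypergeometric tail probability P(X >= hits).

    Assumed setup (illustrative, not the paper's stated method):
      population   -- total number of genes considered
      annotated    -- genes in the population carrying the ontology term
      cluster_size -- number of genes in the cluster
      hits         -- genes in the cluster carrying the ontology term
    """
    # The overlap cannot exceed either the cluster or the annotated set.
    upper = min(annotated, cluster_size)
    # Number of ways to draw a cluster of this size from the population.
    total = comb(population, cluster_size)
    # Sum the probability mass for every overlap at least as large as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 6000 genes, 120 annotated with a term,
# a cluster of 40 genes of which 8 carry the term.
p = enrichment_p_value(6000, 120, 40, 8)
print(f"enrichment p-value: {p:.3e}")
```

A small p-value suggests the category is over-represented in the cluster relative to chance. In a hierarchical ontology, the same score could be computed for each term along a path from a leaf to the root, which is one way to relate clusters to sub-hierarchies.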