A challenge in applying density-based clustering to categorical biomedical data is that the "cube" of attribute values has no defined ordering, which makes the search for dense subspaces slow. We propose the HIERDENC algorithm for hierarchical density-based clustering of categorical data, together with a complementary index for searching for dense subspaces efficiently. The HIERDENC index is updated when new objects are introduced, so clustering does not need to be repeated on all objects. Both index updating and cluster retrieval are efficient. In comparisons with several other clustering algorithms on large datasets, HIERDENC achieved better runtime scalability with respect to the number of objects, as well as better cluster quality. By rapidly collapsing the bicliques in large networks, we achieved an edge reduction of as much as 86.5%. HIERDENC is suitable for large and quickly growing datasets, since it is independent of object ordering, does not require re-clustering when new data emerge, and requires no user-specified input parameters.
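To make the notion of density in an unordered categorical "cube" concrete, the following minimal Python sketch counts how many objects fall within a given Hamming distance of a candidate center. It illustrates the general idea only and is not the HIERDENC algorithm or its index. The function names (`hamming`, `neighborhood_count`, `densest_object`), the toy records, and the choice of Hamming distance as the dissimilarity measure are all assumptions made for this example.

```python
from typing import Sequence, Tuple

# A categorical object is a fixed-length tuple of attribute values (assumed representation).
Obj = Tuple[str, ...]


def hamming(a: Obj, b: Obj) -> int:
    """Number of attribute positions at which two categorical objects differ."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def neighborhood_count(center: Obj, objects: Sequence[Obj], radius: int) -> int:
    """Count objects lying within `radius` mismatches of `center` (center included if present)."""
    return sum(hamming(center, o) <= radius for o in objects)


def densest_object(objects: Sequence[Obj], radius: int) -> Obj:
    """Return the object whose Hamming neighborhood contains the most objects.

    This brute-force scan is quadratic in the number of objects; it stands in
    for the kind of dense-region search an index would be built to speed up.
    """
    return max(objects, key=lambda o: neighborhood_count(o, objects, radius))


# Hypothetical toy records with three categorical attributes each.
objects = [
    ("A", "x", "1"),
    ("A", "x", "2"),
    ("A", "y", "1"),
    ("B", "z", "3"),
]

center = densest_object(objects, radius=1)
print(center, neighborhood_count(center, objects, radius=1))
```

Because no ordering exists among attribute values, the sketch compares objects only by equality per attribute. A naive scan like this one visits every pair of objects, which is the cost that motivates an index for locating dense subspaces.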