In this paper we explore the connection between clustering categorical data and entropy: clusters of similar points have lower entropy than clusters of dissimilar ones. We use this connection to design an incremental heuristic algorithm, COOLCAT, which can efficiently cluster large data sets of records with categorical attributes, as well as data streams. In contrast with previously published categorical clustering algorithms, COOLCAT's clustering results are very stable across different sample sizes and parameter settings. The clustering criterion is also intuitive, since it is rooted in the well-known notion of entropy. Most importantly, COOLCAT is well equipped to cluster data streams (continuously arriving streams of data points), since it is an incremental algorithm that can cluster new points without having to look at every point that has been clustered so far. We demonstrate the efficiency and scalability of COOLCAT through a series of experiments on real and synthetic data sets.
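To make the entropy criterion concrete, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes that attributes are treated as independent, so that a cluster's entropy is the sum of its per-attribute Shannon entropies, and that a new point goes to the cluster giving the lowest size-weighted expected entropy. The names `ClusterSummary` and `assign` are hypothetical, and the abstract does not specify details such as how initial clusters are seeded.

```python
from collections import Counter
from math import log2


class ClusterSummary:
    """Per-attribute value counts for one cluster; raw points are not stored.

    Hypothetical helper for illustration, not part of the published algorithm.
    """

    def __init__(self, num_attributes):
        self.size = 0
        self.counts = [Counter() for _ in range(num_attributes)]

    def add(self, point):
        """Fold one categorical record into the summary."""
        self.size += 1
        for counter, value in zip(self.counts, point):
            counter[value] += 1

    def entropy(self, extra=None):
        """Sum of per-attribute Shannon entropies (in bits).

        If `extra` is given, the entropy is computed as if that point had
        been added, without modifying the summary.
        Assumption: attributes are treated as independent.
        """
        n = self.size + (1 if extra is not None else 0)
        if n == 0:
            return 0.0
        total = 0.0
        for idx, counter in enumerate(self.counts):
            tally = dict(counter)
            if extra is not None:
                tally[extra[idx]] = tally.get(extra[idx], 0) + 1
            for count in tally.values():
                p = count / n
                total -= p * log2(p)
        return total


def assign(clusters, point):
    """Add `point` to the cluster that minimizes expected entropy.

    Expected entropy is the size-weighted average of cluster entropies.
    Only the per-cluster summaries are consulted, never earlier points.
    Ties go to the lowest cluster index.
    """
    total = sum(c.size for c in clusters) + 1
    weighted = [c.size * c.entropy() for c in clusters]
    weighted_sum = sum(weighted)
    best_index, best_score = 0, float("inf")
    for i, c in enumerate(clusters):
        # Replace cluster i's contribution with its value after adding the point.
        trial = weighted_sum - weighted[i] + (c.size + 1) * c.entropy(point)
        score = trial / total
        if score < best_score:
            best_index, best_score = i, score
    clusters[best_index].add(point)
    return best_index


# Usage: two clusters seeded with one record each (seeding is an assumption).
clusters = [ClusterSummary(2), ClusterSummary(2)]
clusters[0].add(("red", "small"))
clusters[1].add(("blue", "large"))
chosen = assign(clusters, ("red", "small"))
```

In this usage example the new record matches the first cluster exactly, so adding it there leaves that cluster's entropy at zero and `chosen` is 0. Because each cluster is represented only by value counts, the cost of placing a point depends on the number of clusters and attributes rather than on the number of points already clustered, which is the property the abstract highlights for data streams.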