The k-means algorithm is well known for its efficiency in clustering large data sets. However, working only on numeric values prohibits it from being used to cluster real-world data containing categorical values. In this paper we present two algorithms which extend the k-means algorithm to categorical domains and domains with mixed numeric and categorical values. The k-modes algorithm uses a simple matching dissimilarity measure to deal with categorical objects, replaces the means of clusters with modes, and uses a frequency-based method to update modes in the clustering process to minimise the clustering cost function. With these extensions the k-modes algorithm enables the clustering of categorical data in a fashion similar to k-means. The k-prototypes algorithm, through the definition of a combined dissimilarity measure, further integrates the k-means and k-modes algorithms to allow for clustering objects described by mixed numeric and categorical attributes. We use the well-known soybean disease and credit approval data sets to demonstrate the clustering performance of the two algorithms. Our experiments on two real-world data sets with half a million objects each show that the two algorithms are efficient when clustering large data sets, which is critical to data mining applications.
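The following is a minimal Python sketch of the ideas named above. It assumes that categorical objects are equal-length tuples of category labels. The sketch uses four functions:

- `matching_dissimilarity` counts the attributes on which two objects differ.
- `update_mode` takes the most frequent category of each attribute.
- `k_modes` alternates allocation and mode updates.
- `prototype_dissimilarity` adds squared Euclidean distance on numeric attributes to a weighted matching term, in the spirit of the combined k-prototypes measure.

All function names, the random choice of initial modes, and the weight `gamma` are illustrative assumptions. None of them is taken from the paper's own code.

```python
from collections import Counter
import random


def matching_dissimilarity(x, y):
    """Simple matching dissimilarity: count the attributes on which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def update_mode(members):
    """Frequency-based mode: the most frequent category of each attribute.

    Ties are broken by first occurrence (Counter.most_common ordering).
    """
    n_attrs = len(members[0])
    return tuple(
        Counter(m[j] for m in members).most_common(1)[0][0] for j in range(n_attrs)
    )


def k_modes(data, k, max_iter=100, seed=0):
    """Batch k-modes sketch over a list of equal-length tuples of categories.

    Assumptions: initial modes are k objects drawn at random, and all objects
    are reassigned before modes are recomputed (a batch variant).
    """
    rng = random.Random(seed)
    modes = rng.sample(data, k)
    labels = None
    for _ in range(max_iter):
        # Allocate each object to its nearest mode under simple matching.
        new_labels = [
            min(range(k), key=lambda c: matching_dissimilarity(x, modes[c]))
            for x in data
        ]
        if new_labels == labels:  # no reassignment: converged
            break
        labels = new_labels
        # Replace each cluster's mode with the attribute-wise most frequent values.
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous mode
                modes[c] = update_mode(members)
    return labels, modes


def prototype_dissimilarity(x_num, x_cat, p_num, p_cat, gamma):
    """Combined measure (assumed form): squared Euclidean distance on numeric
    attributes plus gamma times simple matching on categorical attributes."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    return numeric + gamma * matching_dissimilarity(x_cat, p_cat)


data = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z"), ("b", "z"), ("b", "w")]
labels, modes = k_modes(data, k=2)
print(labels, modes)
```

This sketch recomputes modes in batch after each full pass over the data. That schedule is a simplification chosen here and is not necessarily the paper's exact update rule. Like k-means, the loop only reaches a local minimum, and the result depends on which objects are drawn as initial modes.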