Clustering is a fundamental data mining technique. This article presents an improved EM algorithm for clustering large data sets that exhibit high dimensionality, noise, and zero-variance problems. The algorithm incorporates improvements that increase both solution quality and speed. In general, it finds a good clustering solution in three scans over the data set; alternatively, it can be run until convergence. Its few parameters are easy to set and have defaults that suit most cases. The proposed algorithm is compared against the standard EM algorithm and the On-Line EM algorithm.
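For orientation, the standard EM baseline mentioned above can be sketched as follows. This is a minimal illustration and not the improved algorithm of the article: it assumes a Gaussian mixture with diagonal covariances, a naive initialization from evenly spaced data points, and a hypothetical `var_floor` parameter as one simple guard against zero variance. The name `em_diag_gmm` and all parameter names are illustrative, and the default of three passes only mirrors the scan count quoted in the abstract.

```python
import math


def em_diag_gmm(data, k, n_iter=3, var_floor=1e-6):
    """Plain EM for a k-component Gaussian mixture with diagonal covariances.

    n_iter counts full scans over `data`. var_floor is an assumed safeguard
    (not taken from the article) that keeps every variance strictly positive.
    """
    n, d = len(data), len(data[0])
    # Assumed initialization: evenly spaced data points serve as initial means.
    means = [list(data[(j * n) // k]) for j in range(k)]
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # Sufficient statistics accumulated during one scan of the data.
        resp_sum = [0.0] * k
        lin_sum = [[0.0] * d for _ in range(k)]
        sq_sum = [[0.0] * d for _ in range(k)]

        for x in data:
            # E step: log(weight * density) of x under each component.
            log_p = []
            for j in range(k):
                lp = math.log(weights[j])
                for a in range(d):
                    v = variances[j][a]
                    diff = x[a] - means[j][a]
                    lp -= 0.5 * (math.log(2.0 * math.pi * v) + diff * diff / v)
                log_p.append(lp)
            # Subtract the maximum before exponentiating to avoid underflow.
            top = max(log_p)
            probs = [math.exp(lp - top) for lp in log_p]
            total = sum(probs)
            for j in range(k):
                r = probs[j] / total  # responsibility of component j for x
                resp_sum[j] += r
                for a in range(d):
                    lin_sum[j][a] += r * x[a]
                    sq_sum[j][a] += r * x[a] * x[a]

        # M step: re-estimate weights, means, and floored variances.
        for j in range(k):
            if resp_sum[j] <= 0.0:
                continue  # leave an empty component unchanged
            weights[j] = resp_sum[j] / n
            for a in range(d):
                m = lin_sum[j][a] / resp_sum[j]
                means[j][a] = m
                variances[j][a] = max(sq_sum[j][a] / resp_sum[j] - m * m, var_floor)

    return weights, means, variances


if __name__ == "__main__":
    # Two separated groups; the second attribute is constant (zero variance).
    points = [(i * 0.1, 5.0) for i in range(20)]
    points += [(10.0 + i * 0.1, 5.0) for i in range(20)]
    w, mu, var = em_diag_gmm(points, k=2)
    print(w, mu, var)
```

The constant second attribute in the usage example is the kind of case where an unguarded variance estimate collapses to zero and the density can no longer be evaluated; flooring the variance is only one possible remedy and is used here purely for illustration.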