Data clustering: 50 years beyond K-means

Authors:
Anil K. Jain
Affiliations:
Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan 48824, USA and Department of Brain and Cognitive Engineering, Korea University, Anam-dong, Seoul, ...
Venue:
Pattern Recognition Letters
Year:
2010




Abstract

Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. For example, a common scheme of scientific classification puts organisms into a system of ranked taxa: domain, kingdom, phylum, class, etc. Cluster analysis is the formal study of methods and algorithms for grouping, or clustering, objects according to measured or perceived intrinsic characteristics or similarity. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels. The absence of category information distinguishes data clustering (unsupervised learning) from classification or discriminant analysis (supervised learning). The aim of clustering is to find structure in data, and it is therefore exploratory in nature. Clustering has a long and rich history in a variety of scientific fields. One of the most popular and simple clustering algorithms, K-means, was first published in 1955. Although K-means was proposed over 50 years ago and thousands of clustering algorithms have been published since then, K-means is still widely used. This speaks to the difficulty of designing a general-purpose clustering algorithm and to the ill-posed nature of the clustering problem. We provide a brief overview of clustering, summarize well-known clustering methods, discuss the major challenges and key issues in designing clustering algorithms, and point out some of the emerging and useful research directions, including semi-supervised clustering, ensemble clustering, simultaneous feature selection during data clustering, and large-scale data clustering.
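
For readers unfamiliar with K-means, the following is a minimal sketch of the commonly taught two-step iteration (assign each point to its nearest center, then move each center to the mean of its points). It is not taken from the paper: the function name `kmeans`, its parameters `points`, `k`, `iters`, and `seed`, the random choice of initial centers, and the use of squared Euclidean distance are all illustrative assumptions. Note that the input carries no class labels, which is what makes the procedure unsupervised.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Illustrative K-means on a list of equal-length numeric tuples.

    All names and defaults here are assumptions made for this sketch.
    Returns (centers, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Assumed initialization: pick k of the data points as starting centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers will too
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


# Usage: two visibly separated groups of 2-D points, no labels supplied.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(data, k=2)
```

The result depends on the initial centers and on the choice of `k`, which must be supplied by the user. This is one concrete face of the ill-posedness the abstract mentions.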