Constrained locally weighted clustering

Authors:
Hao Cheng;Kien A. Hua;Khanh Vu
Affiliations:
University of Central Florida, Orlando, FL;University of Central Florida, Orlando, FL;University of Central Florida, Orlando, FL
Venue:
Proceedings of the VLDB Endowment
Year:
2008

Abstract

Data clustering is a difficult problem due to the complex and heterogeneous nature of multidimensional data. To improve clustering accuracy, we propose a scheme that captures local correlation structures: each cluster is associated with an independent weighting vector and embedded in the subspace spanned by an adaptive combination of the dimensions. Our clustering algorithm takes advantage of known pairwise instance-level constraints. The data points in the constraint set are divided into groups through inference, and each group is assigned to the feasible cluster that minimizes the sum of squared distances between all the points in the group and the corresponding centroid. Our theoretical analysis shows that the probability of points being assigned to the correct clusters is much higher under the new algorithm than under conventional methods. This is confirmed by our experimental results, which indicate that our design produces clusters closer to the ground truth than those created by current state-of-the-art algorithms.
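
The group-assignment step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the per-cluster weighting vectors are learned or how feasibility is decided, so the weights are taken as given and feasibility is passed in as an optional list of allowed cluster indices. The function names (must_link_groups, weighted_sq_dist, assign_group) and the toy data are hypothetical. Groups are formed here by the transitive closure of must-link pairs, which is one plausible reading of "divided into groups through inference".

```python
def must_link_groups(n_points, must_link):
    """Group point indices by the transitive closure of must-link pairs."""
    parent = list(range(n_points))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in must_link:
        parent[find(a)] = find(b)

    groups = {}
    for i in range(n_points):
        groups.setdefault(find(i), []).append(i)
    # Keep only groups created by at least one constraint.
    return [g for g in groups.values() if len(g) > 1]


def weighted_sq_dist(points, centroid, weights):
    """Sum over all points of sum_j w_j * (x_j - c_j)^2."""
    return sum(
        w * (x - c) ** 2
        for p in points
        for x, c, w in zip(p, centroid, weights)
    )


def assign_group(points, centroids, weights, feasible=None):
    """Pick the feasible cluster with the smallest total weighted squared distance.

    Assumption: `feasible` is a list of allowed cluster indices (all clusters if None).
    """
    candidates = range(len(centroids)) if feasible is None else feasible
    return min(
        candidates,
        key=lambda k: weighted_sq_dist(points, centroids[k], weights[k]),
    )


# Toy data (hypothetical): four 2-D points, two clusters, one weight vector per cluster.
X = [[0.0, 0.0], [0.2, 5.0], [4.0, 0.1], [4.1, 4.0]]
centroids = [[0.0, 2.0], [4.0, 2.0]]
weights = [[0.9, 0.1], [0.9, 0.1]]

groups = must_link_groups(len(X), [(0, 1)])
labels = [assign_group([X[i] for i in g], centroids, weights) for g in groups]
```

Because each cluster carries its own weight vector, the same group can be close to one centroid and far from another even when the unweighted Euclidean distances are similar. In the toy data the second dimension is down-weighted, so points 0 and 1 are assigned together despite their large spread along that dimension.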