Locally adaptive metrics for clustering high dimensional data

Authors:
Carlotta Domeniconi;Dimitrios Gunopulos;Sheng Ma;Bojun Yan;Muna Al-Razgan;Dimitris Papadopoulos
Affiliations:
George Mason University, Fairfax, USA;UC Riverside, Riverside, USA;Vivido Media Inc., Beijing, China 100085;George Mason University, Fairfax, USA;George Mason University, Fairfax, USA;UC Riverside, Riverside, USA
Venue:
Data Mining and Knowledge Discovery
Year:
2007

Abstract

Clustering suffers from the curse of dimensionality, and similarity functions that use all input features with equal relevance may not be effective. We introduce an algorithm that discovers clusters in subspaces spanned by different combinations of dimensions via local weightings of features. This approach avoids the risk of information loss encountered in global dimensionality reduction techniques, and does not assume any data distribution model. Our method associates with each cluster a weight vector whose values capture the relevance of features within that cluster. We experimentally demonstrate the gain in performance our method achieves over competing methods, using both synthetic and real datasets. In particular, our results show that the proposed technique can feasibly perform simultaneous clustering of genes and conditions in gene expression data, as well as clustering of very high-dimensional data such as text.
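
The idea of attaching a feature-weight vector to each cluster can be illustrated with a small sketch. This is not the paper's algorithm as published: the abstract does not give the update rules, so the exponential weighting scheme, the bandwidth parameter `h`, and the function name `local_weighted_kmeans` below are all assumptions chosen for illustration. The sketch is a k-means-style loop in which each cluster keeps its own weights, and features along which a cluster is tight receive larger weights.

```python
import numpy as np


def local_weighted_kmeans(X, k, h=1.0, n_iter=20, seed=0):
    """Toy k-means variant with one feature-weight vector per cluster.

    Assumption (not stated in the abstract): weights follow an exponential
    scheme, w[j, i] proportional to exp(-spread[j, i] / h), where
    spread[j, i] is the mean squared deviation of cluster j along feature i.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialise centroids from k distinct data points.
    centroids = X[rng.choice(n, size=k, replace=False)].astype(float)
    weights = np.full((k, d), 1.0 / d)  # start with equal relevance

    for _ in range(n_iter):
        # Weighted squared distance of every point to every centroid: (n, k).
        diff2 = (X[:, None, :] - centroids[None, :, :]) ** 2
        dist = np.einsum("nkd,kd->nk", diff2, weights)
        labels = dist.argmin(axis=1)

        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue  # keep the previous centroid and weights
            centroids[j] = members.mean(axis=0)
            spread = ((members - centroids[j]) ** 2).mean(axis=0)
            # Shift by the minimum before exp for numerical stability;
            # the shift cancels out in the normalisation below.
            w = np.exp(-(spread - spread.min()) / h)
            weights[j] = w / w.sum()

    return labels, centroids, weights
```

Calling `local_weighted_kmeans(X, k=3)` on an `(n, d)` array returns the cluster labels from the final assignment step, the `(k, d)` centroids, and a `(k, d)` weight matrix whose rows each sum to 1. A smaller `h` concentrates each cluster's weight on its tightest features, while a larger `h` pushes the weights back toward uniform, which corresponds to ordinary k-means.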