Feature Weighting in k-Means Clustering

Authors:
Dharmendra S. Modha; W. Scott Spangler
Affiliations:
IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120, USA (dmodha@almaden.ibm.com); IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120, USA (spangles@almaden.ibm.com)
Venue:
Machine Learning
Year:
2003

Abstract

Data sets with multiple, heterogeneous feature spaces occur frequently. We present an abstract framework for integrating multiple feature spaces in the k-means clustering algorithm. Our main ideas are (i) to represent each data object as a tuple of multiple feature vectors, (ii) to assign a suitable (and possibly different) distortion measure to each feature space, (iii) to combine distortions on different feature spaces, in a convex fashion, by assigning (possibly) different relative weights to each, (iv) for a fixed weighting, to cluster using the proposed convex k-means algorithm, and (v) to determine the optimal feature weighting to be the one that yields the clustering that simultaneously minimizes the average within-cluster dispersion and maximizes the average between-cluster dispersion along all the feature spaces. Using precision/recall evaluations and known ground truth classifications, we empirically demonstrate the effectiveness of feature weighting in clustering on several different application domains.
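The five ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: every feature space uses squared Euclidean distortion (the framework allows a different measure per space), there are exactly two feature spaces so that the weights can be scanned on a one-dimensional grid, and the weighting is scored by the product over feature spaces of the within-cluster to between-cluster dispersion ratio, which is only one plausible reading of criterion (v). The function names `convex_kmeans`, `dispersion_ratio`, and `best_weighting` are hypothetical.

```python
import numpy as np


def convex_kmeans(spaces, weights, k, n_iter=50, seed=0):
    """k-means on objects given as tuples of feature vectors.

    spaces  : list of arrays, one per feature space, each (n_objects, dim_l)
    weights : nonnegative weights, one per feature space, summing to 1
    Assumption: squared Euclidean distortion in every feature space.
    """
    spaces = [np.asarray(X, dtype=float) for X in spaces]
    rng = np.random.default_rng(seed)
    n = spaces[0].shape[0]
    # Initialise the centroids from k distinct objects.
    idx = rng.choice(n, size=k, replace=False)
    centroids = [X[idx].copy() for X in spaces]
    labels = np.zeros(n, dtype=int)
    for it in range(n_iter):
        # Combined distortion: convex sum of the per-space distortions.
        dist = np.zeros((n, k))
        for w, X, C in zip(weights, spaces, centroids):
            dist += w * ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if it > 0 and np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        # With squared Euclidean distortion the update separates per space.
        for X, C in zip(spaces, centroids):
            for j in range(k):
                members = X[labels == j]
                if len(members):  # keep the old centroid if a cluster empties
                    C[j] = members.mean(axis=0)
    return labels, centroids


def dispersion_ratio(spaces, labels, centroids):
    """Product over feature spaces of within / between dispersion.

    Assumption: this scoring rule is illustrative, not the paper's formula.
    """
    score = 1.0
    for X, C in zip(spaces, centroids):
        X = np.asarray(X, dtype=float)
        counts = np.bincount(labels, minlength=C.shape[0])
        within = ((X - C[labels]) ** 2).sum()
        between = (counts[:, None] * (C - X.mean(axis=0)) ** 2).sum()
        score *= within / max(between, 1e-12)
    return score


def best_weighting(spaces, k, grid=11):
    """Scan weights (a, 1 - a) for two feature spaces; return the best triple."""
    best = None
    for a in np.linspace(0.0, 1.0, grid):
        w = (float(a), float(1.0 - a))
        labels, centroids = convex_kmeans(spaces, w, k)
        score = dispersion_ratio(spaces, labels, centroids)
        if best is None or score < best[0]:
            best = (score, w, labels)
    return best
```

Calling `best_weighting([X1, X2], k)` with two arrays that describe the same objects returns the lowest score found, the corresponding weight pair, and the cluster labels. A grid scan is used here only because it is the simplest search to read; with more than two feature spaces the weights live on a simplex and a different search strategy would be needed.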