A Clustering Framework Based on Adaptive Space Mapping and Rescaling

Authors:
Yiling Zeng;Hongbo Xu;Jiafeng Guo;Yu Wang;Shuo Bai
Affiliations:
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China 100080;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China 100080;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China 100080;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China 100080;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China 100080 and Shanghai Stock Exchange, Shanghai, China 200120
Venue:
AIRS '09 Proceedings of the 5th Asia Information Retrieval Symposium on Information Retrieval Technology
Year:
2009

Abstract

Traditional clustering algorithms often suffer from the model misfit problem, which arises when the distribution of real data does not fit the model assumptions. To address this problem, we propose a novel clustering framework based on adaptive space mapping and rescaling, referred to as the M-R framework. The basic idea of our approach is to adjust the data representation so that the data distribution fits the model assumptions better. Specifically, documents are first mapped into a low-dimensional space with respect to the cluster centers, so that the distribution statistics of each cluster can be analyzed on the corresponding dimension. With these statistics in hand, a rescaling operation is then applied to regularize the data distribution according to the model assumptions. These two steps are conducted iteratively along with the clustering algorithm to continually improve the clustering performance. In our work, we apply the M-R framework to the most widely used clustering algorithm, k-means, as an example. Experiments on well-known datasets show that our M-R framework achieves performance comparable to state-of-the-art methods.
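
The loop described in the abstract (cluster, map with respect to the centers, rescale, repeat) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the mapping or rescaling formulas, so the sketch assumes that dimension j of the mapped space is the dot product with center j, and that rescaling divides dimension j by the standard deviation of cluster j's own members on that dimension. The names `kmeans`, `map_and_rescale`, and `mr_kmeans` are hypothetical.

```python
import numpy as np


def kmeans(X, k, init_centers=None, n_iter=20, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    if init_centers is None:
        init_centers = X[rng.choice(len(X), size=k, replace=False)]
    centers = np.array(init_centers, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers


def map_and_rescale(X, labels, centers, eps=1e-12):
    """Assumed mapping and rescaling step (not taken from the paper)."""
    # Mapping (assumption): dimension j is the dot product with center j.
    Z = X @ centers.T
    k = centers.shape[0]
    scale = np.ones(k)
    for j in range(k):
        # Statistics of cluster j, measured on its own dimension j.
        own = Z[labels == j, j]
        if own.size > 1:
            scale[j] = max(own.std(), eps)
    # Rescaling (assumption): give each cluster unit spread on its dimension.
    return Z / scale


def mr_kmeans(X, k, init_centers=None, n_rounds=5, seed=0):
    """Alternate k-means with the assumed map-and-rescale step."""
    X = np.asarray(X, dtype=float)
    labels, centers = kmeans(X, k, init_centers=init_centers, seed=seed)
    for _ in range(n_rounds):
        Z = map_and_rescale(X, labels, centers)
        # Start from the current partition, expressed in the mapped space.
        mapped_init = np.vstack([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else Z.mean(axis=0)
            for j in range(k)
        ])
        labels, _ = kmeans(Z, k, init_centers=mapped_init, seed=seed)
        # Refresh the centers in the original space from the new partition.
        centers = np.vstack([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return labels
```

Calling `mr_kmeans(X, k)` on a document-by-feature matrix `X` returns one cluster label per row. The mapped space has only k dimensions, which is what allows each cluster's statistics to be read off a single coordinate in this sketch.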