Validation of overlapping clustering: A random clustering perspective

Authors:
Junjie Wu;Hua Yuan;Hui Xiong;Guoqing Chen
Affiliations:
School of Economics and Management, Beihang University, Beijing 100191, China;School of Management and Economics, University of Electronic Science and Technology of China, Chengdu 610054, China;Management Science and Information Systems Department, Rutgers University, Newark 07102, NJ, USA;School of Economics and Management, Tsinghua University, Beijing 100084, China
Venue:
Information Sciences: an International Journal
Year:
2010

Abstract

As a widely used clustering validation measure, the F-measure has received increased attention in the field of information retrieval. In this paper, we reveal that the F-measure can give a biased view of overlapping clustering results when it is used to validate data with different numbers of clusters (the incremental effect) or different prior probabilities of relevant documents (the prior-probability effect). We propose a new measure, "IMplication Intensity" (IMI), which is based on the F-measure and developed from a random clustering perspective. We also investigate the properties of IMI. Finally, experimental results on real-world data sets show that IMI significantly alleviates the biased incremental and prior-probability effects inherent to the F-measure.
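For readers unfamiliar with the F-measure as a validation score for overlapping clusters, the following is a minimal sketch, assuming the common information-retrieval formulation in which a set of relevant documents is scored by its best-matching cluster. The function names `f_measure` and `best_f` and the toy data are hypothetical and are not taken from the paper, and the sketch does not implement IMI, whose definition is not given in the abstract.

```python
def f_measure(relevant, cluster, beta=1.0):
    """F-measure of one cluster against one set of relevant documents.

    Both arguments are sets of document ids. Clusters may overlap,
    because each cluster is scored independently of the others.
    """
    hits = len(relevant & cluster)
    if hits == 0:
        return 0.0
    precision = hits / len(cluster)
    recall = hits / len(relevant)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_f(relevant, clusters):
    """Score a clustering by the best-matching cluster (assumed convention)."""
    return max((f_measure(relevant, c) for c in clusters), default=0.0)


# Toy data (hypothetical): document 2 belongs to two clusters.
relevant = {1, 2, 3, 4}
clusters = [{1, 2, 5}, {2, 3, 4, 6}, {7, 8}]

score_three = best_f(relevant, clusters)
# Adding a cluster can never lower a best-match score.
score_four = best_f(relevant, clusters + [{1, 2, 3, 4}])
```

Because the score is a maximum over clusters, it cannot decrease as clusters are added. This may be one intuition behind the bias across different cluster numbers that the abstract describes, though it is only an illustration and not the paper's analysis.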