Term weighting evaluation in bipartite partitioning for text clustering

Authors:
Chao Qu;Yong Li;Jun Zhu;Peican Huang;Ruifen Yuan;Tianming Hu
Affiliations:
Dongguan University of Technology, China and Zhongshan University, China;Dongguan University of Technology, China;Dongguan University of Technology, China and Zhongshan University, China;Dongguan University of Technology, China;Dongguan University of Technology, China;Dongguan University of Technology, China and East China Normal University, China
Venue:
AIRS'08 Proceedings of the 4th Asia information retrieval conference on Information retrieval technology
Year:
2008

Abstract

To alleviate the problem of high dimensionality in text clustering, an alternative to conventional methods is bipartite partitioning, in which terms and documents are modeled as vertices on the two sides of a bipartite graph. Term weighting schemes, which assign weights to the edges linking terms and documents, are vital to the final clustering performance. In this paper, we conducted a comprehensive evaluation of six variants of the tf/idf factors as term weighting schemes in bipartite partitioning. Using various external validation measures, we found tfidf to be the most effective in our experiments. Our experimental results also indicated that the df factor generally leads to better performance than the tf factor at moderate partitioning sizes.
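
To make the idea of edge weights in a term-document bipartite graph concrete, here is a minimal sketch in Python. It is not the authors' implementation. The abstract does not list the six variants, so the three schemes shown (`tf`, `idf`, `tfidf`), the function name `bipartite_edge_weights`, whitespace tokenization, and the definition idf = log(N / df) are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bipartite_edge_weights(docs, scheme="tfidf"):
    """Return {(term, doc_index): weight} for a term-document bipartite graph.

    Hypothetical helper for illustration only. Assumes whitespace
    tokenization and idf = log(N / df), one common textbook definition.
    """
    n_docs = len(docs)
    # Raw term frequencies per document.
    term_counts = [Counter(doc.split()) for doc in docs]

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for counts in term_counts:
        df.update(counts.keys())

    edges = {}
    for j, counts in enumerate(term_counts):
        for term, tf in counts.items():
            idf = math.log(n_docs / df[term])
            if scheme == "tf":
                weight = tf
            elif scheme == "idf":
                weight = idf
            elif scheme == "tfidf":
                weight = tf * idf
            else:
                raise ValueError(f"unknown scheme: {scheme}")
            # One edge links a term vertex to a document vertex.
            edges[(term, j)] = float(weight)
    return edges


# Toy corpus (made up for this sketch).
docs = ["graph cut graph", "graph cluster", "term weight cluster"]
edges = bipartite_edge_weights(docs, scheme="tfidf")
```

Each key in `edges` is one edge between a term vertex and a document vertex, and a graph partitioner would then cut this weighted graph so that terms and documents are clustered together. Note that under this idf definition a term occurring in every document gets weight zero, which effectively removes its edges.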