To alleviate the problem of high dimensionality in text clustering, an alternative to conventional methods is bipartite partitioning, in which terms and documents are modeled as vertices on the two sides of a bipartite graph. Term weighting schemes, which assign weights to the edges linking terms and documents, are vital to the final clustering performance. In this paper, we conducted a comprehensive evaluation of six variants of the tf/idf factor as term weighting schemes in bipartite partitioning. Across various external validation measures, we found tfidf to be the most effective in our experiments. Our experimental results also indicated that the df factor generally leads to better performance than the tf factor at moderate partitioning sizes.
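
As an illustration of how a term weighting scheme assigns weights to the edges of a term-document bipartite graph, the following is a minimal sketch. The abstract does not enumerate the six variants evaluated, so the four schemes shown here (`binary`, `tf`, `idf`, `tfidf`) are assumed, commonly used choices; the function name `bipartite_edge_weights` is hypothetical, and idf is assumed to be log(N / df).

```python
import math
from collections import Counter


def bipartite_edge_weights(docs, scheme="tfidf"):
    """Return {(term, doc_index): weight} for a term-document bipartite graph.

    docs   : list of tokenized documents (each a list of terms)
    scheme : one of "binary", "tf", "idf", "tfidf" (illustrative assumptions,
             not necessarily the variants evaluated in the paper)
    """
    n_docs = len(docs)
    term_counts = [Counter(doc) for doc in docs]

    # df: number of documents in which each term occurs at least once
    df = Counter()
    for counts in term_counts:
        df.update(counts.keys())

    edges = {}
    for j, counts in enumerate(term_counts):
        for term, tf in counts.items():
            idf = math.log(n_docs / df[term])  # assumed idf definition
            if scheme == "binary":
                weight = 1.0
            elif scheme == "tf":
                weight = float(tf)
            elif scheme == "idf":
                weight = idf
            elif scheme == "tfidf":
                weight = tf * idf
            else:
                raise ValueError(f"unknown scheme: {scheme}")
            edges[(term, j)] = weight
    return edges


# Toy corpus: two short tokenized documents
docs = [["graph", "graph", "cluster"], ["cluster", "term"]]
tfidf_edges = bipartite_edge_weights(docs, scheme="tfidf")
tf_edges = bipartite_edge_weights(docs, scheme="tf")
```

In this toy corpus, "cluster" occurs in every document, so its idf (and hence its tfidf edge weight) is zero, whereas the `tf` scheme still gives it a nonzero weight. A partitioning algorithm run on the resulting weighted graph would therefore treat such edges differently depending on the scheme chosen.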