On term selection for query expansion
Journal of Documentation
Scatter/Gather: a cluster-based approach to browsing large document collections
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Distributional clustering of words for text classification
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Deriving concept hierarchies from text
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
A divisive information theoretic feature clustering algorithm for text classification
The Journal of Machine Learning Research
Findex: search result categories help users when document ranking fails
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
On a combination of probabilistic and boolean ir models for WWW document retrieval
ACM Transactions on Asian Language Information Processing (TALIP)
Clustering versus faceted categories for information exploration
Communications of the ACM - Supporting exploratory search
FSKD'05 Proceedings of the Second international conference on Fuzzy Systems and Knowledge Discovery - Volume Part I
A new measure for query disambiguation using term co-occurrences
IDEAL'06 Proceedings of the 7th international conference on Intelligent Data Engineering and Automated Learning
Hi-index | 0.00 |
We propose a method for supporting query refinement using topical term clusters. First, we propose a new term weighting method that can extract terms strongly related to a specific topic, because a document set retrieved with an ambiguous query may include divergent topics. Our formulation of term weighting is based on the statistics of term co-occurrence. Then, we generate term clusters using extracted terms, and rerank the documents in the search results by using each term cluster as a query. This clustering procedure is intended to isolate each topic as a set of related terms. In our experiments, we evaluated our term weighting method by checking: 1) whether each of the top-ranked document sets corresponds to one topic; and 2) whether some of the top-ranked document sets cover all the topics included in the synthesized document set. The results of our experiment show our method outperforms the existing term weighting methods MI, KLD, CHI-square and RSV.