Adaptive document clustering based on query-based similarity

Authors:
Seung-Hoon Na;In-Su Kang;Jong-Hyeok Lee
Affiliations:
Division of Electrical and Computer Engineering, POSTECH (Pohang University of Science and Technology), San 31, Hyojadong, Namgu, Pohang 790-784, Republic of Korea (all authors)
Venue:
Information Processing and Management: an International Journal
Year:
2007

Abstract

In information retrieval, cluster-based retrieval is a well-known approach to resolving the problem of term mismatch. Clustering requires similarity information between documents, which is difficult to compute in a feasible amount of time. The adaptive document clustering scheme has been investigated by researchers to resolve this problem. However, its theoretical basis has not been fully explored. In this regard, we provide a conceptual view of adaptive document clustering based on query-based similarities, by regarding the user's query as a concept. Under this view, the adaptive document clustering scheme can be seen as an approximation of this similarity. Based on this idea, we derive three new query-based similarity measures in the language modeling framework and evaluate them in the context of cluster-based retrieval, comparing them with K-means clustering and full document expansion. The evaluation results show that retrieval based on query-based similarities significantly improves over the baseline while being comparable to the other methods. This implies that the newly developed query-based similarities are feasible criteria for adaptive document clustering.
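
The abstract does not state the three measures, so the following is only a minimal sketch of the general idea of a query-based similarity, not the authors' formulation. It assumes a unigram query-likelihood model with Jelinek-Mercer smoothing and treats each query as a concept: two documents count as similar when they assign similar likelihoods to the same queries. The function names, the smoothing weight `lam`, the use of cosine similarity, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def query_likelihood(query, doc, collection, lam=0.5):
    """P(query | doc) under a unigram model with Jelinek-Mercer smoothing.

    `lam` (hypothetical default) weights the document model against the
    collection model. Assumes `collection` is a non-empty token list.
    """
    doc_tf = Counter(doc)
    col_tf = Counter(collection)
    prob = 1.0
    for term in query:
        p_doc = doc_tf[term] / len(doc) if doc else 0.0
        p_col = col_tf[term] / len(collection)
        prob *= lam * p_doc + (1.0 - lam) * p_col
    return prob


def query_based_similarity(doc_a, doc_b, queries, collection):
    """Cosine of the two documents' query-likelihood vectors.

    Cosine is an illustrative choice, not a measure taken from the paper.
    """
    vec_a = [query_likelihood(q, doc_a, collection) for q in queries]
    vec_b = [query_likelihood(q, doc_b, collection) for q in queries]
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm = math.sqrt(sum(x * x for x in vec_a)) * math.sqrt(sum(y * y for y in vec_b))
    return dot / norm if norm else 0.0


# Toy data (hypothetical): two documents on one topic, one on another.
d_car = ["car", "engine", "wheel"]
d_auto = ["automobile", "engine", "wheel"]
d_food = ["pasta", "sauce", "cheese"]
collection = d_car + d_auto + d_food
queries = [["engine"], ["wheel"], ["pasta"]]

sim_same_topic = query_based_similarity(d_car, d_auto, queries, collection)
sim_other_topic = query_based_similarity(d_car, d_food, queries, collection)
print(sim_same_topic > sim_other_topic)
```

In this sketch the similarity depends only on how the documents respond to the queries, so it can be accumulated as queries arrive rather than computed over all document pairs in advance, which is the kind of cost the abstract says adaptive clustering is meant to avoid.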