Automatic query expansion using data manifold

Authors:
Lingpeng Yang;Donghong Ji;Yu Nie;Tingting He
Affiliations:
Institute for Infocomm Research, Singapore;Institute for Infocomm Research, Singapore;Institute for Infocomm Research, Singapore;Huazhong Normal University, Wuhan, China
Venue:
AIRS'06 Proceedings of the Third Asia conference on Information Retrieval Technology
Year:
2006

Citing 7
Cited 0

Improving automatic query expansion

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Improving the effectiveness of information retrieval with local context analysis

ACM Transactions on Information Systems (TOIS)
An information-theoretic approach to automatic query expansion

ACM Transactions on Information Systems (TOIS)
Improving the retrieval effectiveness of very short queries

Information Processing and Management: an International Journal
PageRank without hyperlinks: structural re-ranking using links induced by language models

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Improving web search results using affinity graph

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Chinese document re-ranking based on term distribution and maximal marginal relevance

AIRS'05 Proceedings of the Second Asia conference on Asia Information Retrieval Technology

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper proposes an automatic query expansion method that combines document re-ranking and standard Rocchio’s relevance feedback. The document re-ranking method ranks the top retrieved documents based on the intrinsic manifold structure collectively revealed by a great amount of data. This is done by using a semi-supervised learning algorithm to integrate pseudo relevant documents with documents to be re-ranked. Given an initial ranked list of retrieved documents, the document re-ranking approach picks a set of documents from the top ones (including query itself) as pseudo relevant documents. In this way, the intrinsic relationship of all the retrieved documents to be re-ranked with the pseudo relevant documents (pseudo irrelevant documents are missing) can be determined via a semi-supervised learning algorithm. Finally, all the retrieved documents can be re-ranked according to above relationship. Evaluation on benchmark corpora show that the approach can achieve much better performance than standard Rocchio’s relevance feedback and performance better than other related approaches.