Word distribution analysis for relevance ranking and query expansion

Authors:
Patricio Galeas;Bernd Freisleben
Affiliations:
Dept. of Mathematics and Computer Science, University of Marburg, Marburg, Germany;Dept. of Mathematics and Computer Science, University of Marburg, Marburg, Germany
Venue:
CICLing'08 Proceedings of the 9th international conference on Computational linguistics and intelligent text processing
Year:
2008

Abstract

Apart from the frequency of terms in a document collection, the distribution of words plays an important role in determining the relevance of documents for a given search query. This paper introduces word distribution analysis, a novel approach that uses descriptive statistics to calculate a compressed representation of word positions in a document corpus. Based on this statistical approximation, two methods for improving the evaluation of document relevance are proposed: (a) a relevance ranking procedure based on how query terms are distributed over initially retrieved documents, and (b) a query expansion technique based on overlapping the distributions of terms in the top-ranked documents. Experimental results obtained for the TREC-8 document collection demonstrate that the proposed approach leads to an improvement of about 6.6% over the term frequency/inverse document frequency (TF-IDF) weighting scheme without applying query reformulation or relevance feedback techniques.
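
The general idea of summarizing word positions with descriptive statistics and comparing term distributions can be illustrated with a small sketch. The abstract does not say which statistics or which overlap measure the authors use, so everything below is an assumption made for illustration: positions are normalized to [0, 1], each term is summarized by the mean and population standard deviation of its positions, and "overlap" is taken as the shared length of the intervals [mean - std, mean + std]. The function names `position_summary` and `interval_overlap` are hypothetical.

```python
from statistics import mean, pstdev


def position_summary(tokens, term):
    """Summarize where `term` occurs in `tokens` as (mean, std) of its
    normalized positions in [0, 1]. Returns None if the term is absent.
    Assumption: mean/std stand in for the paper's unspecified statistics."""
    n = len(tokens)
    positions = [
        (i / (n - 1) if n > 1 else 0.0)
        for i, tok in enumerate(tokens)
        if tok == term
    ]
    if not positions:
        return None
    return mean(positions), pstdev(positions)


def interval_overlap(a, b):
    """Shared length of the intervals [mean - std, mean + std] of two
    summaries. Assumption: a simple stand-in for 'overlapping distributions'."""
    lo = max(a[0] - a[1], b[0] - b[1])
    hi = min(a[0] + a[1], b[0] + b[1])
    return max(0.0, hi - lo)


if __name__ == "__main__":
    doc = ["a", "b", "a", "c", "b"]
    summary_a = position_summary(doc, "a")
    summary_b = position_summary(doc, "b")
    print(summary_a, summary_b, interval_overlap(summary_a, summary_b))
```

Under these assumptions, a candidate expansion term whose interval overlaps strongly with that of a query term in the top-ranked documents would be treated as occurring in the same regions of text. The two-number summary per term is what makes the representation compact compared with storing every position.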