Positional language models for information retrieval

Authors:
Yuanhua Lv;ChengXiang Zhai
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL, USA;University of Illinois at Urbana-Champaign, Urbana, IL, USA
Venue:
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Year:
2009

Citing 25
Cited 46

Abstract

Although many variants of language models have been proposed for information retrieval, two related retrieval heuristics remain "external" to the language modeling approach: (1) the proximity heuristic, which rewards a document in which the matched query terms occur close to each other; and (2) passage retrieval, which scores a document mainly based on its best-matching passage. Existing studies have only attempted to use a standard language model as a "black box" to implement these heuristics, making it hard to optimize the combination parameters. In this paper, we propose a novel positional language model (PLM) that implements both heuristics in a unified language model. The key idea is to define a language model for each position of a document and to score a document based on the scores of its PLMs. The PLM is estimated from word counts propagated within a document through a proximity-based density function, which both captures the proximity heuristic and achieves the effect of "soft" passage retrieval. We propose and study several representative density functions and several different PLM-based document ranking strategies. Experimental results on standard TREC test collections show that the PLM is effective for passage retrieval and performs better than a state-of-the-art proximity-based retrieval model.
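The estimation idea described above can be illustrated with a minimal Python sketch. It assumes that each position i gets a propagated count c'(w, i), the sum over positions j of c(w, j) k(i, j), where c(w, j) is 1 if word w occurs at position j and k is a density (kernel) function. It also assumes that each positional model is smoothed with a Dirichlet prior over a collection model, and that a document is scored by its best-scoring position, which is only one of several possible ranking strategies. The kernels shown (Gaussian, triangle, and a hard passage window), the parameter names sigma and mu, and the helper coll_prob are illustrative assumptions, not the paper's exact definitions.

```python
import math
from collections import Counter, defaultdict

# Illustrative density functions k(i, j); sigma (> 0) sets how far counts spread.
def gaussian_kernel(i, j, sigma):
    return math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))

def triangle_kernel(i, j, sigma):
    d = abs(i - j)
    return 1.0 - d / sigma if d <= sigma else 0.0

def passage_kernel(i, j, sigma):
    # Hard window: every word within sigma positions of i counts fully.
    return 1.0 if abs(i - j) <= sigma else 0.0

def propagated_counts(doc, i, kernel, sigma):
    """c'(w, i) = sum_j c(w, j) * k(i, j), with c(w, j) = 1 iff doc[j] == w."""
    counts = defaultdict(float)
    for j, w in enumerate(doc):
        weight = kernel(i, j, sigma)
        if weight > 0.0:
            counts[w] += weight
    return counts

def position_score(query, doc, i, kernel, sigma, coll_prob, mu):
    """Query log-likelihood under the Dirichlet-smoothed PLM at position i.

    Ranks documents the same way as negative KL divergence with a
    maximum-likelihood query model. coll_prob(w) must be > 0 for query words.
    """
    counts = propagated_counts(doc, i, kernel, sigma)
    total = sum(counts.values())  # length of the "virtual" document at i
    score = 0.0
    for word, qtf in Counter(query).items():
        p = (counts.get(word, 0.0) + mu * coll_prob(word)) / (total + mu)
        score += qtf * math.log(p)
    return score

def best_position_score(query, doc, kernel, sigma, coll_prob, mu=500.0):
    """Score a document by its best-scoring position (one possible strategy)."""
    if not doc:
        return float("-inf")
    return max(position_score(query, doc, i, kernel, sigma, coll_prob, mu)
               for i in range(len(doc)))

# Usage: same words and same length, different proximity of the query terms.
near = ["alpha", "beta"] + ["filler"] * 10
far = ["alpha"] + ["filler"] * 10 + ["beta"]
uniform = lambda w: 0.01  # hypothetical collection model
query = ["alpha", "beta"]
print(best_position_score(query, near, gaussian_kernel, 2.0, uniform, mu=10.0))
print(best_position_score(query, far, gaussian_kernel, 2.0, uniform, mu=10.0))
```

Here the document with adjacent query terms should outscore the one in which the same terms sit at opposite ends, even though both have identical term counts and lengths, so a plain bag-of-words query likelihood would score them equally. Swapping in passage_kernel roughly turns the same code into best fixed-window passage scoring, which is the sense in which a smooth kernel gives "soft" passage retrieval.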