Latent word context model for information retrieval

  • Authors: Bernard Brosseau-Villeneuve, Jian-Yun Nie, Noriko Kando
  • Affiliations: University of Montréal, Montreal, Canada H3C 3J7 (Brosseau-Villeneuve, Nie); National Institute of Informatics, Tokyo, Japan 101-8430 (Kando)
  • Venue: Information Retrieval
  • Year: 2014

Abstract

The application of word sense disambiguation (WSD) techniques to information retrieval (IR) has yet to provide convincing retrieval results. Major obstacles to effective WSD in IR include the coverage and granularity problems of word sense inventories, the sparsity of document context, and the limited information provided by short queries. In this paper, to alleviate these issues, we propose the construction of latent context models for terms using latent Dirichlet allocation. We build one latent context model per word, using a well-principled representation of local context based on word features. In particular, context words are weighted by a decaying function of their distance to the target word, which is learned from data in an unsupervised manner. The resulting latent features are used to discriminate word contexts, so as to constrain the query's semantic scope. Consistent and substantial improvements, including on difficult queries, are observed on TREC test collections, and the technique combines well with blind relevance feedback. Compared to traditional topic modeling, WSD, and positional indexing techniques, the proposed retrieval model is more effective and scales well to large collections.
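The distance-decayed context representation described above can be sketched as follows. This is only an illustrative stand-in, not the paper's method: it uses a fixed exponential decay `alpha ** d` (the paper learns the decay function from data, and feeds the weighted contexts into an LDA model), and the function name, window size, and decay rate are all hypothetical choices.

```python
from collections import Counter

def context_weights(tokens, target, alpha=0.8, window=5):
    """Accumulate decay-weighted counts of words co-occurring with `target`.

    Each context word at distance d from an occurrence of `target`
    contributes alpha**d, so nearer words dominate the context model.
    (Fixed exponential decay is an assumption; the paper learns the
    decay function in an unsupervised manner.)
    """
    weights = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue  # skip the target occurrence itself
            d = abs(j - i)
            weights[tokens[j]] += alpha ** d
    return weights

# Toy example: "bank" occurs in two senses; nearby words get larger weight.
tokens = "the bank approved the loan at the river bank".split()
w = context_weights(tokens, "bank")
```

In a full pipeline, such weighted context vectors (one per word occurrence) would serve as the pseudo-documents from which the latent topics are inferred.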