A comparative study of word co-occurrence for term clustering in language model-based sentence retrieval

Authors: Saeedeh Momtazi; Sanjeev Khudanpur; Dietrich Klakow
Affiliations: Saarland University; Johns Hopkins University; Saarland University
Venue: HLT '10: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Year: 2010

Abstract

Sentence retrieval is an important component of question answering systems. Term clustering, in turn, is an effective approach for improving sentence retrieval performance: the more similar the terms within each cluster, the better the retrieval system performs. A key step in obtaining appropriate word clusters is the accurate estimation of pairwise word similarities, based on the tendency of words to co-occur in similar contexts. In this paper, we compare four different methods for estimating word co-occurrence frequencies from two different corpora. The results show that the commonly used context definitions for word co-occurrence lead to significantly different retrieval performance. Using an appropriate co-occurrence criterion and corpus is shown to improve the mean average precision of sentence retrieval from 36.8% to 42.1%.
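The abstract does not name the four co-occurrence methods that are compared. As a hedged illustration only, the Python sketch below contrasts two context definitions that are common in the literature (co-occurrence within the same sentence, and co-occurrence within a fixed token window). It then scores word pairs with pointwise mutual information (PMI). PMI is chosen here as an assumption and is not claimed to be the paper's similarity measure. The function names, the toy corpus, and the window size are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each sentence is a list of lowercase tokens.
TOY_CORPUS = [
    "who invented the telephone".split(),
    "bell invented the telephone in 1876".split(),
    "the telephone was patented by bell".split(),
]


def sentence_cooccurrence(sentences):
    """Context = whole sentence: each unordered word pair counts once per sentence.

    Returns (word_counts, pair_counts, n), where n is the number of sentences,
    so counts / n are sentence-level relative frequencies.
    """
    word_counts, pair_counts = Counter(), Counter()
    for tokens in sentences:
        vocab = sorted(set(tokens))  # sorted so pair keys are canonical tuples
        word_counts.update(vocab)
        pair_counts.update(combinations(vocab, 2))
    return word_counts, pair_counts, len(sentences)


def window_cooccurrence(sentences, window=2):
    """Context = fixed window: count token pairs at most `window` positions apart.

    Returns (word_counts, pair_counts, n), where n is the total number of tokens.
    Using the same n for unigram and pair probabilities is a simplifying assumption.
    """
    word_counts, pair_counts = Counter(), Counter()
    n = 0
    for tokens in sentences:
        word_counts.update(tokens)
        n += len(tokens)
        for i, w in enumerate(tokens):
            # Look only forward, so each unordered pair is counted once per position.
            for v in tokens[i + 1 : i + 1 + window]:
                if v != w:  # skip self-pairs
                    pair_counts[tuple(sorted((w, v)))] += 1
    return word_counts, pair_counts, n


def pmi(word_counts, pair_counts, n, w, v):
    """Pointwise mutual information log(P(w,v) / (P(w) P(v))); -inf if never co-occurring."""
    joint = pair_counts[tuple(sorted((w, v)))]
    if joint == 0:
        return float("-inf")
    return math.log((joint / n) / ((word_counts[w] / n) * (word_counts[v] / n)))


if __name__ == "__main__":
    for name, (wc, pc, n) in [
        ("sentence", sentence_cooccurrence(TOY_CORPUS)),
        ("window=2", window_cooccurrence(TOY_CORPUS, window=2)),
    ]:
        print(name, pc[("bell", "telephone")], pmi(wc, pc, n, "bell", "telephone"))
```

In this toy corpus, "bell" and "telephone" share two sentences but never appear within two tokens of each other. The sentence context therefore gives the pair a count of 2, while the window context gives it none. This is the kind of divergence between context definitions that can change which words end up in the same cluster.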