Indexing low frequency information for question answering

Authors:
Abolfazl Keighobadi Lamjiri; Julien Dubuc; Leila Kosseim; Sabine Bergler
Affiliations:
Concordia University, Montreal, Québec, Canada (all authors)
Venue:
Large Scale Semantic Access to Content (Text, Image, Video, and Sound)
Year:
2007

Abstract

This paper presents our experiments with a low-frequency approach to information retrieval for question answering over a small, closed-domain corpus and a variety of question types. Using a set of 255 questions categorized as simple, average, or challenging, we compared the performance of our question answering system (QASCU) when paired with two different information retrieval systems, Lucene and BioKI. Lucene applies a standard tf.idf weighting scheme to documents, while BioKI applies a weighted keyword occurrence optimization scheme to paragraphs that does not bias against low-frequency terms. Although Lucene yields better IR results than BioKI at the document level, running QASCU on BioKI output achieves higher precision. This indicates that, for closed-domain QA with an IR component, the basic F-measure performance of the IR component at the document level is not necessarily indicative of overall QA performance. We contend that the findings are also relevant to retrieval from video, text, and sound collections, which usually feature low redundancy in the text snippets used for retrieval.
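
To make the contrast in the abstract concrete, the following minimal Python sketch compares a textbook tf.idf score with a keyword-presence score that ignores how often a term repeats. It is an illustration under stated assumptions, not the method of either system: `tf_idf_score` uses one common tf.idf variant rather than Lucene's actual scoring formula, and `keyword_occurrence_score` with its hand-picked weights is a hypothetical stand-in, since the abstract does not specify BioKI's optimization scheme. The `f_measure` helper is the standard weighted harmonic mean of precision and recall.

```python
import math
from collections import Counter


def tf_idf_score(query_terms, doc_tokens, corpus):
    """Sum of tf * idf over the query terms for one document.

    Textbook variant (raw term count times log(N / df)); assumed here
    for illustration and not Lucene's actual scoring formula.
    """
    counts = Counter(doc_tokens)
    n_docs = len(corpus)
    score = 0.0
    for term in query_terms:
        df = sum(1 for doc in corpus if term in doc)  # document frequency
        if df == 0:
            continue  # term never occurs, contributes nothing
        score += counts[term] * math.log(n_docs / df)
    return score


def keyword_occurrence_score(keyword_weights, passage_tokens):
    """Hypothetical stand-in for a keyword-occurrence scheme.

    Each weighted keyword counts once if it is present at all, so a
    term that appears a single time is not penalized relative to a
    term that is repeated many times.
    """
    present = set(passage_tokens)
    return sum(w for term, w in keyword_weights.items() if term in present)


def f_measure(precision, recall, beta=1.0):
    """Standard F-measure; returns 0.0 when both inputs are zero."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy corpus of tokenized units (invented data, for illustration only).
corpus = [
    ["fee", "fee", "fee", "plan"],
    ["roaming", "fee"],
    ["plan", "plan"],
]

# tf.idf rewards the unit that repeats "fee" most often.
tfidf_scores = [tf_idf_score(["fee"], doc, corpus) for doc in corpus]

# The presence-based score prefers the unit covering more distinct keywords.
weights = {"fee": 1.0, "roaming": 2.0}  # hypothetical keyword weights
presence_scores = [keyword_occurrence_score(weights, doc) for doc in corpus]
```

On this toy data the two schemes rank the first two units in opposite order, which mirrors the abstract's point that a retrieval score tuned for document-level relevance need not favor the passages most useful to the downstream QA step.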