Statistical source expansion for question answering

Authors:
Nico Schlaefer;Jennifer Chu-Carroll;Eric Nyberg;James Fan;Wlodek Zadrozny;David Ferrucci
Affiliations:
Carnegie Mellon University, Pittsburgh, PA, USA;IBM T.J. Watson Research Center, Hawthorne, NY, USA;Carnegie Mellon University, Pittsburgh, PA, USA;IBM T.J. Watson Research Center, Hawthorne, NY, USA;IBM T.J. Watson Research Center, Hawthorne, NY, USA;IBM T.J. Watson Research Center, Hawthorne, NY, USA
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Abstract

A source expansion algorithm automatically extends a given text corpus with related content from large external sources such as the Web. The expanded corpus is not intended for human consumption but can be used in question answering (QA) and other information retrieval or extraction tasks to find more relevant information and supporting evidence. We propose an algorithm that extends a corpus of seed documents with web content, using a statistical model to select text passages that are both relevant to the topics of the seeds and complement existing information. In an evaluation on 1,500 hand-labeled web pages, our algorithm ranked text passages by relevance with 81% mean average precision (MAP), compared to 43% when relying on web search engine ranks alone and 75% when using a multi-document summarization algorithm. Applied to QA, the proposed method yields consistent and significant performance gains. We evaluated the impact of source expansion on over 6,000 questions from the Jeopardy! quiz show and TREC evaluations using Watson, a state-of-the-art QA system. Accuracy increased from 66% to 71% on Jeopardy! questions and from 59% to 64% on TREC questions.
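
The passage-ranking results above are reported as MAP (mean average precision). As a rough illustration of how that metric is commonly computed, the sketch below scores ranked lists of binary relevance labels. It is not the authors' evaluation code: the function names are hypothetical, and it assumes every relevant passage for a topic appears somewhere in that topic's ranking, so average precision is normalized by the number of relevant items found in the list.

```python
from typing import List, Sequence


def average_precision(ranked_labels: Sequence[bool]) -> float:
    """Average precision for one ranking of relevance labels (best rank first).

    Assumption: all relevant passages are present in the ranking, so the
    sum of precision values is divided by the number of relevant items seen.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_labels, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank: relevant items so far / items so far.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: List[Sequence[bool]]) -> float:
    """Mean of the per-ranking average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


if __name__ == "__main__":
    # Two hypothetical seed topics with hand-labeled passage rankings.
    topic_a = [True, False, True]   # relevant passages at ranks 1 and 3
    topic_b = [False, True]         # relevant passage at rank 2
    print(mean_average_precision([topic_a, topic_b]))
```

Under this definition a ranking that places all relevant passages first scores 1.0, so a higher MAP means relevant passages tend to appear nearer the top of each list.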