A source expansion algorithm automatically extends a given text corpus with related content from large external sources such as the Web. The expanded corpus is not intended for human consumption but can be used in question answering (QA) and other information retrieval or extraction tasks to find more relevant information and supporting evidence. We propose an algorithm that extends a corpus of seed documents with web content, using a statistical model to select text passages that are both relevant to the topics of the seeds and complement existing information. In an evaluation on 1,500 hand-labeled web pages, our algorithm ranked text passages by relevance with 81% MAP, compared to 43% when relying on web search engine ranks alone and 75% when using a multi-document summarization algorithm. Applied to QA, the proposed method yields consistent and significant performance gains. We evaluated the impact of source expansion on over 6,000 questions from the Jeopardy! quiz show and TREC evaluations using Watson, a state-of-the-art QA system. Accuracy increased from 66% to 71% on Jeopardy! questions and from 59% to 64% on TREC questions.
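The passage-ranking results above are reported as MAP (mean average precision). As a minimal sketch of how such a score can be computed from binary relevance labels, and not the authors' evaluation code, the following assumes that each ranked list contains every labeled passage for its topic, so the number of relevant passages found equals the total number of relevant passages. The function names and the toy labels are illustrative assumptions.

```python
from typing import Sequence


def average_precision(ranked_labels: Sequence[bool]) -> float:
    """Average precision for one ranked list of binary relevance labels.

    ranked_labels[i] is True if the passage at rank i + 1 is relevant.
    Assumes the list holds all labeled passages for the topic.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    # A list with no relevant passages contributes 0 by convention here.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-topic average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Toy data (hypothetical): two topics, passages listed in ranked order.
toy_rankings = [
    [True, False, True],   # relevant passages at ranks 1 and 3
    [False, True],         # relevant passage at rank 2
]
toy_map = mean_average_precision(toy_rankings)
```

Under this definition, a ranking that places relevant passages near the top scores close to 1, which is why MAP is suited to comparing a statistical relevance model against search engine rank order or a summarization baseline.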