Question Answering on a case insensitive corpus

Authors:
Wei Li;Rohini Srihari;Cheng Niu;Xiaoge Li
Affiliations:
Cymfony Inc., Williamsville, NY;Cymfony Inc., Williamsville, NY;Cymfony Inc., Williamsville, NY;Cymfony Inc., Williamsville, NY
Venue:
MultiSumQA '03 Proceedings of the ACL 2003 workshop on Multilingual summarization and question answering - Volume 12
Year:
2003

Abstract

Most question answering (QA) systems rely on both a keyword index and Named Entity (NE) tagging. The corpus from which QA systems attempt to retrieve answers is usually mixed-case text. However, numerous corpora consist of case-insensitive documents, e.g., speech recognition results. This paper presents a successful approach to QA on a case-insensitive corpus, whereby a preprocessing module is designed to restore the case-sensitive form. The document pool with the restored case then feeds the QA system, which remains unchanged. The case restoration preprocessing is implemented as a Hidden Markov Model trained on a large raw corpus of case-sensitive documents. It is demonstrated that this approach leads to very limited degradation in QA benchmarking (2.8%), mainly due to the limited degradation in the underlying information extraction support.
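
To make the idea of HMM-based case restoration concrete, the following is a minimal illustrative sketch, not the model described in the paper. It assumes a deliberately simple design: hidden states are three case shapes (lower, capitalized, all upper), observations are lowercased words, probabilities use add-one smoothing, and decoding uses the Viterbi algorithm. All function names (`case_tag`, `train`, `restore_case`) and the toy training sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict

TAGS = ("lower", "cap", "upper")  # assumed case shapes (hidden states)

def case_tag(token):
    """Map a case-sensitive token to its case shape."""
    if token.isupper() and len(token) > 1:
        return "upper"
    if token[:1].isupper():
        return "cap"
    return "lower"

def apply_tag(word, tag):
    """Render a lowercased word in the given case shape."""
    if tag == "upper":
        return word.upper()
    if tag == "cap":
        return word.capitalize()
    return word

def train(sentences):
    """Count tag transitions and tag-to-word emissions from cased text."""
    trans = defaultdict(Counter)  # trans[previous tag][tag]
    emit = defaultdict(Counter)   # emit[tag][lowercased word]
    vocab = set()
    for sent in sentences:
        prev = "<s>"
        for tok in sent:
            tag = case_tag(tok)
            trans[prev][tag] += 1
            emit[tag][tok.lower()] += 1
            vocab.add(tok.lower())
            prev = tag
    return trans, emit, vocab

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def restore_case(words, model):
    """Viterbi decoding of the most likely case shape for each word."""
    trans, emit, vocab = model
    if not words:
        return []
    words = [w.lower() for w in words]
    v = len(vocab) + 1  # one extra slot for unseen words
    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{t: (log_prob(trans["<s>"], t, len(TAGS))
                 + log_prob(emit[t], words[0], v), None) for t in TAGS}]
    for i in range(1, len(words)):
        column = {}
        for t in TAGS:
            e = log_prob(emit[t], words[i], v)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, len(TAGS)) + e, p)
                for p in TAGS
            )
        best.append(column)
    # Follow back-pointers from the best final state.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    tags = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        tags.append(tag)
    tags.reverse()
    return [apply_tag(w, t) for w, t in zip(words, tags)]

# Toy case-sensitive training data (hypothetical).
corpus = [
    ["The", "president", "visited", "Buffalo", "yesterday"],
    ["IBM", "opened", "an", "office", "in", "Buffalo"],
    ["The", "office", "in", "Buffalo", "hired", "staff"],
]
model = train(corpus)
print(" ".join(restore_case("the office in buffalo".split(), model)))
```

In a pipeline like the one the abstract outlines, the restored text would then be passed to the unchanged QA system. A realistic restorer would need far more training data and likely richer states and context than this three-shape sketch.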