Most question answering (QA) systems rely on both keyword indexing and Named Entity (NE) tagging. The corpus from which a QA system retrieves answers is usually mixed-case text. However, many corpora consist of case-insensitive documents, e.g. speech recognition output. This paper presents a successful approach to QA on a case-insensitive corpus: a preprocessing module restores the case-sensitive form of the text, and the document pool with restored case then feeds the QA system, which remains unchanged. The case restoration preprocessing is implemented as a Hidden Markov Model trained on a large raw corpus of case-sensitive documents. This approach is shown to cause very limited degradation in QA benchmarking (2.8%), mainly because the degradation in the underlying information extraction support is also limited.
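The abstract does not give the details of the Hidden Markov Model, so the following is only a minimal sketch of how HMM-based case restoration could work, not the authors' implementation. It assumes three hidden case states (`lower`, `cap`, `upper`), bigram transitions between states, emissions of lowercased words, add-one smoothing, and Viterbi decoding. All function names and the toy training corpus are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical hidden states: one case pattern per token.
TAGS = ("lower", "cap", "upper")

def case_tag(token):
    """Map a case-sensitive token to its case state."""
    if token.isupper() and len(token) > 1:
        return "upper"
    if token[:1].isupper():
        return "cap"
    return "lower"

def apply_tag(word, tag):
    """Render a word in the surface form implied by a case state."""
    if tag == "upper":
        return word.upper()
    if tag == "cap":
        return word.capitalize()
    return word.lower()

def train(sentences):
    """Count state transitions and (state, lowercased word) emissions."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for sent in sentences:
        prev = "<s>"  # sentence-start pseudo-state
        for tok in sent:
            tag = case_tag(tok)
            trans[prev][tag] += 1
            emit[tag][tok.lower()] += 1
            vocab.add(tok.lower())
            prev = tag
    return trans, emit, vocab

def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log probability (an assumed smoothing scheme)."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))

def restore_case(words, model):
    """Viterbi decoding of the most likely case-state sequence."""
    trans, emit, vocab = model
    if not words:
        return []
    v = len(vocab) + 1  # one extra slot for unknown words
    n = len(TAGS)
    first = words[0].lower()
    best = {t: log_prob(trans["<s>"], t, n) + log_prob(emit[t], first, v)
            for t in TAGS}
    back = []
    for w in words[1:]:
        w = w.lower()
        new_best, pointers = {}, {}
        for t in TAGS:
            prev = max(TAGS, key=lambda p: best[p] + log_prob(trans[p], t, n))
            new_best[t] = (best[prev] + log_prob(trans[prev], t, n)
                           + log_prob(emit[t], w, v))
            pointers[t] = prev
        back.append(pointers)
        best = new_best
    # Follow back-pointers from the best final state.
    tag = max(TAGS, key=best.get)
    tags = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        tags.append(tag)
    tags.reverse()
    return [apply_tag(w, t) for w, t in zip(words, tags)]

# Toy case-sensitive training corpus (illustrative only).
corpus = [
    "The president visited Paris in May".split(),
    "IBM opened an office in Paris".split(),
    "The office in Boston closed".split(),
]
model = train(corpus)
print(" ".join(restore_case("the office in paris".split(), model)))
# → The office in Paris
```

Because the restored text is ordinary mixed-case text, a downstream NE tagger or QA system trained on case-sensitive input could consume it without modification, which is the design point the abstract emphasizes. A realistic model would need a far larger training corpus and a richer treatment of unknown and mixed-case words than this sketch provides.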