In this paper we describe a technique for improving the performance of an information extraction system for speech data by explicitly modeling the errors in the recognizer output. The approach combines a statistical model of named entity states with a lattice representation of hypothesized words and errors annotated with recognition confidence scores. Additional refinements include the use of multiple error types, improved confidence estimation, and multipass processing. In combination, these techniques improve named entity recognition performance over a text-based baseline by 28%.
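
To make the idea of combining a state model with confidence-weighted word and error hypotheses more concrete, here is a minimal sketch. It is not the paper's model. It assumes a two-state HMM (`OTHER`, `NAME`), a lattice reduced to two arcs per word slot (the recognized word weighted by its confidence `c`, and a generic `<ERR>` token weighted by `1 - c`), and invented toy probabilities. All names and numbers (`TRANS`, `EMIT`, `FLOOR`, `decode`) are hypothetical and chosen only for illustration.

```python
import math

STATES = ["OTHER", "NAME"]
ERR = "<ERR>"   # generic "recognition error" token (assumed single error type)
FLOOR = 1e-4    # assumed back-off probability for unseen (state, token) pairs

# Toy parameters, invented for illustration only.
TRANS = {
    "START": {"OTHER": 0.8, "NAME": 0.2},
    "OTHER": {"OTHER": 0.8, "NAME": 0.2},
    "NAME": {"OTHER": 0.5, "NAME": 0.5},
}
EMIT = {
    "OTHER": {"mister": 0.3, "said": 0.3, "hello": 0.3, ERR: 0.05},
    "NAME": {"smith": 0.5, ERR: 0.4},  # names are more likely to be misrecognized
}


def decode(slots):
    """Viterbi search over (state, arc) pairs.

    slots: list of (recognized_word, confidence) with confidence in [0, 1].
    Returns the best path as a list of (state, token) pairs, where token is
    either the recognized word or ERR.
    """
    best = {"START": (0.0, [])}  # state -> (log score, path so far)
    for word, conf in slots:
        # Two competing arcs per slot: trust the word, or treat it as an error.
        arcs = [(word, conf), (ERR, 1.0 - conf)]
        new_best = {}
        for state in STATES:
            candidates = []
            for prev, (score, path) in best.items():
                for token, weight in arcs:
                    if weight <= 0.0:
                        continue  # arc has no probability mass
                    p = TRANS[prev][state] * EMIT[state].get(token, FLOOR) * weight
                    candidates.append((score + math.log(p), path + [(state, token)]))
            new_best[state] = max(candidates, key=lambda c: c[0])
        best = new_best
    return max(best.values(), key=lambda c: c[0])[1]


if __name__ == "__main__":
    # "myth" stands in for a low-confidence misrecognition of a name.
    for state, token in decode([("mister", 0.9), ("myth", 0.2), ("said", 0.9)]):
        print(state, token)
```

In this toy setting, the low-confidence word is absorbed by the error arc and labeled as part of a name, whereas a purely text-based tagger would have to score the out-of-vocabulary word "myth" directly. The multiple error types, improved confidence estimation, and multipass processing mentioned above are beyond the scope of this sketch.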