Named entity extraction from noisy input: speech and OCR

Authors:
David Miller; Sean Boisen; Richard Schwartz; Rebecca Stone; Ralph Weischedel
Affiliations:
BBN Technologies, Cambridge, MA (all authors)
Venue:
ANLC '00 Proceedings of the sixth conference on Applied natural language processing
Year:
2000

Abstract

In this paper, we analyze the performance of name finding on the output of a variety of automatic speech recognition (ASR) systems and of one optical character recognition (OCR) system. We explore the effect of ASR and OCR word error rate on name-finding performance, performance as a function of the amount of training data, and, for speech, the effects of out-of-vocabulary errors and of the loss of punctuation and mixed case.
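The noise factors named in the abstract can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' evaluation code. It assumes inputs are already split into word tokens, computes word error rate in the standard way, as (substitutions + deletions + insertions) divided by the number of reference words using a word-level Levenshtein alignment, approximates the loss of punctuation and mixed case by uppercasing text and stripping punctuation, and measures the out-of-vocabulary rate against a recognizer vocabulary supplied by the caller. All function names are hypothetical.

```python
import re


def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference),
    computed with a word-level Levenshtein (edit-distance) alignment."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum edits turning reference[:i] into hypothesis[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(m + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[n][m] / n


def strip_case_and_punctuation(text: str) -> list[str]:
    """Hypothetical stand-in for speech-recognizer output style:
    everything uppercased, punctuation removed (apostrophes kept)."""
    return re.findall(r"[A-Z0-9']+", text.upper())


def oov_rate(tokens: list[str], vocabulary: set[str]) -> float:
    """Fraction of tokens that are not in the recognizer's vocabulary."""
    if not tokens:
        return 0.0
    return sum(token not in vocabulary for token in tokens) / len(tokens)


if __name__ == "__main__":
    reference = strip_case_and_punctuation(
        "Ralph Weischedel works at BBN in Cambridge."
    )
    hypothesis = ["RALPH", "WHITE", "SHELL", "WORKS", "AT", "BBN", "IN", "CAMBRIDGE"]
    vocabulary = {"RALPH", "WORKS", "AT", "BBN", "IN", "CAMBRIDGE"}
    print(reference)
    print(word_error_rate(reference, hypothesis))
    print(oov_rate(reference, vocabulary))
```

In this toy example, the surname WEISCHEDEL is missing from the assumed vocabulary (an OOV rate of 1/7), so a recognizer might output it as WHITE SHELL: one substitution plus one insertion, a WER of 2/7. The example also shows that an out-of-vocabulary surname is the kind of error that can remove a name a name finder would otherwise detect. The case and punctuation cues a text-trained name finder might rely on are likewise absent from the normalized reference.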