Robust named entity extraction from large spoken archives

Authors:
Benoît Favre;Frédéric Béchet;Pascal Nocéra
Affiliations:
MMP Laboratory, Colombes, France;University of Avignon, Avignon, France;University of Avignon, Avignon, France
Venue:
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Year:
2005

Abstract

Traditional approaches to Information Extraction (IE) from speech input simply apply text-based methods to the output of an Automatic Speech Recognition (ASR) system. While this works well on transcripts with a low Word Error Rate (WER), we believe that a tighter integration of the IE and ASR modules can improve IE performance in more difficult conditions. More specifically, this paper focuses on the robust extraction of Named Entities from speech input when there is a temporal mismatch between the training and test corpora. We describe a Named Entity Recognition (NER) system, developed within the French Rich Broadcast News Transcription program ESTER, which is specifically optimized to process ASR transcripts and can be integrated into the search process of the ASR modules. Finally, we show how metadata can be collected to adapt the NER and ASR models to new conditions, and how they can be used for Named Entity indexing of spoken archives.