Reshaping automatic speech transcripts for robust high-level spoken document analysis

  • Authors:
  • Julien Fayolle; Fabienne Moreau; Christian Raymond; Guillaume Gravier

  • Affiliations:
  • INRIA - IRISA, Rennes, France; University Rennes 2 - IRISA, Rennes, France; INSA - IRISA, Rennes, France; CNRS - IRISA, Rennes, France

  • Venue:
  • AND '10: Proceedings of the Fourth Workshop on Analytics for Noisy Unstructured Text Data
  • Year:
  • 2010

Abstract

High-level spoken document analysis is required in many applications that seek access to the semantic content of audio data, such as information retrieval, machine translation or automatic summarization. It is nevertheless a difficult task, generally performed on transcripts produced by an automatic speech recognition system. Unlike standard texts, transcripts are highly noisy data: word recognition errors affect, in particular, highly informative words such as named entities (e.g., person names, locations, organizations). Transcripts also exhibit specificities of spoken language that make them hard to process with natural language processing tools designed for written texts. To overcome these issues, this paper proposes a method to reshape automatic speech transcripts for robust high-level spoken document analysis. The method consists in designing a new word-level confidence measure that reliably estimates the correctness of transcribed words, focusing on the words that matter most for high-level spoken document analysis, such as named entities. The approach combines features collected from various sources of knowledge using a machine learning method based on conditional random fields. In addition to standard features (morphosyntactic, linguistic and phonetic), we introduce new semantic features based on the decisions of three robust named entity recognition systems to better estimate the reliability of named entities. Experiments conducted on the French broadcast news corpus ESTER demonstrate the added value of the proposed word-level confidence measure for error detection and named entity recognition, compared with the basic confidence measure provided by an automatic speech recognition system.
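
To make the feature-combination idea concrete, below is a minimal sketch (not the authors' code) of CRF-based word-level confidence estimation, assuming the Python sklearn-crfsuite package. The feature names, the toy data and the binary correct/error labels are illustrative stand-ins for the morphosyntactic, linguistic, phonetic and NER-based semantic features described in the abstract.

    # Minimal sketch: sequence labeling of ASR words as correct/error
    # with a CRF, combining heterogeneous per-word features.
    # All feature names and toy data below are hypothetical.
    import sklearn_crfsuite

    def word_features(sent, i):
        word, pos, asr_conf, ner_votes = sent[i]
        feats = {
            "word.lower": word.lower(),
            "pos": pos,                # morphosyntactic feature
            "asr_conf": asr_conf,      # baseline ASR confidence measure
            "ner_votes": ner_votes,    # agreement among 3 NER systems
            "bias": 1.0,
        }
        if i > 0:
            feats["prev.pos"] = sent[i - 1][1]
        if i < len(sent) - 1:
            feats["next.pos"] = sent[i + 1][1]
        return feats

    # Each token: (word, POS tag, ASR confidence, NER agreement count)
    train_sents = [
        [("barack", "NPP", 0.62, 3), ("obama", "NPP", 0.58, 3),
         ("spoke", "V", 0.97, 0)],
        [("the", "DET", 0.99, 0), ("rain", "NC", 0.41, 0),
         ("uh", "FIL", 0.30, 0)],
    ]
    train_labels = [["correct", "correct", "correct"],
                    ["correct", "error", "error"]]

    X = [[word_features(s, i) for i in range(len(s))] for s in train_sents]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                               max_iterations=50)
    crf.fit(X, train_labels)

    # The marginal P(label = "correct" | word) serves as the combined
    # word-level confidence measure.
    test = [[("obama", "NPP", 0.55, 3), ("visited", "V", 0.93, 0)]]
    X_test = [[word_features(s, i) for i in range(len(s))] for s in test]
    for marginal in crf.predict_marginals(X_test)[0]:
        print(marginal.get("correct", 0.0))

In this sketch, predict_marginals yields a per-word probability of the "correct" label; that marginal plays the role of the new confidence measure, which could then be thresholded for error detection or passed downstream to named entity recognition.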