Recovering capitalization and punctuation marks for automatic speech recognition: Case study for Portuguese broadcast news

Authors:
F. Batista; D. Caseiro; N. Mamede; I. Trancoso
Affiliations:
L2F, Spoken Language Systems Laboratory, INESC ID Lisboa, R. Alves Redol, 9, 1000-029 Lisboa, Portugal and ISCTE, Instituto de Ciências do Trabalho e da Empresa, Portugal; L2F, Spoken Language Systems Laboratory, INESC ID Lisboa, R. Alves Redol, 9, 1000-029 Lisboa, Portugal and IST, Instituto Superior Técnico, Technical University of Lisbon, Portugal; L2F, Spoken Language Systems Laboratory, INESC ID Lisboa, R. Alves Redol, 9, 1000-029 Lisboa, Portugal and IST, Instituto Superior Técnico, Technical University of Lisbon, Portugal; L2F, Spoken Language Systems Laboratory, INESC ID Lisboa, R. Alves Redol, 9, 1000-029 Lisboa, Portugal and IST, Instituto Superior Técnico, Technical University of Lisbon, Portugal
Venue:
Speech Communication
Year:
2008

Abstract

This paper presents a study on recovering punctuation marks and capitalization information from European Portuguese broadcast news (BN) speech transcriptions. Both generative and discriminative approaches were tested for capitalization: finite state transducers built automatically from language models, and maximum entropy models. Several resources were used, including lexica, written newspaper corpora, and speech transcriptions. Finite state transducers produced the best results on written newspaper corpora, but the maximum entropy approach also proved to be a good choice: it is suitable for capitalizing speech transcriptions and allows straightforward on-the-fly capitalization. Capitalization results are reported for both written newspaper corpora and BN speech transcriptions. The frequency of each punctuation mark in BN speech transcriptions was analyzed for three languages: English, Spanish, and Portuguese. The punctuation task was performed with a maximum entropy model that combines lexical and acoustic information. The contribution of each feature was analyzed individually, and separate results are given for each focus condition, making it possible to compare performance on planned and spontaneous speech. All results were evaluated on speech transcriptions of a Portuguese broadcast news corpus. An example shows the benefits of enriching speech recognition output with punctuation and capitalization, illustrating the effect of the described experiments on spoken texts.
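
As a rough illustration of the maximum entropy idea applied to capitalization, the sketch below trains a tiny multinomial logistic regression (the same model family as maximum entropy) on lexical context features. It is not the authors' system: the label set, the feature templates, the training loop, and the toy sentences are all assumptions made for this example, and the acoustic features mentioned for punctuation are left out.

```python
import math
from collections import defaultdict

# Assumed label set; the abstract does not list the capitalization forms used.
LABELS = ["lower", "first_cap", "all_caps"]

# Made-up, correctly cased sentences (not taken from the paper's corpora).
TOY_SENTENCES = [
    "o presidente visitou Lisboa ontem",
    "a RTP transmitiu o jogo em Lisboa",
    "o jogo em Portugal foi transmitido pela RTP",
]

def label_of(token):
    """Derive the capitalization label from a correctly cased token."""
    if len(token) > 1 and token.isupper():
        return "all_caps"
    if token[:1].isupper():
        return "first_cap"
    return "lower"

def apply_label(word, label):
    """Render a lowercased word in the predicted capitalization form."""
    if label == "all_caps":
        return word.upper()
    if label == "first_cap":
        return word.capitalize()
    return word

def features(words, i):
    """Hypothetical lexical features: the word, its neighbours, and a bias."""
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    return [f"w={words[i]}", f"prev={prev_w}", f"next={next_w}", "bias"]

def predict_proba(weights, feats):
    """Softmax over per-label sums of feature weights."""
    scores = [sum(weights[(f, y)] for f in feats) for y in LABELS]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def build_examples(sentences):
    """Pair the features of each lowercased word with its true label."""
    examples = []
    for sentence in sentences:
        tokens = sentence.split()
        words = [t.lower() for t in tokens]
        for i, token in enumerate(tokens):
            examples.append((features(words, i), label_of(token)))
    return examples

def train(examples, epochs=300, lr=0.2):
    """Stochastic gradient ascent on the conditional log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, gold in examples:
            probs = predict_proba(weights, feats)
            for y, p in zip(LABELS, probs):
                grad = (1.0 if y == gold else 0.0) - p
                for f in feats:
                    weights[(f, y)] += lr * grad
    return weights

def recapitalize(weights, words):
    """Choose the most probable form for each word independently."""
    out = []
    for i, word in enumerate(words):
        probs = predict_proba(weights, features(words, i))
        best = LABELS[probs.index(max(probs))]
        out.append(apply_label(word, best))
    return out

if __name__ == "__main__":
    weights = train(build_examples(TOY_SENTENCES))
    lowercased = "a rtp transmitiu o jogo em lisboa".split()
    print(" ".join(recapitalize(weights, lowercased)))
```

Because each decision in this sketch depends only on a small window around the current word, it can be made as words arrive, with one word of lookahead. That is one plausible reading of why a model of this kind lends itself to on-the-fly capitalization.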