Speech Processing for Audio Indexing

Authors:
Lori Lamel;Jean-Luc Gauvain
Affiliations:
LIMSI-CNRS, Orsay Cedex, France 91403;LIMSI-CNRS, Orsay Cedex, France 91403
Venue:
GoTAL '08 Proceedings of the 6th international conference on Advances in Natural Language Processing
Year:
2008

Abstract

This paper addresses some of the recent trends in speech processing, with a focus on speech-to-text transcription as a means to facilitate access to multimedia information in a multilingual context. A brief overview of automatic speech recognition is given, along with indicative performance measures for a range of tasks. Enriched transcription, that is, enhancing the automatic word transcripts with meta-data derived from the audio data, is then discussed, followed by some highlights of recent progress and remaining challenges in speech recognition.