This paper addresses some of the recent trends in speech processing, with a focus on speech-to-text transcription as a means to facilitate access to multimedia information in a multilingual context. A brief overview of automatic speech recognition is given along with indicative performance measures for a range of tasks. Enriched transcription, that is, enhancing the automatic word transcripts with metadata derived from the audio data, is discussed, followed by some highlights of recent progress and remaining challenges in speech recognition.