In this paper we report on our recent efforts to collect a corpus of spoken lecture material that will enable research directed towards fast, accurate, and easy access to lecture content. Thus far, we have collected a corpus of 270 hours of speech from a variety of undergraduate courses and seminars. We report on an initial analysis of the spontaneous speech phenomena present in these data and the vocabulary usage patterns across three courses. Finally, we examine the perplexities of language models trained on written and spoken materials, and describe an initial recognition experiment on one course.