Effective human and automatic processing of speech requires recovery of more than just the words. It also involves recovering phenomena such as sentence boundaries, filler words, and disfluencies, referred to as structural metadata. We describe a metadata detection system that combines information from different types of textual knowledge sources with information from a prosodic classifier. We investigate maximum entropy and conditional random field models, as well as the predominant hidden Markov model (HMM) approach, and find that discriminative models generally outperform generative models. We report system performance on both broadcast news and conversational telephone speech tasks, illustrating significant performance differences across tasks and as a function of recognizer performance. The results represent the state of the art, as assessed in the NIST RT-04F evaluation.
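To make the idea of combining textual and prosodic evidence concrete, the following is a minimal sketch of sentence boundary detection at inter-word positions. It is not the system described above: it assumes that a textual model and a prosodic classifier each already supply a boundary posterior per position, and it merges them by a simple log-linear interpolation. The function names, the `weight` and `threshold` parameters, and all probability values are hypothetical and chosen only for illustration.

```python
import math


def combine_posteriors(p_text, p_prosody, weight=0.5):
    """Log-linearly interpolate two boundary posteriors for one position.

    `weight` (an assumed tuning parameter) is the share given to the
    textual model; the prosodic classifier gets `1 - weight`.
    """
    eps = 1e-9  # keep probabilities away from 0 and 1 so log() is defined
    p_text = min(max(p_text, eps), 1.0 - eps)
    p_prosody = min(max(p_prosody, eps), 1.0 - eps)

    log_yes = weight * math.log(p_text) + (1.0 - weight) * math.log(p_prosody)
    log_no = (weight * math.log(1.0 - p_text)
              + (1.0 - weight) * math.log(1.0 - p_prosody))

    # Renormalize over the two outcomes (boundary / no boundary).
    top = max(log_yes, log_no)
    yes = math.exp(log_yes - top)
    no = math.exp(log_no - top)
    return yes / (yes + no)


def detect_boundaries(p_text, p_prosody, weight=0.5, threshold=0.5):
    """Return the word indices after which a sentence boundary is placed."""
    if len(p_text) != len(p_prosody):
        raise ValueError("both models must score the same positions")
    return [
        i
        for i, (pt, pp) in enumerate(zip(p_text, p_prosody))
        if combine_posteriors(pt, pp, weight) > threshold
    ]


if __name__ == "__main__":
    # Illustrative, made-up posteriors for the position after each word.
    words = ["i", "see", "okay", "thanks"]
    text_scores = [0.2, 0.8, 0.1, 0.9]
    prosody_scores = [0.1, 0.7, 0.3, 0.95]
    for i in detect_boundaries(text_scores, prosody_scores):
        print(f"boundary after '{words[i]}'")
```

A per-position decision like this ignores dependencies between neighboring labels; sequence models such as the HMM and conditional random field approaches mentioned above score the whole label sequence instead.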