Cache-based language model adaptation using visual attention for ASR in meeting scenarios

Authors:
Neil J. Cooke;Martin J. Russell
Affiliations:
University of Birmingham, Birmingham, United Kingdom;University of Birmingham, Birmingham, United Kingdom
Venue:
Proceedings of the 2009 international conference on Multimodal interfaces
Year:
2009

Citing 2
Cited 0

A Cache-Based Natural Language Model for Speech Recognition

IEEE Transactions on Pattern Analysis and Machine Intelligence
An empirical study of smoothing techniques for language modeling

ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics

Quantified Score

Hi-index	0.00

Visualization

Abstract

In a typical group meeting involving discussion and collaboration, people look at one another, at shared information resources such as presentation material, and also at nothing in particular. In this work we investigate whether the knowledge of what a person is looking at may improve the performance of Automatic Speech Recognition (ASR). A framework for cache Language Model (LM) adaptation is proposed with the cache based on a person's Visual Attention (VA) sequence. The framework attempts to measure the appropriateness of adaptation from VA sequence characteristics. Evaluation on the AMI Meeting corpus data shows reduced LM perplexity. This work demonstrates the potential for cache-based LM adaptation using VA information in large vocabulary ASR deployed in meeting scenarios.