Cross-domain matching for automatic tag extraction across redundant handwriting and speech events

Authors:
Edward C. Kaiser
Affiliations:
Adapx, Seattle, WA
Venue:
Proceedings of the 2007 workshop on Tagging, mining and retrieval of human related activity information
Year:
2007

Abstract

In many natural human-human interactions, people communicate important information redundantly across multiple communication modes, for example by saying aloud what they handwrite during a presentation or discussion. To detect and benefit from such redundancy, a computational understanding system must align the recognition outputs of different perceptual modes, such as handwriting and speech. Because the recognition domains of the modes differ, researchers refer to tasks like this as cross-domain matching. We describe how SHACER (our Speech and HAndwriting reCognizER) currently implements cross-domain matching, and compare it with an existing, formally optimal algorithm for this task. Successfully aligned and recognized multimodal redundancies can be used to tag social interactions automatically. These automatically generated tags can benefit retrieval techniques for non-textual documents recorded during computationally perceived social interactions.
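
As a rough illustration of the alignment problem described in the abstract, the sketch below locates a short symbol sequence (standing in for a handwritten term converted to phonemes) inside a longer sequence (standing in for phoneme-level speech recognition output). It uses generic unit-cost approximate substring matching by dynamic programming. It is not SHACER's implementation, nor the formally optimal algorithm the paper compares against. The function name, the phoneme labels, and the assumption that handwriting has already been mapped to phonemes are all hypothetical.

```python
from typing import List, Tuple


def best_substring_match(query: List[str], stream: List[str]) -> Tuple[int, int, int]:
    """Return (cost, start, end) such that stream[start:end] is the span
    closest to `query` under unit-cost edit distance.

    Hypothetical helper for illustration only; a real system would likely
    use phonetically informed costs rather than 0/1 costs.
    """
    n = len(stream)
    # Row for the empty query: it matches anywhere in the stream at zero cost.
    prev_cost = [0] * (n + 1)
    prev_start = list(range(n + 1))
    for i, q in enumerate(query, start=1):
        # Column 0: i query symbols left unmatched before any stream symbol.
        cur_cost = [i] + [0] * n
        cur_start = [0] * (n + 1)
        for j in range(1, n + 1):
            sub = prev_cost[j - 1] + (0 if q == stream[j - 1] else 1)
            dele = prev_cost[j] + 1     # query symbol has no counterpart
            ins = cur_cost[j - 1] + 1   # stream symbol has no counterpart
            best = min(sub, dele, ins)
            cur_cost[j] = best
            # Carry forward where the chosen alignment began in the stream.
            if best == sub:
                cur_start[j] = prev_start[j - 1]
            elif best == dele:
                cur_start[j] = prev_start[j]
            else:
                cur_start[j] = cur_start[j - 1]
        prev_cost, prev_start = cur_cost, cur_start
    # The cheapest cell in the final row marks where the best match ends.
    end = min(range(n + 1), key=lambda j: prev_cost[j])
    return prev_cost[end], prev_start[end], end


# Made-up phoneme labels: a handwritten term versus a longer spoken stretch.
handwritten = ["jh", "ow"]
spoken = ["s", "ow", "jh", "ow", "w", "ih", "l"]
cost, start, end = best_substring_match(handwritten, spoken)
print(cost, spoken[start:end])
```

Leaving the first row at zero lets a match begin at any position in the spoken sequence, which is what distinguishes substring matching from whole-string edit distance. The matched span, together with its cost, is the kind of evidence that could support treating the handwritten term as a tag for that stretch of speech.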