Totalrecall: visualization and semi-automatic annotation of very large audio-visual corpora

  • Authors:
  • Rony Kubat, Philip DeCamp, Brandon Roy

  • Affiliations:
  • MIT Media Lab, Cambridge, MA (all authors)

  • Venue:
  • Proceedings of the 9th International Conference on Multimodal Interfaces
  • Year:
  • 2007


Abstract

We introduce a system for visualizing, annotating, and analyzing very large collections of longitudinal audio and video recordings. The system, TotalRecall, is designed to address the requirements of projects like the Human Speechome Project, for which more than 100,000 hours of multitrack audio and video have been collected over a twenty-two month period. Our goal in this project is to transcribe speech in over 10,000 hours of audio recordings, and to annotate the position and head orientation of multiple people in the 10,000 hours of corresponding video. Higher-level behavioral analysis of the corpus will be based on these and other annotations. To cope efficiently with this huge corpus, we are developing semi-automatic data coding methods that are integrated into TotalRecall. Ultimately, this system and the underlying methodology may enable new forms of multimodal behavioral analysis grounded in ultra-dense longitudinal data.