Dynamic captioning: video accessibility enhancement for hearing impairment

Authors:
Richang Hong;Meng Wang;Mengdi Xu;Shuicheng Yan;Tat-Seng Chua
Affiliations:
School of Computing, National University of Singapore, Singapore, Singapore;School of Computing, National University of Singapore, Singapore, Singapore;Dept. of ECE, National University of Singapore, Singapore, Singapore;Dept. of ECE, National University of Singapore, Singapore, Singapore;School of Computing, National University of Singapore, Singapore, Singapore
Venue:
Proceedings of the international conference on Multimedia
Year:
2010

Citing 17
Cited 20

Reasoning about naming systems

ACM Transactions on Programming Languages and Systems (TOPLAS)
Advances in human-computer interaction (vol. 5)

Advances in human-computer interaction (vol. 5)
QoS impact on user perception and understanding of multimedia video clips

MULTIMEDIA '98 Proceedings of the sixth ACM international conference on Multimedia
Tessa, a system to aid communication with deaf people

Proceedings of the fifth international ACM conference on Assistive technologies
Vocal communication of emotion: a review of research paradigms

Speech Communication - Special issue on speech and emotion
Contrast-based image attention analysis by using fuzzy growing

MULTIMEDIA '03 Proceedings of the eleventh ACM international conference on Multimedia
Automatic Face Recognition for Film Character Retrieval in Feature-Length Films

CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 1
Real-Time Multiple Objects Tracking with Occlusion Handling in Dynamic Scenes

CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 1
Visual Speech Recognition with Loosely Synchronized Feature Streams

ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2
Hierarchical movie affective content analysis based on arousal and valence features

MM '08 Proceedings of the 16th ACM international conference on Multimedia
Robust Face Recognition via Sparse Representation

IEEE Transactions on Pattern Analysis and Machine Intelligence
Unfolding speaker clustering potential: a biomimetic approach

MM '09 Proceedings of the 17th ACM international conference on Multimedia
Inferring semantic concepts from community-contributed images and noisy tags

MM '09 Proceedings of the 17th ACM international conference on Multimedia
Accessible image search

MM '09 Proceedings of the 17th ACM international conference on Multimedia
Unified video annotation via multigraph learning

IEEE Transactions on Circuits and Systems for Video Technology
Beyond distance measurement: constructing neighborhood similarity for video annotation

IEEE Transactions on Multimedia - Special section on communities and media computing
Joint covariate selection and joint subspace selection for multiple classification problems

Statistics and Computing

Video accessibility enhancement for hearing-impaired users

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP) - Special section on ACM multimedia 2010 best paper candidates, and issue on social media
Beyond search: Event-driven summarization for web videos

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
Videoader: a video advertising system based on intelligent analysis of visual content

Proceedings of the Third International Conference on Internet Multimedia Computing and Service
An online video recommendation framework using rich information

Proceedings of the Third International Conference on Internet Multimedia Computing and Service
On video recommendation over social network

MMM'12 Proceedings of the 18th international conference on Advances in Multimedia Modeling
Enhancing learning accessibility through fully automatic captioning

Proceedings of the International Cross-Disciplinary Conference on Web Accessibility
Query difficulty estimation for image retrieval

Neurocomputing
A probabilistic graphical model for topic and preference discovery on social media

Neurocomputing
Relationship strength estimation for online social networks with the study on Facebook

Neurocomputing
Collaborative visual modeling for automatic image annotation via sparse model coding

Neurocomputing
Personalized video recommendation based on viewing history with the study on YouTube

Proceedings of the 4th International Conference on Internet Multimedia Computing and Service
Touch saliency

Proceedings of the 20th ACM international conference on Multimedia
Improving image tags by exploiting web search results

Multimedia Tools and Applications
Multimedia encyclopedia construction by mining web knowledge

Signal Processing
Social image tagging using graph-based reinforcement on multi-type interrelated objects

Signal Processing
An improved method of locality sensitive hashing for indexing large-scale and high-dimensional features

Signal Processing
Static saliency vs. dynamic saliency: a comparative study

Proceedings of the 21st ACM international conference on Multimedia
Advertising object in web videos

Neurocomputing
Top-Down Saliency Detection via Contextual Pooling

Journal of Signal Processing Systems
A novel framework for concept detection on large scale video database and feature pool

Artificial Intelligence Review

Quantified Score

Hi-index	0.00

Abstract

More than 66 million people suffer from hearing impairment, and the loss of audio information makes it difficult for them to understand video content. When scripts are available, captioning technology helps to a certain degree by displaying the scripts synchronously as the video plays. However, we show that existing captioning techniques are far from satisfactory in helping hearing-impaired audiences enjoy videos. In this paper, we introduce a video accessibility enhancement scheme based on a Dynamic Captioning approach, which draws on a rich set of technologies including face detection and recognition, visual saliency analysis, and text-speech alignment. Unlike existing methods, which we categorize here as static captioning, dynamic captioning places scripts at suitable positions to help hearing-impaired audiences better recognize the speaking characters. In addition, it progressively highlights the scripts word by word by aligning them with the speech signal, and it illustrates the variation in voice volume. In this way, the audience can better track the scripts and perceive the moods conveyed by the variation in volume. We apply the technology to 20 video clips and conduct an in-depth study with 60 hearing-impaired users; the results demonstrate the effectiveness and usefulness of the video accessibility enhancement scheme.
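
The word-by-word highlighting and volume illustration described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `AlignedWord` record, the function names, the bracket notation for highlighted words, and the RMS-based volume measure are all hypothetical choices, and the word timings are assumed to come from some text-speech alignment step that is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class AlignedWord:
    """Hypothetical record: one script word with its aligned speech interval."""
    text: str
    start: float  # seconds from the start of the clip
    end: float


def render_caption(words, t):
    """Return the caption at playback time t, with already-started words in brackets."""
    parts = []
    for w in words:
        # A word counts as highlighted once its aligned start time has been reached.
        parts.append(f"[{w.text}]" if w.start <= t else w.text)
    return " ".join(parts)


def rms_volume(samples):
    """Root-mean-square level of a window of audio samples (assumed volume measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def volume_bar(level, width=10):
    """Draw a text bar for a volume level, clamped to the range [0, 1]."""
    level = max(0.0, min(1.0, level))
    filled = int(level * width + 0.5)  # round to the nearest cell
    return "#" * filled + "-" * (width - filled)


# Made-up timings for a three-word script line.
words = [
    AlignedWord("how", 0.0, 0.3),
    AlignedWord("are", 0.3, 0.5),
    AlignedWord("you", 0.7, 1.0),
]
print(render_caption(words, 0.6))
print(volume_bar(rms_volume([0.5, -0.5])))
```

In a real player the caption would be redrawn on every frame with the current playback time, and the volume window would slide along the audio track; both are left out here to keep the sketch short.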