Real-time online multimedia content processing: mobile video optical character recognition and speech synthesizer for the visual impaired

Authors:
Shi-Yong Neo;Hai-Kiat Goh;Wendy Yen-Ni Ng;Jun-Da Ong;Wilson Pang
Affiliations:
SOC, Singapore;SOC, Singapore;Ministry of Education, Singapore;Kai Square Pte Ltd, Singapore;Kai Square Pte Ltd, Singapore
Venue:
Proceedings of the 1st international convention on Rehabilitation engineering & assistive technology: in conjunction with 1st Tan Tock Seng Hospital Neurorehabilitation Meeting
Year:
2007

Abstract

One of the common difficulties faced by the visually impaired is the inability to read, which affects their way of life. Existing portable reading devices (using character recognition and speech synthesis) have many limitations and poor accuracy because of their restricted processing power. In this paper, we introduce a robust online multimedia content processing framework to alleviate the limitations of such portable devices. We leverage the high transfer speeds of existing wireless networks to send multimedia information captured on mobile devices to high-end processing servers, and subsequently stream the desired output back to users. Because the more complex processes are carried out on the servers, the resulting framework outperforms standard portable devices in both accuracy and functionality. In addition, we describe a new approach that improves optical character recognition (OCR) results by using consecutive video frames for automatic character correction. Experiments using consecutive frames show a 25% improvement in accuracy over traditional OCR on a single image. The application was also trialed by several visually impaired users, and the feedback obtained is encouraging.
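
The abstract does not say how consecutive frames are combined for character correction. As an illustration only, one simple way to exploit repeated readings of the same text is a per-position majority vote over the OCR strings returned for each frame. The sketch below assumes this voting scheme; the function name `vote_characters` and the sample readings are hypothetical and are not taken from the paper.

```python
from collections import Counter


def vote_characters(frame_texts):
    """Combine OCR strings from consecutive frames by per-position majority vote.

    Assumption (not from the paper): each frame yields one OCR string for the
    same piece of text, and errors differ from frame to frame.
    """
    if not frame_texts:
        return ""
    # Keep only readings of the most common length so character positions line up.
    target_len = Counter(len(t) for t in frame_texts).most_common(1)[0][0]
    aligned = [t for t in frame_texts if len(t) == target_len]
    # At each position, keep the character seen most often across frames.
    return "".join(
        Counter(chars).most_common(1)[0][0] for chars in zip(*aligned)
    )


# Hypothetical OCR outputs for five consecutive frames of the same sign.
readings = ["EX1T", "EXIT", "EXlT", "EXIT", "EXI"]
print(vote_characters(readings))  # prints "EXIT"
```

Readings whose length differs from the most common one are discarded here for simplicity; a fuller treatment would align the strings (for example with an edit-distance alignment) before voting.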