PEDIVHANDI: multimodal indexation and retrieval system for lecture videos

  • Authors:
  • Nhu Van Nguyen; Jean-Marc Ogier; Franck Charneau

  • Affiliations:
  • L3I, University of La Rochelle, La Rochelle, France; L3I, University of La Rochelle, La Rochelle, France; @ctice, University of La Rochelle, La Rochelle, France

  • Venue:
  • ACCV'12: Proceedings of the 11th Asian Conference on Computer Vision - Volume Part II
  • Year:
  • 2012

Abstract

Since the text in slides and the teacher's speech complementarily represent lecture contents, lecture videos can be indexed and retrieved by a fully automatic and complete system based on the multimodal analysis of speech and text. In this paper, we present the multimodal lecture content indexing approach used in the PEDIVHANDI project. We use the discretization of speech and changes in slide text to identify lecture slides in the video, and we propose a duplicate verification step to remove near-duplicate slides. After applying the Stroke Width Transform (SWT) text detector to obtain text regions, a standard OCR engine is used for text recognition. Finally, a context-based spell check is proposed to correct the recognized words. Our system achieves 71% recognition precision and 57% recall on a corpus of 6 presentation videos with a total duration of 8 hours.
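To make the pipeline described in the abstract concrete, below is a minimal sketch of a slide-indexing loop in Python, assuming OpenCV (cv2) and pytesseract are installed with a Tesseract binary available. It only approximates the paper's approach: frames are sampled, slide transitions are detected by comparing a simple perceptual hash against the last kept slide (standing in for the slide-change and near-duplicate verification steps), and the surviving frames are passed to OCR. The SWT text detector and the context-based spell check are not reproduced; the file name, function names, and thresholds are illustrative assumptions, not the authors' implementation.

    # Hedged sketch of a lecture-video slide indexer (not the PEDIVHANDI code).
    import cv2
    import numpy as np
    import pytesseract

    def frame_hash(gray, size=16):
        """Tiny average hash used here for near-duplicate slide filtering."""
        small = cv2.resize(gray, (size, size), interpolation=cv2.INTER_AREA)
        return (small > small.mean()).flatten()

    def index_lecture(video_path, sample_every_s=2.0, change_thresh=0.10):
        cap = cv2.VideoCapture(video_path)
        fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
        step = max(1, int(fps * sample_every_s))
        slides, prev_hash, frame_idx = [], None, 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if frame_idx % step == 0:
                gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
                h = frame_hash(gray)
                # Keep the frame only if it differs enough from the last kept slide.
                if prev_hash is None or np.mean(h != prev_hash) > change_thresh:
                    text = pytesseract.image_to_string(gray)
                    slides.append({"time_s": frame_idx / fps, "text": text})
                    prev_hash = h
            frame_idx += 1
        cap.release()
        return slides

    if __name__ == "__main__":
        for slide in index_lecture("lecture.mp4"):  # hypothetical input file
            print(f"[{slide['time_s']:.1f}s] {slide['text'][:60]!r}")

The extracted per-slide text, together with its timestamp, is what a retrieval system could then index; the paper's reported precision/recall concerns the quality of this recognized text.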