Semantic keyword extraction via adaptive text binarization of unstructured unsourced video

Authors:
Michele Merler; John R. Kender
Affiliations:
Department of Computer Science, Columbia University; Department of Computer Science, Columbia University
Venue:
ICIP'09: Proceedings of the 16th IEEE International Conference on Image Processing
Year:
2009

Abstract

We propose a fully automatic method for summarizing and indexing unstructured presentation videos based on text extracted from the projected slides. We use changes in the slide text to segment the video into semantic shots. Unlike previous approaches, our method does not depend on the availability of the electronic source of the slides; instead, it extracts and recognizes the text directly from the video. Once text regions are detected within keyframes, a novel binarization algorithm, Local Adaptive Otsu (LOA), is applied to cope with the low quality of video scene text before the regions are passed to the open-source Tesseract OCR engine for recognition. We tested our system on a corpus of 8 presentation videos totaling 1 hour and 45 minutes, achieving a character recognition precision of 0.5343 and recall of 0.7446, and a word recognition precision of 0.4947 and recall of 0.6651. Besides applications to multimedia documents, topic indexing, and cross-referencing, our system can be integrated into summarization and presentation tools such as the VAST MultiMedia Browser [1].
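The abstract does not describe the internals of LOA, so the following is only a minimal sketch of the general idea of locally adaptive Otsu thresholding, not the authors' algorithm. It assumes the detected text region is a grayscale image given as a list of rows of integers from 0 to 255, splits it into fixed-size square tiles, gives each sufficiently contrasted tile its own Otsu threshold, and lets flat tiles fall back to the region's global Otsu threshold. The function names and the parameters `tile` and `min_contrast` are hypothetical choices made for illustration.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels in [0, 255]


def otsu_threshold(pixels: List[int]) -> int:
    """Return the threshold t maximizing between-class variance (Otsu's method).

    Pixels <= t form one class and pixels > t form the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of pixel values <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # lower class still empty
        if w1 == 0:
            break  # upper class empty for this and all larger t
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance up to the constant factor 1 / total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def local_adaptive_otsu(img: Image, tile: int = 16, min_contrast: int = 20) -> Image:
    """Binarize img tile by tile with Otsu's method (hypothetical LOA-like sketch).

    Tiles whose max-min spread is below min_contrast are treated as flat and use
    the global Otsu threshold, so uniform background is not split into noise.
    Output pixels are 255 where the input exceeds the tile threshold, else 0.
    """
    h, w = len(img), len(img[0])
    global_t = otsu_threshold([p for row in img for p in row])
    out = [[0] * w for _ in range(h)]
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            ys = range(y0, min(y0 + tile, h))
            xs = range(x0, min(x0 + tile, w))
            block = [img[y][x] for y in ys for x in xs]
            if max(block) - min(block) >= min_contrast:
                t = otsu_threshold(block)
            else:
                t = global_t
            for y in ys:
                for x in xs:
                    out[y][x] = 255 if img[y][x] > t else 0
    return out
```

On a region whose left half is dark (background 40, strokes 90) and whose right half is bright (background 160, strokes 220), a single global Otsu threshold of 90 merges the dark strokes into the background, whereas the tiled version with `tile=4` recovers the strokes in both halves. The binarized output could then be handed to an OCR engine such as Tesseract; how the paper itself sizes tiles or handles polarity is not stated here.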