This thesis proposes solutions for structural and semantic video modeling, automatic video analysis, and expressive video search and retrieval. We present a structural-semantic video model for effective representation of high- and low-level video information; an automatic, multi-modal sports video processing framework for instantiating the model attributes and generating summaries; and, finally, a graph-based query formation and resolution framework for semantic search and retrieval over the proposed model. Except for the video analysis algorithms, which are specific to sports video, the proposed structural-semantic video model and the graph-based querying framework are generic in the sense that they are applicable to the description and querying of any type of video.

We first introduce a structural-semantic video model for efficient description of high-level and low-level video features. The proposed model unifies the shot-based and object-based structural video models employed by the video processing and computer vision communities with the entity-relationship (ER) and object-oriented models used by the database and information retrieval communities. This unified approach improves on the existing MPEG-7 approach, which uses two separate description schemes (DS) for the same task. To instantiate the model descriptors and generate automatic, real-time video summaries, we focus on the domain of sports video, because extracting high-level model entities from low-level video features requires a specified domain. We propose a multi-modal, scalable sports video processing framework for model descriptor instantiation and fast summarization of broadcast sports video. The framework is multi-modal because it employs visual, audio, and text features, and scalable because the system can generate descriptors either in real time or offline, depending on user preferences and requirements. It is also applicable to multiple types of sports.
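To make the unified structural-semantic idea concrete, the following is a minimal sketch, not the thesis's actual schema: structural entities (shots) are linked to semantic entities (events) through typed references, and a graph-style query then reduces to traversal over those links. All class and function names here are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Shot:
    """Structural layer: a temporal video segment."""
    shot_id: int
    start_frame: int
    end_frame: int
    shot_type: str  # e.g. "long", "medium", "close-up"

@dataclass
class Event:
    """Semantic layer: a labeled event grounded in structural evidence."""
    event_id: int
    label: str                                  # e.g. "goal"
    shots: list = field(default_factory=list)   # ER-style link to shots

def find_events(events, label):
    """A query like 'find all goal events' as a traversal over typed links."""
    return [e for e in events if e.label == label]

# Toy instantiation: one "goal" event covered by two shots.
shots = [Shot(1, 0, 250, "long"), Shot(2, 251, 400, "close-up")]
events = [Event(1, "goal", shots)]
goals = find_events(events, "goal")
```

In a full system the same link structure would also carry low-level descriptors (color, motion) on the shot side, so that a single model answers both feature-based and semantic queries.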
The scalability of the framework results from classifying visual features into cinematic and object-based features and processing them efficiently. Because cinematic features are easier to compute, we extract them, such as shot boundaries, shot types, and slow-motion replays, before the object-based analysis that involves object detection and tracking. Real-time descriptors and summaries are computed using only cinematic visual features, together with some audio and text features. Because some cinematic and object-based algorithms use features extracted from the field region, and most sporting events take place on a field with one distinct dominant color, we develop a robust low-level dominant color region detection algorithm that automatically detects the color of the field and adapts to variations caused by changing imaging conditions. (Abstract shortened by UMI.)
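The dominant color idea can be sketched as follows. This is an illustrative NumPy-only approximation under the assumption that the field fills most of the frame: take the peak of a coarse hue histogram as the field color, threshold a tolerance band around it, and track slow drift with a running mean. Function names, bin counts, and thresholds are assumptions, not the thesis's actual parameters.

```python
import numpy as np

def dominant_hue(hue_frame, bins=32):
    """Center of the most populated hue bin (hue assumed in [0, 180))."""
    hist, edges = np.histogram(hue_frame, bins=bins, range=(0, 180))
    peak = int(np.argmax(hist))
    return 0.5 * (edges[peak] + edges[peak + 1])

def field_mask(hue_frame, dominant, tol=15.0):
    """Boolean mask of pixels within +/- tol of the dominant hue."""
    return np.abs(hue_frame.astype(float) - dominant) <= tol

def adapt(dominant, new_estimate, alpha=0.05):
    """Slowly track imaging-condition drift with an exponential average."""
    return (1.0 - alpha) * dominant + alpha * new_estimate

# Synthetic frame: mostly "grass" hue near 60, a small non-field patch,
# plus mild noise standing in for imaging variation.
rng = np.random.default_rng(0)
frame = np.full((120, 160), 60.0)
frame[:10, :10] = 120.0
frame += rng.normal(0.0, 2.0, frame.shape)

d = dominant_hue(frame)
mask = field_mask(frame, d)
```

A real implementation would operate in a proper color space (e.g. HSV via OpenCV) and update the estimate only from frames classified as field views, but the histogram-peak-plus-adaptation structure is the same.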