Using web sources for improving video categorization
Journal of Intelligent Information Systems
This paper proposes using the Internet as a rich source of information for generating learning corpora for systems that categorize video transcripts. Our main goal has been to study the behavior of different learning corpora generated from the Internet and to analyze some of their features. Specifically, Wikipedia, Google and the blogosphere were used to generate these learning corpora, with the VideoCLEF 2008 track serving as the evaluation framework for the experiments. Based on this evaluation framework, we conclude that the proposed approach is a promising strategy for classifying videos from their transcripts. The differing sizes of the generated corpora might suggest that a larger corpus yields better results, but we show that corpus size is not always a reliable indicator of how a learning corpus will behave. The results show that integrating knowledge from the blogosphere or Google produces more reliable corpora for this task than those based on Wikipedia.
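The general idea of assigning a transcript to the category whose learning corpus it most resembles can be illustrated with a minimal sketch. This is not the system evaluated in the paper: the toy corpora, the category names, the function names, and the choice of raw term-frequency vectors with cosine similarity are all assumptions made for illustration. In the paper the corpora are generated from Wikipedia, Google and the blogosphere.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_category_vectors(corpora):
    """Turn {category: [documents]} into {category: term-frequency Counter}."""
    return {
        category: Counter(tok for doc in docs for tok in tokenize(doc))
        for category, docs in corpora.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(transcript, category_vectors):
    """Return the category whose corpus vector is most similar to the transcript."""
    query = Counter(tokenize(transcript))
    return max(category_vectors, key=lambda c: cosine(query, category_vectors[c]))


# Hypothetical toy corpora standing in for text collected from web sources.
toy_corpora = {
    "music": [
        "The orchestra played a symphony",
        "a concert with guitar and piano music",
    ],
    "architecture": [
        "The cathedral has a gothic facade",
        "modern buildings designed by the architect",
    ],
}

vectors = build_category_vectors(toy_corpora)
label = classify("the architect restored the old cathedral buildings", vectors)
print(label)
```

Because each category is reduced to a single vector, swapping one web source for another only changes the contents of `toy_corpora`, which makes it straightforward to compare corpora of different origin and size under the same classifier.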