In this paper we propose a multimedia categorization framework that exploits information across the different parts of a multimedia document (e.g., a Web page, a PDF, or a Microsoft Office document). For example, a Web news page is composed of text describing some event (e.g., a car accident) and a picture that either conveys additional information about the real extent of the event (e.g., how damaged the car is) or provides evidence corroborating the text. The framework handles multimedia information by considering not only the document's text and image data but also the layout structure, which determines how a given text block relates to a particular image. The novelties and contributions of the proposed framework are: (1) support for heterogeneous types of multimedia documents; (2) a document-graph representation method; and (3) the computation of cross-media correlations. We applied the framework to the task of categorizing Web news feed data, and our results show a significant improvement over a single-medium framework.