We propose a new statistical model for classifying structured documents and consider its use for multimedia document classification. Its main originality is that it simultaneously takes into account the structural and the content information present in a structured document, and that it copes with different types of content (text, images, etc.). We present experiments on the classification of multilingual pornographic HTML pages using text and image data. The system accurately classifies pornographic sites written in 8 European languages. The corpus was developed by EADS in the context of a large Web site filtering application.
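
To make the idea of scoring structure and content together more concrete, here is a minimal sketch in Python. It is not the model proposed in the paper, whose details the abstract does not give. It assumes a simple generative, naive-Bayes-style factorization in which each node's structural label depends on its parent's label and the class, and each content feature depends on the node's label and the class. The names `Node`, `StructuredNB`, and `walk`, and the image feature tokens such as `skin_high`, are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


class Node:
    """Hypothetical document node: a structural label plus content features."""

    def __init__(self, label, features=(), children=()):
        self.label = label                # e.g. "html", "p", "img"
        self.features = list(features)    # words for text, quantized descriptors for images
        self.children = list(children)


def walk(node, parent_label="ROOT"):
    """Yield (parent_label, node) pairs for every node in the tree."""
    yield parent_label, node
    for child in node.children:
        yield from walk(child, node.label)


class StructuredNB:
    """Assumed factorization (illustrative only):
    P(d, c) = P(c) * prod_i P(label_i | parent_label_i, c) * prod_f P(f | label_i, c)
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant
        self.class_counts = defaultdict(int)
        self.struct = defaultdict(lambda: defaultdict(int))   # (class, parent) -> label -> count
        self.content = defaultdict(lambda: defaultdict(int))  # (class, label) -> feature -> count
        self.labels = set()
        self.vocab = set()

    def fit(self, docs, classes):
        for doc, c in zip(docs, classes):
            self.class_counts[c] += 1
            for parent, node in walk(doc):
                self.struct[(c, parent)][node.label] += 1
                self.labels.add(node.label)
                for f in node.features:
                    self.content[(c, node.label)][f] += 1
                    self.vocab.add(f)
        return self

    def _logp(self, table, key, item, support):
        # Smoothed log-probability of `item` under the count table for `key`.
        counts = table.get(key, {})
        total = sum(counts.values())
        return math.log((counts.get(item, 0) + self.alpha) / (total + self.alpha * support))

    def log_joint(self, doc, c):
        n = sum(self.class_counts.values())
        score = math.log(self.class_counts[c] / n)
        for parent, node in walk(doc):
            # Structural term: which label appears under which parent.
            score += self._logp(self.struct, (c, parent), node.label, len(self.labels))
            # Content term: text and image features share one code path.
            for f in node.features:
                score += self._logp(self.content, (c, node.label), f, len(self.vocab))
        return score

    def predict(self, doc):
        return max(self.class_counts, key=lambda c: self.log_joint(doc, c))


# Toy usage with made-up pages; image features are assumed to be pre-quantized tokens.
blocked = Node("html", children=[Node("p", ["adult", "xxx"]), Node("img", ["skin_high"])])
allowed = Node("html", children=[Node("p", ["news", "sports"]), Node("img", ["skin_low"])])
model = StructuredNB().fit([blocked, allowed], ["block", "allow"])

query = Node("html", children=[Node("p", ["xxx"]), Node("img", ["skin_high"])])
print(model.predict(query))
```

In this sketch the content distribution is conditioned on the node label, so text and image features get separate parameters without separate code paths. That is one simple way to cope with several content types in a single model, under the assumptions stated above.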