Recently, a new community has started to emerge around the development of new information research methods for searching and analyzing semi-structured and XML-like documents. The goal is to handle both content and structural information, and to deal with different types of content (text, images, etc.). We consider here the task of structured document classification. We propose a generative model, based on Bayesian networks, that handles both structure and content. We then show how to transform this generative model into a discriminative classifier using the Fisher kernel method. The model is then extended to handle several types of content (here, text and images). The model was tested on three corpora: the classical WebKB corpus of HTML pages, the new INEX corpus, which has become a reference in the field of ad hoc retrieval for XML documents, and a multimedia corpus of Web pages.
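To make the step from a generative model to a Fisher kernel concrete, the following minimal Python sketch may help. It is an illustration under stated assumptions, not the model from the paper: the toy documents, the factorization P(tag | parent tag) x P(word | tag), the Laplace smoothing, and all function names are hypothetical. It also simplifies the Fisher kernel by taking the gradient with respect to unconstrained parameters and by replacing the Fisher information matrix with the identity.

```python
import math
from collections import Counter

# Hypothetical toy documents: each node is (parent_tag, tag, words).
doc_a = [("ROOT", "title", ["bayesian", "network"]),
         ("ROOT", "body", ["network", "model", "model"])]
doc_b = [("ROOT", "title", ["image", "retrieval"]),
         ("ROOT", "body", ["model", "image"])]


def counts(doc):
    """Count structural events (parent, tag) and content events (tag, word)."""
    struct = Counter((p, t) for p, t, _ in doc)
    words = Counter((t, w) for _, t, ws in doc for w in ws)
    return struct, words


def fit(docs, alpha=1.0):
    """Laplace-smoothed estimates of P(tag | parent) and P(word | tag)."""
    struct, words = Counter(), Counter()
    for d in docs:
        s, w = counts(d)
        struct.update(s)
        words.update(w)
    parents = sorted({p for p, _ in struct})
    tags = sorted({t for _, t in struct})
    vocab = sorted({w for _, w in words})
    p_struct, p_word = {}, {}
    for p in parents:
        total = sum(struct[(p, t)] for t in tags) + alpha * len(tags)
        for t in tags:
            p_struct[(p, t)] = (struct[(p, t)] + alpha) / total
    for t in tags:
        total = sum(words[(t, w)] for w in vocab) + alpha * len(vocab)
        for w in vocab:
            p_word[(t, w)] = (words[(t, w)] + alpha) / total
    return p_struct, p_word


def log_likelihood(doc, model):
    """log P(doc) = sum of log P(tag | parent) and log P(word | tag) terms."""
    p_struct, p_word = model
    s, w = counts(doc)
    return (sum(n * math.log(p_struct[k]) for k, n in s.items())
            + sum(n * math.log(p_word[k]) for k, n in w.items()))


def fisher_score(doc, model):
    """Gradient of log P(doc) w.r.t. each parameter: count / probability.

    Simplification: the sum-to-one constraints are ignored.
    """
    p_struct, p_word = model
    s, w = counts(doc)
    score = {("struct", k): s[k] / p for k, p in p_struct.items()}
    score.update({("word", k): w[k] / p for k, p in p_word.items()})
    return score


def fisher_kernel(d1, d2, model):
    """Dot product of Fisher scores (identity in place of Fisher information)."""
    u1, u2 = fisher_score(d1, model), fisher_score(d2, model)
    return sum(u1[k] * u2[k] for k in u1)


model = fit([doc_a, doc_b])
print(log_likelihood(doc_a, model))
print(fisher_kernel(doc_a, doc_b, model))
```

A kernel of this form could then be passed to any kernel-based discriminative classifier, such as a support vector machine. In this sketch, documents that share structural and content events where the model assigns low probability receive a larger kernel value.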