Probabilistic Methods for Structured Document Classification at INEX'07

Authors:
Luis M. Campos;Juan M. Fernández-Luna;Juan F. Huete;Alfonso E. Romero
Affiliations:
Departamento de Ciencias de la Computación e Inteligencia Artificial E.T.S.I. Informática y de Telecomunicación, Universidad de Granada, Granada, Spain 18071;Departamento de Ciencias de la Computación e Inteligencia Artificial E.T.S.I. Informática y de Telecomunicación, Universidad de Granada, Granada, Spain 18071;Departamento de Ciencias de la Computación e Inteligencia Artificial E.T.S.I. Informática y de Telecomunicación, Universidad de Granada, Granada, Spain 18071;Departamento de Ciencias de la Computación e Inteligencia Artificial E.T.S.I. Informática y de Telecomunicación, Universidad de Granada, Granada, Spain 18071
Venue:
Focused Access to XML Documents
Year:
2008

Citing 7
Cited 2

Probabilistic reasoning in intelligent systems: networks of plausible inference

Probabilistic reasoning in intelligent systems: networks of plausible inference
A sequential algorithm for training text classifiers

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Combining classifiers in text categorization

SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Machine learning in automated text categorization

ACM Computing Surveys (CSUR)
Hierarchically Classifying Documents Using Very Few Words

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Learning Bayesian Network Classifiers: Searching in a Space of Partially Directed Acyclic Graphs

Machine Learning
Exploiting structural information for semi-structured document categorization

Information Processing and Management: an International Journal

Link-based text classification using Bayesian networks

INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval
X-Class: Associative Classification of XML Documents by Structure

ACM Transactions on Information Systems (TOIS)

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper exposes the results of our participation in the Document Mining track at INEX'07. We have focused on the task of classification of XML documents. Our approach to deal with structured document representations uses classification methods for plain text, applied to flattened versions of the documents, where some of their structural properties have been translated to plain text. We have explored several options to convert structured documents into flat documents, in combination with two probabilistic methods for text categorization. The main conclusion of our experiments is that taking advantage of document structure to improve classification results is a difficult task.