X-Class: Associative Classification of XML Documents by Structure

Authors:
Gianni Costa; Riccardo Ortale; Ettore Ritacco
Affiliations:
ICAR-CNR; ICAR-CNR; ICAR-CNR
Venue:
ACM Transactions on Information Systems (TOIS)
Year:
2013

Abstract

The supervised classification of XML documents by structure involves learning predictive models in which certain structural regularities discriminate the individual document classes. Hitherto, research has focused on the adoption of prespecified substructures. This is detrimental to classification effectiveness, since substructures chosen a priori may not accord with the structural properties of the XML documents. An unexplored question in this setting is how to choose the type of structural regularity that best adapts to the structures of the available XML documents. We tackle this problem through X-Class, an approach that handles all types of tree-like substructures and allows for choosing the most discriminatory one. Algorithms are designed to learn compact rule-based classifiers in which the chosen substructures discriminate the classes of XML documents. X-Class is studied across various domains and types of substructures. Its classification performance is compared against several rule-based and SVM-based competitors. Empirical evidence reveals that the classifiers induced by X-Class are compact, scalable, and at least as effective as the established competitors. In particular, certain substructures allow the induction of very compact classifiers that generally outperform the rule-based competitors in terms of effectiveness over all chosen corpora of XML data. Furthermore, such classifiers are substantially as effective as the SVM-based competitor, with the additional advantage of a high degree of interpretability.
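
The general idea of rule-based (associative) classification by structure can be illustrated with a minimal sketch. This is not the X-Class algorithm: it fixes a single substructure type (root-to-leaf tag paths) rather than choosing among several, and it uses plain support and confidence thresholds. The function names, thresholds, and toy documents are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import xml.etree.ElementTree as ET


def root_to_leaf_paths(xml_text):
    """Return the set of root-to-leaf tag paths of one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        children = list(node)
        if not children:  # leaf element: the path is complete
            paths.add(path)
        for child in children:
            walk(child, path)

    walk(root, "")
    return paths


def learn_rules(docs, min_support=2, min_confidence=0.8):
    """Mine rules `path -> class` from (xml_text, label) pairs.

    Hypothetical thresholds: a rule is kept if at least `min_support`
    documents of its class contain the path and the class accounts for
    at least `min_confidence` of all documents containing the path.
    """
    path_count = Counter()
    path_class_count = defaultdict(Counter)
    for xml_text, label in docs:
        for path in root_to_leaf_paths(xml_text):
            path_count[path] += 1
            path_class_count[path][label] += 1

    rules = []
    for path, per_class in path_class_count.items():
        label, support = per_class.most_common(1)[0]
        confidence = support / path_count[path]
        if support >= min_support and confidence >= min_confidence:
            rules.append((path, label, confidence, support))
    # Order rules by confidence, then support, then path for determinism.
    rules.sort(key=lambda r: (-r[2], -r[3], r[0]))
    return rules


def classify(xml_text, rules, default_label):
    """Predict the class of the first (best-ranked) rule that fires."""
    paths = root_to_leaf_paths(xml_text)
    for path, label, _confidence, _support in rules:
        if path in paths:
            return label
    return default_label


# Toy training corpus (assumed data, two structural classes).
TRAIN = [
    ("<article><title/><body><sec/></body></article>", "paper"),
    ("<article><title/><body><sec/><sec/></body></article>", "paper"),
    ("<movie><title/><cast><actor/></cast></movie>", "film"),
    ("<movie><title/><cast><actor/><actor/></cast></movie>", "film"),
]
RULES = learn_rules(TRAIN)
PREDICTION = classify("<movie><cast><actor/></cast></movie>", RULES, "paper")
```

Each learned rule is a readable statement of the form "documents containing this path belong to this class", which is the kind of interpretability the abstract attributes to rule-based classifiers. A different substructure type could be swapped in by replacing `root_to_leaf_paths` with another feature extractor.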