XRules: an effective structural classifier for XML data

Authors:
Mohammed J. Zaki; Charu C. Aggarwal
Affiliations:
Rensselaer Polytechnic Institute; IBM T. J. Watson Research Center
Venue:
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2003

Abstract

XML documents have recently become ubiquitous because of their applicability across a wide range of applications. Classification is an important problem in the data mining domain, but current classification methods for XML documents use IR-based methods in which each document is treated as a bag of words. Such techniques ignore a significant amount of structural information hidden inside the documents. In this paper we discuss the problem of rule-based classification of XML data by using frequent discriminatory substructures within XML documents. Such a technique is better able to capture the classification characteristics of documents. In addition, the technique can be extended to cost-sensitive classification. We show the effectiveness of the method with respect to other classifiers. We note that the methodology discussed in this paper is applicable to any kind of semi-structured data.
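
To make the idea of rule-based classification from frequent discriminatory substructures concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a strong simplification: root-to-node tag paths stand in for the richer substructures the abstract refers to. The function names (`label_paths`, `mine_rules`, `classify`), the thresholds, and the toy documents are all hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def label_paths(xml_text):
    """Return the set of root-to-node tag paths found in one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, (root.tag,))]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + (child.tag,)))
    return paths


def mine_rules(docs, min_support=0.5, min_confidence=0.6):
    """Build rules (path, label, confidence, support) from labeled documents.

    Assumption for this sketch: support is the fraction of a class's
    documents containing the path, and confidence is the fraction of
    documents containing the path that belong to that class.
    """
    class_sizes = Counter(label for _, label in docs)
    counts = defaultdict(Counter)  # path -> number of documents per class
    for xml_text, label in docs:
        for path in label_paths(xml_text):
            counts[path][label] += 1

    rules = []
    for path, per_class in counts.items():
        total = sum(per_class.values())
        for label, n in per_class.items():
            support = n / class_sizes[label]
            confidence = n / total
            if support >= min_support and confidence >= min_confidence:
                rules.append((path, label, confidence, support))
    # Strongest rules first: higher confidence, then higher support.
    rules.sort(key=lambda r: (-r[2], -r[3]))
    return rules


def classify(xml_text, rules, default):
    """Predict the class of the first (strongest) rule whose path occurs."""
    paths = label_paths(xml_text)
    for path, label, _, _ in rules:
        if path in paths:
            return label
    return default


# Toy training set (hypothetical data, for illustration only).
train = [
    ("<order><item><price/></item></order>", "purchase"),
    ("<order><item><price/></item><ship/></order>", "purchase"),
    ("<order><query><term/></query></order>", "browse"),
    ("<order><query><term/></query><page/></order>", "browse"),
]
rules = mine_rules(train)

print(classify("<order><item><price/><tax/></item></order>", rules, "unknown"))
print(classify("<order><query/></order>", rules, "unknown"))
print(classify("<note/>", rules, "unknown"))
```

In this sketch the path `order` alone produces no rule because it appears equally often in both classes, which is the sense in which only discriminatory structures are kept. A bag-of-words view of the same documents would discard the nesting of `item` and `query` under `order` that the rules rely on.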