Classification algorithms
XML documents have become ubiquitous because they are used across a wide range of applications. Classification is an important problem in data mining, but current classification methods for XML documents rely on IR-based techniques that treat each document as a bag of words. Such techniques ignore a significant amount of the information hidden inside the documents. In this paper we discuss the problem of rule-based classification of XML data using frequent discriminatory substructures within XML documents. Such a technique is better able to capture the characteristics that distinguish one class of documents from another. In addition, the technique can be extended to cost-sensitive classification. We demonstrate the effectiveness of the method by comparing it with other classifiers. We note that the methodology discussed in this paper is applicable to any kind of semi-structured data.
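The general idea of classifying by frequent discriminatory substructures can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: as a simplifying assumption, a "substructure" here is just a parent/child tag pair, and the function names (`edges`, `mine_rules`, `classify`), the thresholds, and the toy documents are all hypothetical. A rule `edge -> class` is kept when the edge is frequent within the class (support) and mostly occurs in that class (confidence).

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def edges(xml_text):
    """Return the set of (parent_tag, child_tag) pairs found in one document.

    Assumption: tag pairs stand in for the richer tree substructures
    that a real tree miner would enumerate.
    """
    root = ET.fromstring(xml_text)
    return {(parent.tag, child.tag) for parent in root.iter() for child in parent}


def mine_rules(labeled_docs, min_support=0.5, min_confidence=0.8):
    """Build rules (edge, label, confidence) from (xml_text, label) pairs.

    support    = fraction of the class's documents that contain the edge
    confidence = fraction of documents containing the edge that are in the class
    """
    class_sizes = Counter(label for _, label in labeled_docs)
    edge_class_counts = defaultdict(Counter)
    for text, label in labeled_docs:
        for edge in edges(text):
            edge_class_counts[edge][label] += 1

    rules = []
    for edge, counts in edge_class_counts.items():
        total = sum(counts.values())
        for label, n in counts.items():
            support = n / class_sizes[label]
            confidence = n / total
            if support >= min_support and confidence >= min_confidence:
                rules.append((edge, label, confidence))
    return rules


def classify(xml_text, rules, default=None):
    """Sum the confidence of every matching rule per class; return the best class."""
    present = edges(xml_text)
    scores = Counter()
    for edge, label, confidence in rules:
        if edge in present:
            scores[label] += confidence
    if not scores:
        return default
    return scores.most_common(1)[0][0]


# Hypothetical toy training set: two document classes with different structure.
TRAIN = [
    ("<paper><title/><abstract/><refs><ref/></refs></paper>", "article"),
    ("<paper><title/><refs><ref/></refs></paper>", "article"),
    ("<order><item><price/></item></order>", "invoice"),
    ("<order><item><price/></item><item><price/></item></order>", "invoice"),
]
RULES = mine_rules(TRAIN)
```

Calling `classify("<order><item><price/></item></order>", RULES)` matches only the invoice rules, while a document that matches no rule falls back to `default`. Multiplying each class score by a misclassification cost would be one way to make such a sketch cost-sensitive; that is an assumption for illustration, not a description of the extension the paper proposes.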