Transforming XML trees for efficient classification and clustering

Authors:
Laurent Candillier;Isabelle Tellier;Fabien Torre
Affiliations:
GRAppA – Charles de Gaulle University – Lille 3;GRAppA – Charles de Gaulle University – Lille 3;GRAppA – Charles de Gaulle University – Lille 3
Venue:
INEX'05 Proceedings of the 4th international conference on Initiative for the Evaluation of XML Retrieval
Year:
2005

Citing 4

TreeFinder: a First Step towards XML Data Mining. ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
An Efficient and Scalable Algorithm for Clustering XML Documents by Structure. IEEE Transactions on Knowledge and Data Engineering
XRules: an effective structural classifier for XML data. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
SSC: statistical subspace clustering. MLDM'05 Proceedings of the 4th international conference on Machine Learning and Data Mining in Pattern Recognition

Cited 13

Report on the XML mining track at INEX 2005 and INEX 2006: categorization and clustering of XML documents. ACM SIGIR Forum
Efficient rule based structural algorithms for classification of tree structured data. Intelligent Data Analysis
Word Sense Disambiguation for XML Structure Feature Generation. ESWC 2009 Heraklion Proceedings of the 6th European Semantic Web Conference on The Semantic Web: Research and Applications
Semantic clustering of XML documents. ACM Transactions on Information Systems (TOIS)
XML data clustering: An overview. ACM Computing Surveys (CSUR)
XStreamCluster: an efficient algorithm for streaming XML data clustering. DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications - Volume Part I
Collaborative clustering of XML documents. Journal of Computer and System Sciences
Clust-XPaths: clustering of XML paths. MLDM'11 Proceedings of the 7th international conference on Machine learning and data mining in pattern recognition
A flexible structured-based representation for XML document mining. INEX'05 Proceedings of the 4th international conference on Initiative for the Evaluation of XML Retrieval
A novel XML document structure comparison framework based-on sub-tree commonalities and label semantics. Web Semantics: Science, Services and Agents on the World Wide Web
Clustering XML documents by structure. ADBIS'09 Proceedings of the 13th East European conference on Advances in Databases and Information Systems
Exploring dictionary-based semantic relatedness in labeled tree data. Information Sciences: an International Journal
X-Class: Associative Classification of XML Documents by Structure. ACM Transactions on Information Systems (TOIS)

Abstract

Most existing methods we know of for handling collections of XML documents work directly on the trees that represent those documents. In this paper we investigate a different representation. Our idea is to transform the trees into sets of attribute-value pairs, so that a wide range of existing classification and clustering methods can be applied to the data and their strengths exploited. We apply this strategy to both the classification task and the clustering task, using only the structural description of the XML documents. For instance, we show that boosted C5 gives very good results on the classification of XML documents transformed in this way. In the clustering task, SSC has the advantage of producing an interpretable representation of the clusters it finds. Finally, we propose an adaptation of SSC for the classification of XML documents, so that the resulting classifier is understandable.
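
To make the tree-to-attributes idea concrete, here is a minimal Python sketch of one way an XML tree could be flattened into attribute-value pairs using structure alone. The abstract does not list the attributes the authors actually use, so the three feature families below (tag counts, parent-child relations, root-to-node paths), the feature naming scheme, and the function name `tree_to_attributes` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tree_to_attributes(xml_text):
    """Flatten an XML tree into a dict mapping attribute name -> count.

    The feature families here are illustrative assumptions, not the
    attribute set defined in the paper. Text content is ignored, so only
    the structure of the document is described.
    """
    root = ET.fromstring(xml_text)
    features = Counter()

    def visit(node, path):
        current = path + "/" + node.tag
        features["tag=" + node.tag] += 1          # how often each tag occurs
        features["path=" + current] += 1          # root-to-node tag path
        for child in node:
            # parent-child relation between tags
            features["parent-child=" + node.tag + ">" + child.tag] += 1
            visit(child, current)

    visit(root, "")
    return dict(features)


# Hypothetical document used only to exercise the function.
doc = "<article><title/><sec><p/><p/></sec></article>"
row = tree_to_attributes(doc)
```

Each document then becomes one row of attribute values, which is the kind of input that attribute-value learners such as the boosted C5 and SSC methods named in the abstract operate on.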