Clustering XML documents by structure is the task of grouping them according to the structural components they share. So far, this has been done by looking at the occurrence of a single, pre-established type of structural component in the documents. However, a type of component chosen a priori may not be the most appropriate for effective clustering. Moreover, the resulting clusters are likely to retain some internal structural inhomogeneity, because differences that only other, neglected types of structural components would reveal go undetected. To overcome these limitations, a new hierarchical approach is proposed that considers, where necessary, multiple types of structural components in order to isolate structurally homogeneous clusters of XML documents. At each level of the resulting hierarchy, clusters are divided according to a type of structural component that was not addressed at the preceding levels and that still differentiates the structures of the XML documents. Each cluster in the hierarchy is summarized through a novel technique that gives a clear and differentiated view of its structural properties. A comparative evaluation over both real and synthetic XML data shows that the devised approach outperforms established competitors in effectiveness and scalability. The cluster summaries are also shown to be highly representative.
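The level-by-level idea can be illustrated with a minimal sketch. It is not the algorithm of the paper: the abstract does not say which types of structural components are used or how clusters are divided, so the three feature types below (tag sets, parent-child edges, root-to-leaf paths), their order, and the grouping by exact equality of feature sets are all assumptions made for illustration. The names `FEATURE_TYPES`, `build_hierarchy`, and `leaves` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def tags(root):
    """Set of element names occurring anywhere in the document."""
    return frozenset(el.tag for el in root.iter())


def edges(root):
    """Set of (parent tag, child tag) pairs."""
    return frozenset((p.tag, c.tag) for p in root.iter() for c in p)


def paths(root):
    """Set of root-to-leaf tag paths."""
    found = set()

    def walk(el, prefix):
        path = prefix + (el.tag,)
        children = list(el)
        if not children:
            found.add(path)
        for child in children:
            walk(child, path)

    walk(root, ())
    return frozenset(found)


# Assumed feature types, one candidate per hierarchy level (illustrative only).
FEATURE_TYPES = [("tags", tags), ("edges", edges), ("paths", paths)]


def build_hierarchy(docs, level=0):
    """Recursively split `docs` (name -> root element).

    A cluster is divided by the first feature type, not used at an earlier
    level, that still differentiates its documents. Exact equality of the
    feature sets stands in for a real clustering step (an assumption).
    """
    node = {"docs": sorted(docs), "split_on": None, "children": []}
    while level < len(FEATURE_TYPES):
        name, extract = FEATURE_TYPES[level]
        level += 1
        groups = defaultdict(dict)
        for doc_name, root in docs.items():
            groups[extract(root)][doc_name] = root
        if len(groups) > 1:  # this feature type separates the cluster
            node["split_on"] = name
            node["children"] = [build_hierarchy(g, level) for g in groups.values()]
            break
    return node


def leaves(node):
    """Document names of the leaf clusters, left to right."""
    if not node["children"]:
        return [node["docs"]]
    return [leaf for child in node["children"] for leaf in leaves(child)]


sources = {
    "d1": "<a><b/><c/></a>",
    "d2": "<a><c/><b/></a>",
    "d3": "<a><b><c/></b></a>",
    "d4": "<x><y/></x>",
}
docs = {name: ET.fromstring(xml) for name, xml in sources.items()}
hierarchy = build_hierarchy(docs)
```

In this toy input, `d4` is separated from the rest at the first level by its tags, while `d3` is separated from `d1` and `d2` only at the second level, because it shares their tags but not their parent-child edges. A realistic implementation would replace the equality test with a similarity-based clustering step and would need a summarization step, which this sketch omits.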