We propose a novel methodology for clustering XML documents on the basis of their structural similarities. The idea is to equip each cluster with an XML cluster representative, i.e., an XML document that subsumes the most typical structural features of a set of XML documents. Clustering is essentially accomplished by comparing cluster representatives and updating the representatives as soon as new clusters are detected. We present an algorithm for computing an XML representative, based on suitable techniques for identifying significant node matchings and for reliably merging and pruning XML trees. An experimental evaluation performed on both synthetic and real data shows the effectiveness of our approach.
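
The abstract gives no algorithmic detail, so the following Python sketch only illustrates the general idea of representative-based clustering and should not be read as the authors' algorithm. It assumes a deliberate simplification: each XML tree is reduced to its set of root-to-node tag paths, "merging" is taken to be the union of those path sets, "pruning" drops paths found in too few cluster members, and representatives are compared with Jaccard similarity. The names `tag_paths`, `representative`, `similarity`, and `cluster`, and the parameters `min_support` and `threshold`, are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths of an XML document."""
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def representative(members, min_support=0.5):
    """Merge the members' path sets, then prune paths rarer than min_support.

    Assumption: this stands in for the tree merging and pruning mentioned
    in the abstract; the paper's node-matching technique is not reproduced.
    """
    counts = Counter(p for paths in members for p in paths)
    return {p for p, c in counts.items() if c / len(members) >= min_support}


def similarity(rep_a, rep_b):
    """Jaccard similarity between two representatives (an assumed measure)."""
    union = rep_a | rep_b
    return len(rep_a & rep_b) / len(union) if union else 1.0


def cluster(documents, threshold=0.5):
    """Agglomerative clustering driven by cluster representatives.

    Repeatedly merges the two clusters whose representatives are most
    similar, recomputing representatives after every merge, and stops
    when no pair reaches the similarity threshold.
    Returns a list of clusters, each a list of document indices.
    """
    paths = [tag_paths(doc) for doc in documents]
    clusters = [[i] for i in range(len(documents))]
    while len(clusters) > 1:
        reps = [representative([paths[i] for i in c]) for c in clusters]
        best, a, b = max(
            (similarity(reps[x], reps[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if best < threshold:
            break
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters
```

Calling `cluster` on a list of XML strings groups documents with largely overlapping tag paths and leaves structurally unrelated ones in separate clusters. Because a path can never be more frequent than its parent path, the pruned representative remains prefix-closed and can still be read as a tree.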