Clustering XML documents using structural summaries

Authors:
Theodore Dalamagas;Tao Cheng;Klaas-Jan Winkel;Timos Sellis
Affiliations:
School of Electrical and Computer Engineering, National Technical University of Athens, Zographou, Athens, Greece;Department of Computer Science, University of California, Santa Barbara, CA;Faculty of Computer Science, University of Twente, Enschede, The Netherlands;School of Electrical and Computer Engineering, National Technical University of Athens, Zographou, Athens, Greece
Venue:
EDBT'04 Proceedings of the 2004 international conference on Current Trends in Database Technology
Year:
2004

Abstract

This work presents a methodology for grouping structurally similar XML documents using clustering algorithms. Modeling XML documents as tree-like structures, we treat the ‘clustering XML documents by structure’ problem as a ‘tree clustering’ problem, exploiting distances that estimate the similarity between those trees in terms of the hierarchical relationships of their nodes. We suggest the use of tree structural summaries to improve the performance of the distance calculation while maintaining or even improving its quality. Experimental results are provided using a prototype testbed.
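
The pipeline outlined in the abstract (model documents as trees, summarize them, compute tree distances, cluster) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the paper: `summarize` simply merges same-label siblings, `distance` is a simple top-down edit distance that inserts or deletes whole subtrees (a swapped-in stand-in, not necessarily the distance the authors use), and `cluster` is a plain threshold-based single-link grouping. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from functools import lru_cache


def to_tree(elem):
    """Turn an Element into a nested (label, children) tuple; text and attributes are ignored."""
    return (elem.tag, tuple(to_tree(child) for child in elem))


def summarize(tree):
    """Illustrative summary (an assumption): merge siblings that share a label, recursively."""
    label, children = tree
    merged = {}  # child label -> pooled grandchildren, in first-seen order
    for child_label, grandchildren in children:
        merged.setdefault(child_label, []).extend(grandchildren)
    return (label, tuple(summarize((l, tuple(g))) for l, g in merged.items()))


def size(tree):
    """Number of nodes in the tree."""
    return 1 + sum(size(child) for child in tree[1])


@lru_cache(maxsize=None)
def distance(a, b):
    """Top-down edit distance: relabeling a root costs 1; child sequences are
    aligned, and inserting or deleting a whole subtree costs its node count."""
    (label_a, kids_a), (label_b, kids_b) = a, b
    rows, cols = len(kids_a), len(kids_b)
    dp = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        dp[i][0] = dp[i - 1][0] + size(kids_a[i - 1])
    for j in range(1, cols + 1):
        dp[0][j] = dp[0][j - 1] + size(kids_b[j - 1])
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(kids_a[i - 1]),                   # delete subtree
                dp[i][j - 1] + size(kids_b[j - 1]),                   # insert subtree
                dp[i - 1][j - 1] + distance(kids_a[i - 1], kids_b[j - 1]),  # match
            )
    return int(label_a != label_b) + dp[rows][cols]


def cluster(trees, threshold):
    """Single-link grouping: trees connected by a chain of pairs within the threshold share a cluster."""
    parent = list(range(len(trees)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(trees)):
        for j in range(i + 1, len(trees)):
            if distance(trees[i], trees[j]) <= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(trees)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    docs = [
        "<lib><book><title/><author/></book><book><title/><author/></book></lib>",
        "<lib><book><title/><author/></book></lib>",
        "<shop><item><price/></item><item><price/></item><item><price/></item></shop>",
    ]
    summaries = [summarize(to_tree(ET.fromstring(d))) for d in docs]
    print(cluster(summaries, threshold=1))
```

In this toy setup the first two documents differ only in how often the `book` element repeats, so their summaries coincide and their distance drops to zero, while the summaries are also smaller inputs for the distance computation. Whether this mirrors the paper's actual summary construction or distance is not established by the abstract.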