Detecting structural similarities between XML documents has been the subject of several recent works, and most of the proposed algorithms use the tree edit distance between the trees corresponding to the XML documents. However, evaluating a tree edit distance is computationally expensive and does not scale easily to large collections. We show in this paper that a tree edit distance computation is often unnecessary and can be avoided. In particular, we propose a concise structural summary of XML documents and show that a comparison based on this summary is both fast and effective. Our experimental evaluation shows that this method does an excellent job of grouping documents generated by the same DTD, outperforming some of the previously proposed solutions based on tree comparison. Furthermore, the time complexity of the algorithm is linear in the size of the structural description.
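To make the idea of comparing summaries instead of full trees concrete, here is a minimal sketch. It is not the summary proposed in the paper, whose definition is not given in this abstract. As a stand-in, it assumes the summary is the set of distinct root-to-element tag paths and that two summaries are compared with Jaccard similarity. The names `path_summary` and `summary_similarity` are hypothetical.

```python
import xml.etree.ElementTree as ET


def path_summary(xml_text):
    """Return the set of distinct root-to-element tag paths.

    Assumption: this path set is only an illustrative stand-in for a
    structural summary, not the summary defined by the authors.
    """
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)  # repeated siblings collapse into one path
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def summary_similarity(a, b):
    """Jaccard similarity of two path sets (assumed measure, not the paper's)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


doc1 = "<book><title/><author/><author/></book>"
doc2 = "<book><title/><author/></book>"
doc3 = "<article><title/><journal/></article>"

s1, s2, s3 = path_summary(doc1), path_summary(doc2), path_summary(doc3)
print(summary_similarity(s1, s2))  # identical path sets despite the repeated author
print(summary_similarity(s1, s3))  # no shared paths
```

In this sketch the comparison only touches the two summaries, so its cost depends on the number of distinct paths rather than on the size of the document trees, which is the general kind of saving the abstract describes.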