Over the past few years, hundreds of XML-based document formats have appeared, with office documents as typical examples. At the same time, demand for document search has grown and become more complex, since users need similarity search as well as keyword search. In our previous work, we proposed LAX+, an algorithm for measuring the similarity between XML trees. However, LAX+ performs rigid matching at the leaf nodes of XML trees. In this paper, we propose two methods, KLAX and LAX&KEY. To measure the similarity between leaf nodes more precisely, KLAX improves LAX+ by checking the number of keywords that the leaf nodes have in common. LAX&KEY separately measures the similarity between XML trees using LAX+ and the similarity of the keywords common to the XML trees, and then combines the two values. In our experiments with docx, xlsx, and pptx files, the proposed methods yield better precision and recall.
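The two ideas can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: the tokenizer, the use of Jaccard overlap as the keyword measure, the weighted mean used to combine scores, and the names `keywords`, `rigid_leaf_similarity`, `keyword_leaf_similarity`, `combine_scores`, and `alpha`. The structural score that LAX+ would produce is passed in as a plain number rather than computed.

```python
import re


def keywords(text):
    """Return the set of lowercased word tokens in a leaf node's text.

    Hypothetical tokenizer; the paper's keyword extraction is not specified.
    """
    return set(re.findall(r"\w+", text.lower()))


def rigid_leaf_similarity(leaf_a, leaf_b):
    """Rigid matching: leaves count as similar only if their text is identical."""
    return 1.0 if leaf_a == leaf_b else 0.0


def keyword_leaf_similarity(leaf_a, leaf_b):
    """Graded leaf similarity based on the number of common keywords.

    Jaccard overlap is an assumed stand-in for the measure used by KLAX.
    """
    ka, kb = keywords(leaf_a), keywords(leaf_b)
    if not ka and not kb:
        return 1.0  # two empty leaves are treated as identical
    return len(ka & kb) / len(ka | kb)


def combine_scores(tree_sim, keyword_sim, alpha=0.5):
    """Combine a structural score and a keyword score, in the spirit of LAX&KEY.

    The weighted mean and the weight alpha are assumptions, not the paper's rule.
    """
    return alpha * tree_sim + (1.0 - alpha) * keyword_sim


if __name__ == "__main__":
    a, b = "annual sales report", "sales report 2010"
    print(rigid_leaf_similarity(a, b))    # exact match fails
    print(keyword_leaf_similarity(a, b))  # partial credit for shared keywords
    print(combine_scores(0.8, keyword_leaf_similarity(a, b)))
```

In this sketch, two leaves that share most of their words receive partial credit under the keyword measure, whereas rigid matching scores them as entirely different, which is the limitation of LAX+ that the abstract describes.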