Proceedings of the 2007 ACM symposium on Document engineering
This paper proposes a novel approach to measuring XML document similarity that takes into account the semantics between XML elements. The approach is motivated by the problems of "under-contribution" and "over-contribution" found in previous work. The element semantics are learned in an unsupervised way, and the Proportional Transportation Similarity is proposed to evaluate XML document similarity by modeling the similarity calculation as a transportation problem. Clustering experiments on three ACM SIGMOD data sets show that the proposed approach performs favorably.
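
The abstract does not define the Proportional Transportation Similarity, so the following is only a minimal sketch of the general idea of scoring two weighted element sets through a transportation problem. Everything in it is an assumption made for illustration: the function names `transport_cost` and `proportional_similarity_sketch`, the use of element weights normalized so that each document has total mass 1, the cost `1 - sim[i][j]`, and the choice of a successive-shortest-path solver. It is not the authors' formulation.

```python
from fractions import Fraction


def transport_cost(supply, demand, cost):
    """Minimum cost of shipping `supply` onto `demand` (hypothetical helper).

    Solved as min-cost flow with successive shortest paths (Bellman-Ford).
    Exact rational arithmetic avoids floating-point termination issues.
    """
    m, n = len(supply), len(demand)
    src, snk = 0, m + n + 1          # 1..m are suppliers, m+1..m+n are consumers
    graph = [[] for _ in range(m + n + 2)]
    edges = []                        # [to, residual capacity, cost]; e ^ 1 is the reverse

    def add_edge(u, v, cap, c):
        graph[u].append(len(edges)); edges.append([v, Fraction(cap), Fraction(c)])
        graph[v].append(len(edges)); edges.append([u, Fraction(0), -Fraction(c)])

    big = sum((Fraction(s) for s in supply), Fraction(0))
    for i in range(m):
        add_edge(src, 1 + i, supply[i], 0)
        for j in range(n):
            add_edge(1 + i, 1 + m + j, big, cost[i][j])
    for j in range(n):
        add_edge(1 + m + j, snk, demand[j], 0)

    total = Fraction(0)
    while True:
        dist = [None] * len(graph)
        prev = [None] * len(graph)
        dist[src] = Fraction(0)
        for _ in range(len(graph) - 1):
            changed = False
            for u in range(len(graph)):
                if dist[u] is None:
                    continue
                for e in graph[u]:
                    v, cap, c = edges[e]
                    if cap > 0 and (dist[v] is None or dist[u] + c < dist[v]):
                        dist[v], prev[v], changed = dist[u] + c, e, True
            if not changed:
                break
        if dist[snk] is None:         # no augmenting path left: flow is maximal
            return total
        push, v = None, snk           # bottleneck capacity along the path
        while v != src:
            cap = edges[prev[v]][1]
            push = cap if push is None else min(push, cap)
            v = edges[prev[v] ^ 1][0]
        v = snk
        while v != src:               # augment along the path
            e = prev[v]
            edges[e][1] -= push
            edges[e ^ 1][1] += push
            v = edges[e ^ 1][0]
        total += push * dist[snk]


def proportional_similarity_sketch(weights_a, weights_b, sim):
    """Assumed scoring: normalize weights to total 1, ship at cost 1 - sim."""
    ta, tb = Fraction(sum(weights_a)), Fraction(sum(weights_b))
    supply = [Fraction(w) / ta for w in weights_a]
    demand = [Fraction(w) / tb for w in weights_b]
    cost = [[1 - Fraction(s) for s in row] for row in sim]
    return 1 - transport_cost(supply, demand, cost)


# Two documents with two element types each; sim is a made-up element similarity matrix.
score = proportional_similarity_sketch([2, 1], [1, 1], [[1, 0], [0, 1]])
print(score)
```

With element similarities in [0, 1], this score also lies in [0, 1]. Under the proportional normalization assumed here, two documents whose element weights differ only by a constant factor receive the maximum score of 1.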