A new sequential mining approach to XML document similarity computation

Authors:
Ho-Pong Leung;Fu-Lai Chung;Stephen Chi-Fai Chan
Affiliations:
Department of Computing, Hong Kong Polytechnic University, Hunghom, Kowloon, Hong Kong;Department of Computing, Hong Kong Polytechnic University, Hunghom, Kowloon, Hong Kong;Department of Computing, Hong Kong Polytechnic University, Hunghom, Kowloon, Hong Kong
Venue:
PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining
Year:
2003

References:
An introduction to the analysis of algorithms
XMill: an efficient compressor for XML data. SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Mining Sequential Patterns. ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
Preparations for Semantics-Based XML Mining. ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Declarative Data Cleaning: Language, Model, and Algorithms. Proceedings of the 27th International Conference on Very Large Data Bases
WebFilter: A High-throughput XML-based Publish and Subscribe System. Proceedings of the 27th International Conference on Very Large Data Bases

Cited by:
FRACTURE mining: mining frequently and concurrently mutating structures from historical XML documents. Data & Knowledge Engineering - Special issue: WIDM 2004
XML structural delta mining: issues and challenges. Data & Knowledge Engineering - Special issue: ER 2003
Evaluate structure similarity in XML documents with merge-edit-distance. PAKDD'07 Proceedings of the 2007 international conference on Emerging technologies in knowledge discovery and data mining

Abstract

Several methods exist for measuring the structural similarity among XML documents. The data mining approach seems to be a novel, interesting and promising one. In view of the deficiencies that arise when hierarchical information is ignored in encoding the paths for mining, we propose a new sequential pattern mining scheme for XML document similarity computation. It makes use of the hierarchical information to compute the structural similarity of documents. In addition, it includes a post-processing step that reuses the mined patterns to estimate the similarity of unmatched elements, so that another metric to quantify the similarity between XML documents can be introduced. Encouraging experimental results were obtained and are reported.
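
The abstract does not give the encoding or the mining algorithm, so the following is only a rough illustration of why hierarchical information can matter when paths are compared as sequences. It is not the authors' scheme: the function names (`leaf_paths`, `lcs_len`, `path_similarity`), the depth-tagged encoding, and the longest-common-subsequence overlap used in place of sequential pattern mining are all assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET


def leaf_paths(xml_text, with_levels=True):
    """Collect root-to-leaf paths as tuples of tags, or of (tag, depth)
    pairs when with_levels is True (hypothetical encoding)."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix, depth):
        step = (node.tag, depth) if with_levels else node.tag
        path = prefix + (step,)
        children = list(node)
        if not children:
            paths.add(path)
        for child in children:
            walk(child, path, depth + 1)

    walk(root, (), 0)
    return paths


def lcs_len(s, t):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(t) + 1)
    for x in s:
        cur = [0]
        for j, y in enumerate(t, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def path_similarity(xml_a, xml_b, with_levels=True):
    """Symmetric overlap score in [0, 1]: each path is matched to its
    best partner in the other document by LCS length. This is a stand-in
    metric for illustration, not the metric defined in the paper."""
    a = leaf_paths(xml_a, with_levels)
    b = leaf_paths(xml_b, with_levels)

    def directed(src, dst):
        return sum(max(lcs_len(p, q) for q in dst) for p in src)

    total = sum(len(p) for p in a) + sum(len(q) for q in b)
    return (directed(a, b) + directed(b, a)) / total


doc_a = "<book><info><title/></info></book>"
doc_b = "<book><title/></book>"
flat_score = path_similarity(doc_a, doc_b, with_levels=False)
level_score = path_similarity(doc_a, doc_b, with_levels=True)
```

In this toy pair, `title` sits at depth 2 in one document and depth 1 in the other. Without depth tags the subsequence `book, title` is shared by both paths, so `flat_score` is higher than `level_score`, where only the root element matches.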