Finding maximal similar paths between XML documents using sequential patterns

Authors:
Jung-Won Lee;Seung-Soo Park
Affiliations:
Dept. of Computer Science and Engineering, Ewha Womans University, Seoul, Korea;Dept. of Computer Science and Engineering, Ewha Womans University, Seoul, Korea
Venue:
ADVIS'04 Proceedings of the Third international conference on Advances in Information Systems
Year:
2004

Abstract

Techniques for storing XML documents, optimizing queries, and indexing XML have been active subjects of research. Most of these techniques focus on XML documents that share the same structure (i.e., the same DTD or XML Schema). However, when XML documents from the Web or from an EDMS (Electronic Document Management System) must be merged or classified, finding the structure common to multiple documents becomes an essential step in processing them. In this paper, we propose a new methodology for extracting common structures from XML documents and finding maximal similar paths between those structures using sequential pattern mining algorithms. Correctly determining the common structures between XML documents provides an important basis for a variety of XML document mining and processing applications. Experiments with XML documents show that our adapted sequential pattern mining algorithms find common structures and the maximal similar paths between them exactly.
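
The idea of maximal similar paths can be illustrated with a small sketch. It is not the authors' algorithm: the abstract does not specify the mining procedure, so the code below assumes that each document is reduced to its root-to-leaf tag sequences and uses brute-force subsequence enumeration in place of a real sequential pattern mining algorithm. All function names and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import combinations


def root_to_leaf_paths(xml_text):
    """Return the set of root-to-leaf tag sequences of an XML document."""
    paths = set()

    def walk(node, prefix):
        current = prefix + (node.tag,)
        children = list(node)
        if not children:
            paths.add(current)
        for child in children:
            walk(child, current)

    walk(ET.fromstring(xml_text), ())
    return paths


def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(tag in remaining for tag in pattern)


def common_subsequences(p, q):
    """All non-empty subsequences of p that are also subsequences of q.

    Assumption: brute force, exponential in len(p); acceptable only
    because XML paths are usually short.
    """
    found = set()
    for size in range(1, len(p) + 1):
        for positions in combinations(range(len(p)), size):
            candidate = tuple(p[i] for i in positions)
            if is_subsequence(candidate, q):
                found.add(candidate)
    return found


def maximal_similar_paths(doc_a, doc_b):
    """Common tag sequences not contained in any longer common sequence."""
    common = set()
    for p in root_to_leaf_paths(doc_a):
        for q in root_to_leaf_paths(doc_b):
            common |= common_subsequences(p, q)
    return {
        c for c in common
        if not any(c != d and is_subsequence(c, d) for d in common)
    }


# Hypothetical sample documents with different nesting but shared structure.
DOC_A = "<book><title/><author><name/></author></book>"
DOC_B = ("<book><info><title/></info>"
         "<author><person><name/></person></author></book>")

print(sorted(maximal_similar_paths(DOC_A, DOC_B)))
# → [('book', 'author', 'name'), ('book', 'title')]
```

In this sketch the two documents have no identical paths, yet the ordered tag sequences book/title and book/author/name survive in both, which is the kind of common structure the abstract refers to. A practical implementation would replace the exhaustive enumeration with a sequential pattern mining algorithm and a support threshold over many documents.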