A new sequential mining approach to XML document clustering*

Authors:
Jeong Hee Hwang;Keun Ho Ryu
Affiliations:
Database Laboratory, Chungbuk National University, Korea;Database Laboratory, Chungbuk National University, Korea
Venue:
APWeb'05 Proceedings of the 7th Asia-Pacific web conference on Web Technologies Research and Development
Year:
2005

Citing 9
Cited 1

Discovering typical structures of documents: a road map approach

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Clustering transactions using large items

Proceedings of the eighth international conference on Information and knowledge management
Data clustering: a review

ACM Computing Surveys (CSUR)
XClust: clustering XML schemas for effective integration

Proceedings of the eleventh international conference on Information and knowledge management
PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth

Proceedings of the 17th International Conference on Data Engineering
Preparations for Semantics-Based XML Mining

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Optimized Substructure Discovery for Semi-structured Data

PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery
CLOPE: a fast and effective clustering algorithm for transactional data

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
TreeFinder: a First Step towards XML Data Mining

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining

A weighted common structure based clustering technique for XML documents

Journal of Systems and Software

Quantified Score

Hi-index	0.00

Visualization

Abstract

XML has recently become very popular for representing semi-structured data and a standard for data exchange over the web because of its varied applicability in a number of applications. Therefore, XML documents form an important data mining domain. In this paper, we propose a new XML document clustering technique using sequential pattern mining algorithm. Our approach first extracts the representative structures of frequent patterns from schemaless XML documents by using a sequential pattern mining algorithm. And then, unlike most previous document clustering methods, we apply clustering algorithm for transactional data without a measure of pairwise similarity, considering that an XML document as a transaction and the extracted frequent structures of documents as the items of the transaction. We have experimented our clustering algorithm by comparing it with the previous methods. The experimental results show the effectiveness of the proposed method in performance and in producing clusters with higher cluster cohesion.