A weighted common structure based clustering technique for XML documents

Authors:
Jeong Hee Hwang;Keun Ho Ryu
Affiliations:
Department of Computer Science, Namseoul University, 21 Maeju-Ri, Seonghwan-Eup, Cheonan, Chungnam 331-707, Republic of Korea;Database/Bioinformatics Laboratory, School of Electrical and Computer Engineering, Chungbuk National University, 12 Gaeshin-Dong, Heungduk-Gu, Cheongju, Chungbuk 361-763, Republic of Korea
Venue:
Journal of Systems and Software
Year:
2010

Abstract

XML has become a popular means of representing semistructured data and a standard for data exchange over the Web because it applies to a wide range of applications. XML documents therefore constitute an important data mining domain. In this paper, we propose a new method for clustering XML documents using a global criterion function that takes the weight of common structures into account. Our approach first extracts representative structures, in the form of frequent patterns, from schemaless XML documents using a sequential pattern mining algorithm. We then cluster the XML documents by the weight of their common structures, without a pairwise similarity measure, treating each XML document as a transaction and the frequent structures extracted from the documents as the items of that transaction. We conducted experiments to compare our method with previous methods. The experimental results show the effectiveness of our approach.
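
The transaction view described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not state the criterion function, so the score used here (squared occurrence counts of each structure divided by a power of the cluster size), the exponent `R`, and the names `cluster_score` and `cluster_documents` are all assumptions made for illustration. The sketch also assumes the frequent structures have already been mined and are given as path strings. Each document is placed greedily in the cluster (or a new one) that most increases the total score, so no pairwise document similarity is ever computed.

```python
from collections import Counter
from typing import FrozenSet, List

# One XML document = one transaction whose items are frequent structures.
Doc = FrozenSet[str]

R = 0.5  # hypothetical exponent penalising large, loosely related clusters


def cluster_score(occ: Counter, size: int) -> float:
    """Illustrative stand-in criterion: structures shared by many documents
    in the cluster contribute quadratically; cluster size is penalised."""
    if size == 0:
        return 0.0
    return sum(c * c for c in occ.values()) / size ** R


def cluster_documents(docs: List[Doc]) -> List[int]:
    """Greedy single pass: put each document where the global score
    (sum of cluster scores) increases the most."""
    occs: List[Counter] = []   # per-cluster structure occurrence counts
    sizes: List[int] = []      # per-cluster document counts
    labels: List[int] = []
    for doc in docs:
        items = Counter(doc)   # every structure counted once per document
        # Baseline option: open a new cluster containing only this document.
        best_idx, best_gain = len(occs), cluster_score(items, 1)
        for idx, (occ, size) in enumerate(zip(occs, sizes)):
            gain = cluster_score(occ + items, size + 1) - cluster_score(occ, size)
            if gain > best_gain:
                best_idx, best_gain = idx, gain
        if best_idx == len(occs):
            occs.append(Counter())
            sizes.append(0)
        occs[best_idx].update(doc)
        sizes[best_idx] += 1
        labels.append(best_idx)
    return labels


if __name__ == "__main__":
    example = [
        frozenset({"book/title", "book/author"}),
        frozenset({"book/title", "book/author", "book/year"}),
        frozenset({"cd/artist", "cd/track"}),
        frozenset({"cd/artist", "cd/track", "cd/label"}),
    ]
    print(cluster_documents(example))
```

In this toy input the two `book` documents share most of their structures, as do the two `cd` documents, so the greedy pass groups them accordingly. A single pass is order dependent; a fuller treatment would typically iterate until no document changes cluster.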