Clustering XML documents by structure based on common neighbor

Authors:
Xizhe Zhang;Tianyang Lv;Zhengxuan Wang;Wanli Zuo
Affiliations:
College of Computer Science and Technology, Jilin University, Changchun, China;College of Computer Science and Technology, Jilin University, Changchun, China;College of Computer Science and Technology, Jilin University, Changchun, China;College of Computer Science and Technology, Jilin University, Changchun, China
Venue:
CIS'05 Proceedings of the 2005 international conference on Computational Intelligence and Security - Volume Part I
Year:
2005

Abstract

Clustering XML documents is an important task. However, it is difficult to select appropriate parameter values for clustering algorithms. Moreover, current clustering algorithms lack an effective mechanism for detecting outliers and simply treat them as “noise”. By integrating outlier detection with clustering, the paper takes a new approach to analyzing XML documents by structure. After introducing the concept of a common-neighbor-based outlier, the paper proposes a new clustering algorithm that stops automatically by using the outlier information and needs only one parameter, whose appropriate value range is determined during the outlier mining process. After discussing some features of the proposed algorithm, the paper compares it with other clustering algorithms on an XML dataset with differing structures and on other real-life datasets.
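
The abstract does not define the common-neighbor notion it relies on, so the following is only an illustrative sketch and not the paper's algorithm. It assumes that each document is reduced to a set of element paths, that two documents are neighbors when their Jaccard similarity reaches a threshold `theta`, and that a document sharing no common neighbor with any other document is a candidate outlier. The names `jaccard`, `common_neighbor_counts`, `outlier_candidates`, and `theta` are all hypothetical, as is the choice of Jaccard similarity.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of structural features."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def common_neighbor_counts(docs, theta):
    """Return {(i, j): number of common neighbors} for all pairs i < j.

    Assumption: two documents are neighbors when their similarity is at
    least theta; a document is not counted as its own neighbor.
    """
    n = len(docs)
    neighbors = [
        {j for j in range(n) if j != i and jaccard(docs[i], docs[j]) >= theta}
        for i in range(n)
    ]
    return {
        (i, j): len(neighbors[i] & neighbors[j])
        for i, j in combinations(range(n), 2)
    }


def outlier_candidates(docs, theta):
    """Indices of documents that share no common neighbor with any other.

    Assumption: this is one simple reading of a common-neighbor-based
    outlier, chosen for illustration only.
    """
    linked = set()
    for (i, j), count in common_neighbor_counts(docs, theta).items():
        if count > 0:
            linked.update((i, j))
    return [i for i in range(len(docs)) if i not in linked]


# Toy input: each XML document is represented by its set of element paths.
docs = [
    {"book/title", "book/author", "book/year"},
    {"book/title", "book/author", "book/price"},
    {"book/title", "book/author", "book/isbn"},
    {"order/item", "order/total"},
]
print(outlier_candidates(docs, theta=0.5))  # prints [3]
```

In this toy input the three book-like documents are mutual neighbors, so every pair of them has one common neighbor, while the order-like document has no neighbors at all and is reported as the only candidate outlier.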