Clustering DTDs: an interactive two-level approach

Authors:
Aoying Zhou;Weining Qian;Hailei Qian;Long Zhang;Yuqi Liang;Wen Jin
Affiliations:
Department of Computer Science, Laboratory for Intelligent Information Processing Fudan University, Shanghai 200433, P.R. China;Department of Computer Science, Laboratory for Intelligent Information Processing Fudan University, Shanghai 200433, P.R. China;Department of Computer Science, Laboratory for Intelligent Information Processing Fudan University, Shanghai 200433, P.R. China;Department of Computer Science, Laboratory for Intelligent Information Processing Fudan University, Shanghai 200433, P.R. China;Department of Computer Science, Laboratory for Intelligent Information Processing Fudan University, Shanghai 200433, P.R. China;Department of Computer Science, Simon Fraser University, Canada
Venue:
Journal of Computer Science and Technology
Year:
2002

Abstract

XML (eXtensible Markup Language) is a standard that is widely applied in data representation and data exchange. However, the DTD (Document Type Definition), an important concept of XML, is not taken full advantage of in current applications. In this paper, a new method for clustering DTDs is presented, and it can also be used for XML document clustering. The two-level method clusters the elements in DTDs and clusters the DTDs themselves as two separate steps. Element clustering forms the first level and produces element clusters, which are generalizations of relevant elements. DTD clustering uses this generalized information and forms the second level of the whole clustering process. The two-level method has the following advantages: 1) it takes into account both the content and the structure of DTDs; 2) the generalized information about elements is more useful than the individual words used in the vector model; 3) the two-level method facilitates the search for outliers. The experiments show that this method is able to categorize relevant DTDs effectively.
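The abstract does not give the similarity measures or the clustering algorithm used at either level. The sketch below is only a hypothetical illustration of the two-level idea, not the authors' method. At level 1 it groups element names by string similarity, using Python's `difflib.SequenceMatcher`. At level 2 it describes each DTD by the set of element-cluster IDs it uses and groups DTDs by Jaccard similarity of those sets. Both levels use single-link grouping with assumed thresholds (`elem_threshold`, `dtd_threshold`), and any DTD that joins no other DTD is reported as an outlier candidate. The function names, thresholds, and sample DTDs are all assumptions. The sketch also looks only at element names and ignores DTD structure such as nesting, which the paper says its method does consider.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def single_link(items: Sequence[T], linked: Callable[[T, T], bool]) -> List[List[T]]:
    """Single-link grouping: items joined by any chain of linked pairs share a group."""
    parent = list(range(len(items)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the union-find trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(items)), 2):
        if linked(items[i], items[j]):
            parent[find(i)] = find(j)  # merge the two groups

    groups: Dict[int, List[T]] = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())


def name_similarity(a: str, b: str) -> float:
    # Hypothetical level-1 measure: string similarity of element names (0.0 to 1.0).
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    # Hypothetical level-2 measure: overlap of the element clusters two DTDs use.
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def two_level_cluster(
    dtds: Dict[str, List[str]],
    elem_threshold: float = 0.8,  # assumed value
    dtd_threshold: float = 0.5,   # assumed value
) -> Tuple[List[List[str]], List[List[str]], List[str]]:
    # Level 1: cluster every element name that appears in any DTD.
    elements = sorted({e for names in dtds.values() for e in names})
    elem_groups = single_link(
        elements, lambda a, b: name_similarity(a, b) >= elem_threshold
    )
    cluster_of = {e: cid for cid, group in enumerate(elem_groups) for e in group}

    # Level 2: describe each DTD by its element-cluster IDs, then cluster the DTDs.
    dtd_names = sorted(dtds)
    profile = {d: frozenset(cluster_of[e] for e in dtds[d]) for d in dtd_names}
    dtd_groups = single_link(
        dtd_names, lambda a, b: jaccard(profile[a], profile[b]) >= dtd_threshold
    )

    element_clusters = sorted(sorted(g) for g in elem_groups)
    dtd_clusters = sorted(sorted(g) for g in dtd_groups if len(g) > 1)
    # DTDs that join no other DTD are reported as outlier candidates.
    outliers = sorted(g[0] for g in dtd_groups if len(g) == 1)
    return element_clusters, dtd_clusters, outliers


if __name__ == "__main__":
    sample = {  # hypothetical DTDs, each reduced to its element names
        "bookstore": ["book", "title", "author"],
        "library": ["books", "title", "authors"],
        "film": ["movie", "director", "actor"],
    }
    elements, clusters, outliers = two_level_cluster(sample)
    print(clusters)  # [['bookstore', 'library']]
    print(outliers)  # ['film']
```

In this example the two bibliographic DTDs share no identical name except `title`. They still fall into one cluster because level 1 has already merged `book`/`books` and `author`/`authors` into shared element clusters. This mirrors the abstract's claim that generalized element information is more useful than separate words.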