A Two-Level Method for Clustering DTDs

  • Authors:
  • Weining Qian, Long Zhang, Yuqi Liang, Hailei Qian, Wen Jin

  • Venue:
  • WAIM '00 Proceedings of the First International Conference on Web-Age Information Management
  • Year:
  • 2000

Abstract

XML is a widely used standard for data representation and data exchange. However, DTDs, an important part of XML, are not fully exploited in current applications. This paper presents a new method for clustering DTDs that can in turn be used for XML document clustering. The two-level method clusters the elements within DTDs and the DTDs themselves in separate steps. Element clustering forms the first level and produces element clusters, each of which generalizes a group of relevant elements. DTD clustering forms the second level and uses this generalized information. The two-level method has three advantages: 1) it considers both the content and the structure of the DTDs; 2) the generalized information about elements is more useful than the isolated words of the vector model; 3) it facilitates the search for outliers. Experiments show that the method categorizes relevant DTDs effectively.
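
The abstract does not state which similarity measures or clustering algorithm the paper uses, so the following is only a minimal sketch of the two-level idea under loudly stated assumptions: element similarity is taken as a weighted mix of name-token overlap (content) and child-element overlap (structure), both levels use simple threshold-based single-link grouping, and every name here (`parse_dtd`, `single_link`, `two_level_cluster`, the `alpha` weight, the thresholds) is hypothetical rather than taken from the paper.

```python
import re
from itertools import combinations

# Assumption: a simple regex is enough for the flat <!ELEMENT ...> declarations used here.
ELEMENT_RE = re.compile(r"<!ELEMENT\s+([\w.:-]+)\s+([^>]*)>")
NAME_RE = re.compile(r"[A-Za-z_][\w.:-]*")
RESERVED = {"PCDATA", "EMPTY", "ANY"}

def parse_dtd(text):
    """Map each declared element name to the lowercased child names in its content model."""
    return {
        name: {t.lower() for t in NAME_RE.findall(model) if t not in RESERVED}
        for name, model in ELEMENT_RE.findall(text)
    }

def tokens(name):
    """Lowercased alphanumeric tokens of an element name."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))

def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as 0.0 (an assumption)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def single_link(items, sim, threshold):
    """Group items into connected components of the graph 'sim(a, b) >= threshold'."""
    parent = {x: x for x in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(items, 2):
        if sim(a, b) >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for x in items:
        groups.setdefault(find(x), []).append(x)
    return list(groups.values())

def two_level_cluster(dtds, alpha=0.7, elem_threshold=0.5, dtd_threshold=0.5):
    """Level 1 clusters elements; level 2 clusters DTDs by the element clusters they use."""
    parsed = {name: parse_dtd(text) for name, text in dtds.items()}

    # Level 1: elements are (dtd_name, element_name) pairs.
    elems = [(d, e) for d, decl in parsed.items() for e in decl]

    def elem_sim(a, b):
        content = jaccard(tokens(a[1]), tokens(b[1]))
        structure = jaccard(parsed[a[0]][a[1]], parsed[b[0]][b[1]])
        return alpha * content + (1 - alpha) * structure

    elem_clusters = single_link(elems, elem_sim, elem_threshold)
    cluster_of = {item: i for i, c in enumerate(elem_clusters) for item in c}

    # Level 2: each DTD is described by the set of element-cluster ids it contains.
    profile = {d: {cluster_of[(d, e)] for e in decl} for d, decl in parsed.items()}
    dtd_clusters = single_link(
        list(parsed), lambda a, b: jaccard(profile[a], profile[b]), dtd_threshold
    )
    return elem_clusters, dtd_clusters

# Hypothetical toy input, not data from the paper.
dtds = {
    "book1": "<!ELEMENT book (title, author+)> <!ELEMENT title (#PCDATA)> "
             "<!ELEMENT author (#PCDATA)>",
    "book2": "<!ELEMENT Book (Title, Author, Year)> <!ELEMENT Title (#PCDATA)> "
             "<!ELEMENT Author (#PCDATA)> <!ELEMENT Year (#PCDATA)>",
    "order": "<!ELEMENT order (item+, customer)> <!ELEMENT item (#PCDATA)> "
             "<!ELEMENT customer (#PCDATA)>",
}
elem_clusters, dtd_clusters = two_level_cluster(dtds)
print(dtd_clusters)  # [['book1', 'book2'], ['order']]
```

In this toy run the two book DTDs share three of their four element clusters and are grouped together, while `order` ends up in a singleton cluster. Under this sketch, such singletons are natural candidates to inspect as outliers.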