An Effective Data Processing Method for Fast Clustering

  • Authors:
  • Hyun-Joo Moon; Sangheon Kim; Jongbae Moon; Eun-Ser Lee

  • Affiliations:
  • Dept. of Cultural Contents, Hankuk University of Foreign Studies, Seoul, Korea 130-082;Dept. of Cultural Contents, Hankuk University of Foreign Studies, Seoul, Korea 130-082;Korea Institute of Science and Technology Information, Daejeon, Korea 305-806;Dept. of Computer Engineering, Andong National University, Andong-city, Korea 760-749

  • Venue:
  • ICCSA '08: Proceedings of the International Conference on Computational Science and Its Applications, Part II
  • Year:
  • 2008

Abstract

With the widespread use of the Internet, heterogeneous computing platforms, and ubiquitous computing technologies, the volume of Web data, which are usually written in XML, has grown explosively. As Web data grow and clustering them becomes more important, a similarity detection method is needed, because it is a fundamental technology for efficient document management. In this paper, we introduce a similarity detection method that checks both the semantic and the structural similarity between XML DTDs. For semantic checking, we adopt ontology technology; for structural checking, we apply the longest common string and longest nesting common string methods. Our method uses multi-tag sequences instead of traversing XML schema trees, so it produces reasonable results quickly.
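
The structural check on tag sequences can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define "longest common string", so the code assumes it means the longest contiguous run of tags shared by two sequences, and it assumes a tag sequence is simply the list of element names in a DTD's <!ELEMENT ...> declarations, taken in document order. The function names and the normalization by the longer sequence are hypothetical choices made for illustration. The ontology-based semantic check and the longest nesting common string method are not covered.

```python
import re
from typing import List, Sequence


def tag_sequence(dtd_text: str) -> List[str]:
    """Return element names from <!ELEMENT ...> declarations, in document order.

    Assumption: this flat list stands in for the paper's tag sequence.
    """
    return re.findall(r"<!ELEMENT\s+([\w.:-]+)", dtd_text)


def longest_common_run(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest contiguous run of tags that appears in both a and b.

    Classic dynamic programming: curr[j] is the length of the common run
    ending at the current tag of a and at b[j - 1].
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def structural_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Shared-run length divided by the longer sequence length (hypothetical score)."""
    if not a or not b:
        return 0.0
    return longest_common_run(a, b) / max(len(a), len(b))


# Two small, made-up DTDs used only to exercise the functions.
DTD_A = """
<!ELEMENT book (title, author, price)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT price (#PCDATA)>
"""

DTD_B = """
<!ELEMENT novel (title, author, year)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year (#PCDATA)>
"""

seq_a = tag_sequence(DTD_A)
seq_b = tag_sequence(DTD_B)
score = structural_similarity(seq_a, seq_b)
```

Because the comparison works on flat lists rather than on parsed schema trees, its cost is proportional to the product of the two sequence lengths, which is in the spirit of the speed argument the abstract makes for tag sequences.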