An Effective Data Processing Method for Fast Clustering

Authors:
Hyun-Joo Moon; Sangheon Kim; Jongbae Moon; Eun-Ser Lee
Affiliations:
Dept. of Cultural Contents, Hankuk University of Foreign Studies, Seoul, Korea 130-082; Dept. of Cultural Contents, Hankuk University of Foreign Studies, Seoul, Korea 130-082; Korea Institute of Science and Technology Information, Daejeon, Korea 305-806; Dept. of Computer Engineering, Andong National University, Andong-city, Korea 760-749
Venue:
ICCSA '08 Proceedings of the international conference on Computational Science and Its Applications, Part II
Year:
2008

Abstract

Because of the widespread use of the Internet, heterogeneous computing platforms, and ubiquitous computing technologies, the volume of Web data, which are usually written in XML, has increased explosively. As Web data grow and clustering them becomes more important, a similarity detection method is needed, because it is a fundamental technology for efficient document management. In this paper, we introduce a similarity detection method that checks both the semantic and the structural similarity between XML DTDs. For semantic checking we adopt ontology technology; for structural checking we apply the longest common string and longest nesting common string methods. Because our method compares multi-tag sequences instead of traversing XML schema trees, it produces reasonable results quickly.
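
The structural check described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that "longest common string" means the longest contiguous run of tags shared by two tag sequences, and that a score can be obtained by dividing the length of that run by the length of the longer sequence. The function names, the normalization, and the example tag lists are all hypothetical. The ontology-based semantic check and the longest nesting common string method are omitted.

```python
from typing import List, Sequence


def longest_common_run(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return the longest contiguous run of tags that occurs in both sequences.

    Assumption: this is one possible reading of "longest common string"
    applied to tag sequences; the paper may define it differently.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # run lengths ending at the previous tag of a
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return list(a[best_end - best_len:best_end])


def structural_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Hypothetical score in [0, 1]: common-run length over the longer length."""
    if not a or not b:
        return 0.0
    return len(longest_common_run(a, b)) / max(len(a), len(b))


# Hypothetical tag sequences, e.g. element names listed in document order.
dtd_a = ["book", "title", "author", "name", "price"]
dtd_b = ["book", "title", "author", "publisher", "price"]

common = longest_common_run(dtd_a, dtd_b)
score = structural_similarity(dtd_a, dtd_b)
print(common, score)
```

Comparing flat sequences this way takes time proportional to the product of the two sequence lengths and needs no tree traversal, which matches the motivation given in the abstract for working on tag sequences.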