Because of the widespread use of the Internet, heterogeneous computing platforms, and ubiquitous computing technologies, the volume of Web data, which is usually written in XML, has grown explosively. As Web data grows and clustering it becomes more important, a similarity detection method is needed, because similarity detection is a fundamental technology for efficient document management. In this paper, we introduce a similarity detection method that checks both the semantic and the structural similarity between XML DTDs. For semantic checking we adopt ontology technology, and for structural checking we apply the longest common string and longest nesting common string methods. Our method operates on multi-tag sequences instead of traversing XML schema trees, so it obtains reasonable results quickly.
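As a rough illustration of structural checking on tag sequences, the sketch below compares two flattened DTDs by the longest contiguous run of tags they share. This is only one possible reading of "longest common string". The function names, the sample tag lists, and the normalization by the longer sequence are assumptions made for this example and are not taken from the paper. The semantic (ontology) step and the longest nesting common string method are not shown.

```python
def longest_common_run(a, b):
    """Length of the longest contiguous run of tags shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)  # prev[j]: run length ending at a[i-1], b[j-1]
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def structural_similarity(a, b):
    """Hypothetical score in [0, 1]: shared run length over the longer sequence."""
    if not a or not b:
        return 0.0
    return longest_common_run(a, b) / max(len(a), len(b))


# Hypothetical tag sequences, as if obtained by flattening two DTDs.
dtd_a = ["book", "title", "author", "name", "price"]
dtd_b = ["article", "title", "author", "name", "year"]

print(longest_common_run(dtd_a, dtd_b))
print(structural_similarity(dtd_a, dtd_b))
```

Working on flat sequences keeps the comparison to a simple dynamic program over two lists, which fits the abstract's stated motivation for avoiding schema-tree traversal.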