A Hybrid Approach for XML Similarity

Authors:
Joe Tekli;Richard Chbeir;Kokou Yetongnon
Affiliations:
LE2I Laboratory UMR-CNRS, University of Bourgogne, 21078 Dijon Cedex, France;LE2I Laboratory UMR-CNRS, University of Bourgogne, 21078 Dijon Cedex, France;LE2I Laboratory UMR-CNRS, University of Bourgogne, 21078 Dijon Cedex, France
Venue:
SOFSEM '07 Proceedings of the 33rd conference on Current Trends in Theory and Practice of Computer Science
Year:
2007

Abstract

In the past few years, XML has been established as an effective means for information management and has been widely exploited for complex data representation. Owing to the unparalleled growth in the use of the XML standard, developing efficient techniques for comparing XML-based documents has become essential in information retrieval (IR) research. Various algorithms for comparing hierarchically structured data, e.g., XML documents, have been proposed in the literature. However, to our knowledge, most of them focus exclusively on comparing documents based on structural features, overlooking the semantics involved. In this paper, we integrate IR semantic similarity assessment into an edit distance algorithm, seeking to amend similarity judgments when comparing XML-based documents. Our approach consists of an original edit distance operation cost model that introduces the semantic relatedness of XML element/attribute labels into traditional edit distance computations. A prototype has been developed to evaluate our model's performance. Experiments yielded notable results.
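
As a rough illustration of the idea described in the abstract, the following Python sketch plugs a label-similarity score into the relabel cost of a simplified top-down tree edit distance. It is not the authors' algorithm or cost model: the `Node` class, the `TOY_SIMILARITY` table, and the `1 - similarity` relabel cost are all assumptions made for this example, and insertions and deletions are restricted to whole subtrees to keep the code short.

```python
from dataclasses import dataclass, field

# Hypothetical similarity scores in [0, 1] (assumption for this sketch).
# A real system might derive such scores from a taxonomy or a corpus.
TOY_SIMILARITY = {
    frozenset(("author", "writer")): 0.9,
    frozenset(("book", "publication")): 0.8,
}


@dataclass
class Node:
    """A minimal ordered, labeled tree node standing in for an XML element."""
    label: str
    children: list = field(default_factory=list)


def semantic_similarity(a: str, b: str) -> float:
    """Return 1.0 for identical labels, a table score if known, else 0.0."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def relabel_cost(a: str, b: str) -> float:
    """Relabeling is cheaper when the two labels are semantically close."""
    return 1.0 - semantic_similarity(a, b)


def size(node: Node) -> int:
    """Number of nodes in the subtree; used as its insert/delete cost."""
    return 1 + sum(size(child) for child in node.children)


def tree_distance(a: Node, b: Node) -> float:
    """Simplified top-down tree edit distance with a semantic relabel cost."""
    xs, ys = a.children, b.children
    # table[i][j]: cost of turning the first i children of a
    # into the first j children of b.
    table = [[0.0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        table[i][0] = table[i - 1][0] + size(xs[i - 1])
    for j in range(1, len(ys) + 1):
        table[0][j] = table[0][j - 1] + size(ys[j - 1])
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            table[i][j] = min(
                table[i - 1][j] + size(xs[i - 1]),      # delete subtree
                table[i][j - 1] + size(ys[j - 1]),      # insert subtree
                table[i - 1][j - 1] + tree_distance(xs[i - 1], ys[j - 1]),
            )
    return relabel_cost(a.label, b.label) + table[-1][-1]


doc1 = Node("book", [Node("title"), Node("author")])
doc2 = Node("publication", [Node("title"), Node("writer")])
doc3 = Node("publication", [Node("title"), Node("price")])

print(tree_distance(doc1, doc2))  # structurally equal, semantically close labels
print(tree_distance(doc1, doc3))  # structurally equal, one unrelated label
```

In this toy setting, `doc2` ends up closer to `doc1` than `doc3` does, even though all three trees have the same shape. A purely structural cost model that charges 1 for every relabel would not make that distinction between the two pairs.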