XML similarity evaluation has become a central issue in the database and information communities, with applications ranging from document clustering and version control to data integration and ranked retrieval. Various algorithms for comparing hierarchically structured data, XML documents in particular, have been proposed in the literature. Most of them rely on techniques for computing the edit distance between tree structures, with XML documents commonly modeled as ordered labeled trees. Yet a thorough investigation of current approaches led us to identify several similarity aspects, namely sub-tree-related structural and semantic similarities, that are not sufficiently addressed when comparing XML documents. In this paper, we provide an integrated and fine-grained comparison framework that deals with both structural and semantic similarities in XML documents (detecting the occurrences and repetitions of structurally and semantically similar sub-trees) and allows the end user to adjust the comparison process to her requirements. Our framework consists of four main modules for (i) discovering the structural commonalities between sub-trees, (ii) identifying sub-tree semantic resemblances, (iii) computing tree-based edit operation costs, and (iv) computing the tree edit distance. Experimental results demonstrate higher comparison accuracy than alternative methods, while timing experiments reflect the impact of semantic similarity on overall system performance.
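The abstract does not spell out the comparison algorithm itself. As a point of reference only, the sketch below shows a classic top-down (Selkow-style) edit distance between ordered labeled trees, the family of techniques the abstract says most existing approaches build on. It is not the authors' framework. The `Node` type, the `relabel` and `subtree_cost` hooks, and the toy synonym table are hypothetical; they only mark where structural and semantic cost models like those the abstract describes could plug in.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """A node of an ordered labeled tree, e.g. an XML element (hypothetical type)."""
    label: str
    children: list[Node] = field(default_factory=list)


def size(t: Node) -> int:
    """Number of nodes in the subtree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def unit_relabel(x: str, y: str) -> float:
    """Default relabel cost: 0 for identical labels, 1 otherwise."""
    return 0.0 if x == y else 1.0


def tree_edit_distance(
    a: Node,
    b: Node,
    relabel: Callable[[str, str], float] = unit_relabel,
    subtree_cost: Callable[[Node], float] = size,
) -> float:
    """Top-down (Selkow-style) edit distance between two ordered labeled trees.

    The roots are always matched (paying the relabel cost); the two child
    sequences are then aligned with a string-edit DP in which deleting or
    inserting a child removes or adds its whole subtree (subtree_cost).
    """
    xs, ys = a.children, b.children
    m, n = len(xs), len(ys)
    # d[i][j]: cost of turning the first i children of a into the first j of b
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + subtree_cost(xs[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + subtree_cost(ys[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + subtree_cost(xs[i - 1]),  # delete subtree xs[i-1]
                d[i][j - 1] + subtree_cost(ys[j - 1]),  # insert subtree ys[j-1]
                d[i - 1][j - 1]  # match the two children and recurse
                + tree_edit_distance(xs[i - 1], ys[j - 1], relabel, subtree_cost),
            )
    return relabel(a.label, b.label) + d[m][n]


# Hypothetical semantic hook: labels in this toy synonym table relabel cheaply.
SYNONYMS = {frozenset({"author", "writer"})}


def semantic_relabel(x: str, y: str) -> float:
    """Toy semantic cost: 0 if equal, 0.25 for listed synonyms, 1 otherwise."""
    if x == y:
        return 0.0
    return 0.25 if frozenset({x, y}) in SYNONYMS else 1.0


if __name__ == "__main__":
    a = Node("paper", [Node("title"), Node("author"), Node("year")])
    b = Node("paper", [Node("title"), Node("authors", [Node("author")])])
    c = Node("paper", [Node("title"), Node("writer"), Node("year")])
    print(tree_edit_distance(a, b))                    # 3.0
    print(tree_edit_distance(a, c))                    # 1.0
    print(tree_edit_distance(a, c, semantic_relabel))  # 0.25
```

With unit costs, `author` versus `writer` counts as a full mismatch; a semantic relabel cost lowers that penalty, which illustrates why label semantics can change the resulting distance. Swapping in a different `subtree_cost` (for instance one that discounts sub-trees already present elsewhere in the target tree) is one hypothetical way to account for repeated sub-trees; how the authors actually derive their edit operation costs is not described here.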