We propose a technique for measuring the structural similarity of semistructured documents based on entropy. After extracting the structural information from two documents, we use either Ziv-Lempel encoding or Ziv-Merhav crossparsing to estimate the entropy and, from it, the similarity between the documents. To the best of our knowledge, this is the first true linear-time approach for evaluating structural similarity. In an experimental evaluation we demonstrate that the clustering quality achieved by our algorithm is on a par with, or even better than, that of existing approaches.
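As a rough illustration of the idea, and not the authors' implementation, the following sketch counts phrases in a Ziv-Lempel (LZ78-style) self-parse and in a Ziv-Merhav cross-parse of one structure sequence against another. Several details are assumptions made only for this example: the "structural information" is taken to be the sequence of element tags in document order, and the function names (`tag_sequence`, `lz78_phrase_count`, `cross_parse_count`, `cross_parse_distance`) as well as the normalization in `cross_parse_distance` are hypothetical. A suffix automaton is used so that the cross-parse takes a single pass over the parsed sequence.

```python
import xml.etree.ElementTree as ET


def tag_sequence(xml_text):
    """Assumed structure extraction: element tags in document (pre-)order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


def lz78_phrase_count(seq):
    """Number of phrases in an LZ78-style incremental parse (trie-based)."""
    root = {}
    node, count = root, 0
    for symbol in seq:
        if symbol in node:
            node = node[symbol]          # extend the current phrase
        else:
            node[symbol] = {}            # new phrase ends here
            count += 1
            node = root
    if node is not root:                 # trailing, already-seen phrase
        count += 1
    return count


def build_suffix_automaton(x):
    """Transition tables of a suffix automaton accepting all substrings of x."""
    nxt, link, length = [{}], [-1], [0]
    last = 0
    for ch in x:
        cur = len(nxt)
        nxt.append({}); link.append(-1); length.append(length[last] + 1)
        p = last
        while p != -1 and ch not in nxt[p]:
            nxt[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = nxt[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                clone = len(nxt)
                nxt.append(dict(nxt[q])); link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and nxt[p].get(ch) == q:
                    nxt[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    return nxt


def cross_parse_count(z, x):
    """Phrases needed to parse z, each being the longest prefix of the
    remainder of z that occurs in x (or a single unseen symbol)."""
    nxt = build_suffix_automaton(x)
    i, count = 0, 0
    while i < len(z):
        state, matched = 0, 0
        while i + matched < len(z) and z[i + matched] in nxt[state]:
            state = nxt[state][z[i + matched]]
            matched += 1
        count += 1
        i += max(matched, 1)             # skip a symbol that never occurs in x
    return count


def cross_parse_distance(a, b):
    """Hypothetical symmetric score in (0, 1]: phrases per symbol, averaged
    over both parse directions. Lower means more similar structure."""
    return (cross_parse_count(a, b) / max(len(a), 1)
            + cross_parse_count(b, a) / max(len(b), 1)) / 2


if __name__ == "__main__":
    d1 = tag_sequence("<lib><book><title/><author/></book>"
                      "<book><title/><author/></book></lib>")
    d2 = tag_sequence("<lib><book><title/><author/></book></lib>")
    d3 = tag_sequence("<html><body><p/><p/></body></html>")
    print(cross_parse_distance(d1, d2), cross_parse_distance(d1, d3))
```

In this toy run the two library-like documents need few phrases to describe each other, while the unrelated HTML-like document shares no tags with them and every symbol becomes its own phrase, so its score is the maximum under this assumed normalization.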