When integrating data from autonomous sources, exact matches of data items that represent the same real-world object often fail because the sources lack common keys. Yet in many cases structural information is available and can be used to match such data. As a running example we use residential address information. Addresses are hierarchical structures and are present in many databases. Often they are the best, if not the only, relationship between autonomous data sources. Typically the matching has to be approximate since the representations in the sources differ.

We propose pq-grams to approximately match hierarchical information from autonomous sources. We define the pq-gram distance between ordered labeled trees as an effective and efficient approximation of the well-known tree edit distance. We analyze the properties of the pq-gram distance and compare it with the edit distance and alternative approximations. Experiments with synthetic and real-world data confirm the analytic results and the scalability of our approach.
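To make the idea concrete, the following is a minimal Python sketch of a pq-gram distance between two small address trees. The abstract does not spell out the construction, so everything here is an assumption for illustration: the tree encoding as `(label, children)` tuples, the names `pq_profile` and `pq_gram_distance`, the padding label `*`, the defaults `p=2, q=3`, and the unnormalized bag-based distance (size of both profiles minus twice the size of their bag intersection). It is a sketch of the general technique, not the authors' implementation.

```python
from collections import Counter

NULL = "*"  # assumed dummy label used to pad stems and bases


def pq_profile(tree, p=2, q=3):
    """Return the bag (Counter) of pq-gram label tuples of an ordered labeled tree.

    A tree is a (label, children) pair; children is a list of trees.
    Each pq-gram is a stem of p labels (a node plus p-1 ancestors)
    followed by a base of q labels (q consecutive children).
    """
    profile = Counter()

    def visit(node, ancestors):
        label, children = node
        stem = ancestors + (label,)  # always exactly p labels
        if not children:
            # A leaf contributes a single pq-gram with an all-null base.
            profile[stem + (NULL,) * q] += 1
            return
        # Pad the child labels with q-1 nulls on each side, then slide a window of size q.
        window = [NULL] * (q - 1) + [c[0] for c in children] + [NULL] * (q - 1)
        for i in range(len(window) - q + 1):
            profile[stem + tuple(window[i:i + q])] += 1
        for child in children:
            visit(child, stem[1:])  # keep the last p-1 labels as the child's ancestors

    visit(tree, (NULL,) * (p - 1))
    return profile


def pq_gram_distance(t1, t2, p=2, q=3):
    """Unnormalized distance: |P1| + |P2| - 2 * |P1 bag-intersect P2|."""
    a, b = pq_profile(t1, p, q), pq_profile(t2, p, q)
    shared = sum((a & b).values())  # Counter '&' takes the minimum count per key
    return sum(a.values()) + sum(b.values()) - 2 * shared


# Two hypothetical address records that differ in one street label.
addr1 = ("address", [("city", [("Springfield", [])]),
                     ("street", [("Main St", [])])])
addr2 = ("address", [("city", [("Springfield", [])]),
                     ("street", [("Main Street", [])])])

d_same = pq_gram_distance(addr1, addr1)
d_diff = pq_gram_distance(addr1, addr2)
```

Because the profiles are bags of fixed-length label tuples, they can be computed once per tree and compared without aligning the trees node by node, which is what makes this kind of approximation attractive for matching large collections. Identical trees give a distance of 0, and the small spelling difference above gives a positive but bounded distance because most pq-grams are still shared.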