Approximate matching of hierarchical data using pq-grams
VLDB '05 Proceedings of the 31st international conference on Very large data bases
When integrating data from autonomous sources, exact matches of data items that represent the same real-world object often fail due to a lack of common keys. Yet in many cases structural information is available and can be used to match such data. Typically the matching must be approximate since the representations in the sources differ. We propose pq-grams to approximately match hierarchical data from autonomous sources and define the pq-gram distance between ordered labeled trees as an effective and efficient approximation of the fanout weighted tree edit distance. We prove that the pq-gram distance is a lower bound of the fanout weighted tree edit distance and give a normalization of the pq-gram distance for which the triangle inequality holds. Experiments on synthetic and real-world data (residential addresses and XML) confirm the scalability of our approach and show the effectiveness of pq-grams.
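To make the idea of a pq-gram profile concrete, the following is a minimal sketch and not the authors' implementation. It assumes the commonly described construction: each node acts as an anchor, its stem holds the anchor plus its p-1 nearest ancestors, its base is a window of q consecutive children, and missing positions are padded with a null label. The tree encoding as `(label, [children])` tuples, the names `pq_profile` and `pq_distance`, and the defaults p=2, q=3 are illustrative assumptions. The distance shown is the plain bag symmetric-difference size; the exact definition and the normalization used in the paper may differ.

```python
from collections import Counter

NULL = "*"  # assumed padding label for missing ancestors and children


def pq_profile(tree, p=2, q=3):
    """Return the bag (Counter) of pq-gram label tuples of an ordered labeled tree.

    A tree is encoded as (label, [child, child, ...]); this encoding is an
    assumption made for the sketch.
    """
    profile = Counter()

    def visit(node, ancestors):
        label, children = node
        # Stem: the anchor label preceded by its p-1 nearest ancestors.
        stem = (ancestors + (label,))[-p:]
        # Base: a sliding window over the children, initially all null.
        base = (NULL,) * q
        if not children:
            # A leaf contributes one pq-gram with an all-null base.
            profile[stem + base] += 1
            return
        for child in children:
            base = base[1:] + (child[0],)
            profile[stem + base] += 1
            visit(child, stem)
        # Slide the window past the last child with q-1 null labels.
        for _ in range(q - 1):
            base = base[1:] + (NULL,)
            profile[stem + base] += 1

    visit(tree, (NULL,) * (p - 1))
    return profile


def pq_distance(t1, t2, p=2, q=3):
    """Size of the bag symmetric difference of the two pq-gram profiles."""
    a, b = pq_profile(t1, p, q), pq_profile(t2, p, q)
    shared = sum((a & b).values())  # bag intersection keeps minimum counts
    return sum(a.values()) + sum(b.values()) - 2 * shared


# Two small trees that differ in one leaf label.
t1 = ("a", [("b", []), ("c", [])])
t2 = ("a", [("b", []), ("d", [])])
print(pq_distance(t1, t1))  # 0
print(pq_distance(t1, t2))  # 8
```

In this example each tree yields six 2,3-grams, of which two are shared, so the bag difference is 12 - 2*2 = 8. Because a profile is just a bag of short label tuples, it can be stored and compared without running a full tree edit distance computation, which is the kind of efficiency the abstract refers to.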