Similarity evaluation on tree-structured data

Authors:
Rui Yang;Panos Kalnis;Anthony K. H. Tung
Affiliations:
National University of Singapore;National University of Singapore;National University of Singapore
Venue:
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Year:
2005

Abstract

Tree-structured data are becoming ubiquitous, and manipulating them based on similarity is essential for many applications. The generally accepted similarity measure for trees is the edit distance. Although similarity search has been studied extensively, searching for similar trees remains an open problem because computing the tree edit distance is expensive. In this paper, we propose to transform tree-structured data into an approximate numerical multidimensional vector that encodes the original structural information. We prove that the L1 distance between the corresponding vectors, which can be computed in O(|T1| + |T2|) time, forms a lower bound for the edit distance between the trees. Based on this theoretical analysis, we describe a novel algorithm that embeds the proposed distance into a filter-and-refine framework to process similarity search on tree-structured data. The experimental results show that our algorithm dramatically reduces the cost of distance computation. Our method is especially suitable for accelerating similarity query processing on large trees in massive datasets.
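
The filter-and-refine idea described in the abstract can be illustrated with a minimal Python sketch. It does not reproduce the vector encoding proposed in the paper, which the abstract does not specify. As a stand-in it uses a plain label-count vector: under unit-cost edit operations, one insertion or deletion changes the L1 distance between label counts by 1 and one relabeling changes it by at most 2, so half the L1 distance (rounded up) is a valid, if weak, lower bound on the edit distance. The tree representation (nested `(label, children)` tuples) and the names `label_vector`, `lower_bound`, `forest_distance`, and `range_query` are assumptions made for this illustration, and the exact distance is a simple memoized recursion for ordered trees that is only practical for small inputs.

```python
from collections import Counter
from functools import lru_cache

# Assumed representation: a tree is (label, children), where children is a
# tuple of trees. Tuples keep trees hashable so results can be memoized.


def label_vector(tree):
    """Count label occurrences in one linear pass over the tree."""
    counts = Counter()
    stack = [tree]
    while stack:
        label, children = stack.pop()
        counts[label] += 1
        stack.extend(children)
    return counts


def l1_distance(u, v):
    """L1 distance between two sparse count vectors."""
    return sum(abs(u[k] - v[k]) for k in u.keys() | v.keys())


def lower_bound(u, v):
    """Stand-in bound (not the paper's): each unit-cost edit operation
    changes the label-count L1 distance by at most 2."""
    return (l1_distance(u, v) + 1) // 2


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Exact unit-cost edit distance between two ordered forests,
    using the standard rightmost-root recursion. Small inputs only."""
    if not f1 and not f2:
        return 0
    candidates = []
    if f1:
        _, kids1 = f1[-1]
        # Delete the rightmost root of f1; its children take its place.
        candidates.append(forest_distance(f1[:-1] + kids1, f2) + 1)
    if f2:
        _, kids2 = f2[-1]
        # Insert the rightmost root of f2.
        candidates.append(forest_distance(f1, f2[:-1] + kids2) + 1)
    if f1 and f2:
        (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
        # Map the two rightmost roots onto each other, relabeling if needed.
        candidates.append(
            forest_distance(kids1, kids2)
            + forest_distance(f1[:-1], f2[:-1])
            + (label1 != label2)
        )
    return min(candidates)


def range_query(query, dataset, tau):
    """Return trees within edit distance tau of the query, plus the
    number of exact distance computations that were actually needed."""
    query_vector = label_vector(query)
    results, refined = [], 0
    for tree in dataset:
        # Filter step: a cheap lower bound above tau rules the tree out.
        if lower_bound(query_vector, label_vector(tree)) > tau:
            continue
        # Refine step: only the surviving candidates pay for the exact distance.
        refined += 1
        if forest_distance((query,), (tree,)) <= tau:
            results.append(tree)
    return results, refined
```

Because the filter uses a true lower bound, it never discards a correct answer; it only reduces how many exact distance computations are performed. A tighter bound, such as the one the paper proves for its own encoding, would prune more candidates.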