Tree-structured data are becoming ubiquitous, and manipulating them by similarity is essential for many applications. The generally accepted similarity measure for trees is the edit distance. Although similarity search has been studied extensively, searching for similar trees remains an open problem because computing the tree edit distance is expensive. In this paper, we propose to transform tree-structured data into an approximate numerical multidimensional vector that encodes the original structural information. We prove that the L1 distance between the corresponding vectors, which can be computed in O(|T1| + |T2|) time, forms a lower bound on the edit distance between the trees. Based on this theoretical analysis, we describe a novel algorithm that embeds the proposed distance into a filter-and-refine framework to process similarity search on tree-structured data. The experimental results show that our algorithm dramatically reduces the cost of distance computation. Our method is especially suitable for accelerating similarity query processing on large trees in massive datasets.
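The filter-and-refine idea can be illustrated with a minimal sketch, under stated assumptions. The embedding used here is a plain label-count vector, chosen only because its bound is easy to verify; it is not the vector encoding proposed in the paper. Under unit edit costs, one insertion or deletion changes the L1 distance between label-count vectors by at most 1 and one relabeling by at most 2, so L1 <= 2 * edit distance. The exact distance is a naive memoized recursion for ordered trees, suitable only for small inputs. All names (`label_vector`, `range_search`, and so on) are hypothetical.

```python
from collections import Counter
from functools import lru_cache

# Assumed representation: a tree is (label, children), children a tuple of trees.

def label_vector(tree):
    """Sparse label-count vector; built in time linear in the tree size."""
    counts = Counter()
    stack = [tree]
    while stack:
        label, children = stack.pop()
        counts[label] += 1
        stack.extend(children)
    return counts

def l1_distance(u, v):
    """L1 distance between two sparse count vectors."""
    return sum(abs(u[k] - v[k]) for k in set(u) | set(v))

def size(forest):
    """Number of nodes in a forest (a tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests (naive recursion)."""
    if not f:
        return size(g)
    if not g:
        return size(f)
    (v_label, v_kids), (w_label, w_kids) = f[-1], g[-1]
    return min(
        forest_distance(f[:-1] + v_kids, g) + 1,   # delete root of rightmost tree in f
        forest_distance(f, g[:-1] + w_kids) + 1,   # insert root of rightmost tree in g
        forest_distance(f[:-1], g[:-1])            # match the two rightmost roots
        + forest_distance(v_kids, w_kids)
        + (v_label != w_label),
    )

def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))

def range_search(query, dataset, tau):
    """Return trees within edit distance tau of query, plus the refine count."""
    qv = label_vector(query)
    results, refined = [], 0
    for tree in dataset:
        # Filter: L1 <= 2 * edit distance, so L1 > 2 * tau rules the tree out.
        if l1_distance(qv, label_vector(tree)) > 2 * tau:
            continue
        # Refine: pay for the exact distance only on surviving candidates.
        refined += 1
        if tree_edit_distance(query, tree) <= tau:
            results.append(tree)
    return results, refined
```

Because the filter relies on a lower bound, it can only discard trees that are certainly outside the threshold, so the result set matches an exhaustive scan while fewer exact distances are computed.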