An Algorithm for Finding the Largest Approximately Common Substructures of Two Trees

Authors:
Jason T. L. Wang;Bruce A. Shapiro;Dennis Shasha;Kaizhong Zhang;Kathleen M. Currey
Affiliations:
New Jersey Institute of Technology, Newark;National Cancer Institute, Frederick, MD;New York Univ., New York, NY;Univ. of Western Ontario, London, Ont., Canada;National Cancer Institute, Frederick, MD, and Univ. of Maryland Medical Center, Baltimore
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
1998

Abstract

Ordered labeled trees are trees in which each node has a label and the left-to-right order of its children (if it has any) is fixed. Such trees have many applications in vision, pattern recognition, molecular biology, and natural language processing. We consider a substructure of an ordered labeled tree T to be a connected subgraph of T. Given two ordered labeled trees T1 and T2 and an integer d, the largest approximately common substructure problem is to find a substructure U1 of T1 and a substructure U2 of T2 such that U1 is within edit distance d of U2 and no other pair of substructures V1 of T1 and V2 of T2 satisfies the distance constraint with a combined size greater than that of U1 and U2. We present a dynamic programming algorithm that solves this problem. When the distance allowed in the common substructures is a constant independent of the input trees, the algorithm runs as fast as the fastest known algorithm for computing the edit distance of two trees. To demonstrate the utility of our algorithm, we discuss its application to discovering motifs in multiple RNA secondary structures (which are ordered labeled trees).
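
The problem statement can be made concrete with a small brute-force sketch. This is not the dynamic programming algorithm of the paper: it simply enumerates every connected substructure of both trees and compares all pairs, so it is exponential and only suitable for tiny inputs. The tree encoding (nested `(label, children)` tuples), the unit edit costs, and all function names are assumptions made for illustration, and a connected substructure is taken here to be a node together with any of its descendants that stay connected to it.

```python
from functools import lru_cache
from itertools import product

# Assumed encoding: a tree is (label, children), where children is a tuple
# of trees in left-to-right order. Tuples are hashable, so results can be cached.

def size(tree):
    """Number of nodes in a tree."""
    _, children = tree
    return 1 + sum(size(c) for c in children)

@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f1:
        return sum(size(t) for t in f2)   # insert everything in f2
    if not f2:
        return sum(size(t) for t in f1)   # delete everything in f1
    (l1, c1), (l2, c2) = f1[-1], f2[-1]   # rightmost roots of each forest
    return min(
        forest_distance(f1[:-1] + c1, f2) + 1,   # delete rightmost root of f1
        forest_distance(f1, f2[:-1] + c2) + 1,   # insert rightmost root of f2
        forest_distance(c1, c2)                   # map the two roots together
        + forest_distance(f1[:-1], f2[:-1])
        + (l1 != l2),                             # relabel cost if labels differ
    )

def rooted_substructures(tree):
    """All connected substructures that keep the root of `tree`."""
    label, children = tree
    # Each child is either cut away (None) or replaced by one of its own
    # rooted substructures; sibling order is preserved.
    per_child = [[None] + list(rooted_substructures(c)) for c in children]
    for choice in product(*per_child):
        yield (label, tuple(c for c in choice if c is not None))

def substructures(tree):
    """All connected substructures of `tree`, rooted at any node."""
    yield from rooted_substructures(tree)
    for child in tree[1]:
        yield from substructures(child)

def largest_common(t1, t2, d):
    """Return (combined size, U1, U2) maximizing size(U1) + size(U2)
    subject to edit distance(U1, U2) <= d. Exhaustive search."""
    best = (0, None, None)
    subs2 = list(substructures(t2))
    for u1 in substructures(t1):
        for u2 in subs2:
            total = size(u1) + size(u2)
            if total > best[0] and forest_distance((u1,), (u2,)) <= d:
                best = (total, u1, u2)
    return best

t1 = ("a", (("b", ()), ("c", (("d", ()),))))
t2 = ("a", (("b", ()), ("x", (("d", ()),))))
print(largest_common(t1, t2, 0)[0])  # 4: the shared substructure a(b) in both trees
print(largest_common(t1, t2, 1)[0])  # 8: the whole trees differ by one relabeling
```

In this toy example, raising d from 0 to 1 lets the search absorb the single mismatched label (c versus x), so the answer grows from the two-node substructure to the complete trees.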