Structural matching and discovery in document databases

Authors:
Jason Tsong-Li Wang;Dennis Shasha;George J. S. Chang;Liam Relihan;Kaizhong Zhang;Girish Patel
Affiliations:
Computer and Information Science, New Jersey Institute of Technology;Courant Institute, New York University;Computer and Information Science, New Jersey Institute of Technology;Piercom Ltd., Inter. Business Center, National Tech. Park, Limerick, Ireland;Computer Science Department, Univ. of Western Ontario, Canada;Computer and Information Science, New Jersey Institute of Technology
Venue:
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Year:
1997

References (10):
RCS—a system for version control. Software—Practice & Experience
Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing
Fast algorithms for the unit cost editing distance between trees. Journal of Algorithms
From structured documents to novel query facilities. SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Pattern matching and pattern discovery in scientific, program, and document databases. SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Change detection in hierarchically structured information. SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Accessing relational databases from the World Wide Web. SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
HyperStorM—administering structured documents using object-oriented database technology. SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
A System for Approximate Tree Matching. IEEE Transactions on Knowledge and Data Engineering
Tracking and viewing changes on the web. ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference

Cited by (7):
A vision for management of complex models. ACM SIGMOD Record
Algorithmics and applications of tree and graph searching. Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Fuzzy Logic Techniques in Multimedia Database Querying: A Preliminary Investigation of the Potentials. IEEE Transactions on Knowledge and Data Engineering
Comparing Hierarchical Data in External Memory. VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Efficient and effective web change detection. Data & Knowledge Engineering
An Efficient Algorithm to Compute Differences between Structured Documents. IEEE Transactions on Knowledge and Data Engineering
Application of tree mining to matching of knowledge structures of decision tree type. OTM'07 Proceedings of the 2007 OTM Confederated international conference on On the move to meaningful internet systems - Volume Part II

Abstract

Structural matching and discovery in documents such as SGML and HTML is important for data warehousing [6], version management [7, 11], hypertext authoring, digital libraries [4], and Internet databases. For example, a user of the World Wide Web may want to know what has changed in an HTML document [2, 5, 10]. Such changes can be detected by comparing the old and new versions of the document (referred to as structural matching of documents). As another example, in hypertext authoring, a user may wish to find the common portions in the history list of a document or in a database of documents (referred to as structural discovery of documents). At the SIGMOD '95 demo sessions, we exhibited a software package, called TreeDiff [13], for comparing two LaTeX documents and showing their differences. Given two documents, the tool represents them as ordered labeled trees and finds an optimal sequence of edit operations that transforms one document (tree) into the other. An edit operation can be the insertion, deletion, or change of a node in the trees. The tool is so named because documents are represented and compared using approximate tree matching techniques [9, 12, 14].
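
To make the idea of an optimal sequence of edit operations concrete, the following is a minimal sketch of the edit distance between two ordered labeled trees. It is not TreeDiff's implementation and not the optimized algorithms of [9, 12, 14]; it is the plain textbook recursion on ordered forests with memoization, and it assumes unit cost for every insert, delete, and change. The names `tree`, `size`, `forest_distance`, and `tree_edit_distance`, and the toy document labels, are hypothetical and chosen only for illustration.

```python
from functools import lru_cache

# Assumption: a tree is a pair (label, children), where children is a tuple
# of trees in left-to-right order. A forest is a tuple of trees.
def tree(label, *children):
    return (label, tuple(children))

def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f:
        return size(g)   # insert every node of g
    if not g:
        return size(f)   # delete every node of f
    (v_label, v_children) = f[-1]   # rightmost root of f
    (w_label, w_children) = g[-1]   # rightmost root of g
    # Delete v: its children take its place in the forest.
    delete = forest_distance(f[:-1] + v_children, g) + 1
    # Insert w: symmetric to deletion.
    insert = forest_distance(f, g[:-1] + w_children) + 1
    # Match v with w (a change if the labels differ), then solve the
    # forests below them and the forests to their left independently.
    change = (forest_distance(f[:-1], g[:-1])
              + forest_distance(v_children, w_children)
              + (0 if v_label == w_label else 1))
    return min(delete, insert, change)

def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))

# Toy example: two versions of a small document.
old = tree("doc", tree("sec", tree("p1"), tree("p2")), tree("sec", tree("p3")))
new = tree("doc", tree("sec", tree("p1")), tree("sec", tree("p3"), tree("p4")))
print(tree_edit_distance(old, new))  # 2: delete p2, insert p4
```

The sketch returns only the cost. Recovering the edit operations themselves would require recording which of the three cases was chosen at each step, and large documents would call for the faster algorithms cited in the abstract.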