SUSAX: Context-specific searching in XML documents using sequence alignment techniques

Authors:
Kajal T. Claypool
Affiliations:
Oracle Corporation, Nashua, NH, United States
Venue:
Data & Knowledge Engineering
Year:
2008

Abstract

Keyword searching, while very successful in narrowing the contents of the Web down to the pertinent subset of information, has two primary drawbacks. First, the accuracy of the search is closely coupled with the choice of keywords. Second, keywords are limited in their expressiveness. In particular, they fail to adequately capture the contextual information implicit in most user searches. In this paper we present an approach that efficiently addresses these drawbacks of keyword searching over XML documents. Specifically, we present SUSAX, a system for approximate contextual querying over XML documents in which queries are represented as simple XPaths. A key contribution of our work is the novel algorithm used to match the XPath-like query against similar paths in the repository. The algorithm is based on the sequence alignment algorithms prevalent in the life sciences domain for discovering the similarity between genome and protein sequences. In this paper, we show an adaptation of the sequence alignment algorithm for discovering and cataloging the similarity between two paths.
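
The idea of aligning two paths the way biological sequences are aligned can be illustrated with a minimal sketch. The abstract does not give the SUSAX algorithm itself, so the code below is not the paper's method: it is a textbook global (Needleman-Wunsch) alignment applied to path steps, and the function names `split_path` and `align_paths`, the exact-label comparison, and the scoring values (match 2, mismatch -1, gap -1) are all assumptions made for illustration.

```python
def split_path(path):
    """Split a simple XPath such as '/library/book/title' into its steps."""
    return [step for step in path.split("/") if step]


def align_paths(query, candidate, match=2, mismatch=-1, gap=-1):
    """Globally align two paths step by step (Needleman-Wunsch).

    Scoring values are illustrative assumptions, not taken from the paper.
    Returns (score, alignment), where alignment is a list of
    (query_step, candidate_step) pairs and None marks a gap.
    """
    a, b = split_path(query), split_path(candidate)
    n, m = len(a), len(b)

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align the two steps
                score[i - 1][j] + gap,      # query step has no partner
                score[i][j - 1] + gap,      # candidate step has no partner
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    alignment = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                alignment.append((a[i - 1], b[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            alignment.append((a[i - 1], None))
            i -= 1
        else:
            alignment.append((None, b[j - 1]))
            j -= 1
    alignment.reverse()
    return score[n][m], alignment


if __name__ == "__main__":
    s, pairs = align_paths("/library/book/title", "/library/shelf/book/title")
    print(s)
    for q_step, c_step in pairs:
        print(q_step or "-", c_step or "-")
```

Under these assumed scores, aligning the query `/library/book/title` against the stored path `/library/shelf/book/title` yields a score of 5: three matching steps and one gap opposite `shelf`. A system for approximate contextual querying would presumably replace the exact label comparison with a softer similarity between element names, but that detail is not stated in the abstract.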