Locating and ranking XML documents based on content and structure synopses

Authors:
Weimin He;Leonidas Fegaras;David Levine
Affiliations:
University of Texas at Arlington, CSE, Arlington, TX;University of Texas at Arlington, CSE, Arlington, TX;University of Texas at Arlington, CSE, Arlington, TX
Venue:
DEXA'07 Proceedings of the 18th international conference on Database and Expert Systems Applications
Year:
2007

Citing 7
Cited 0

Searching XML documents via XML fragments

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Querying structured text in an XML database

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Controlling overlap in content-oriented XML retrieval

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Structure and content scoring for XML

VLDB '05 Proceedings of the 31st international conference on Very large data bases
XCluster Synopses for Structured XML Content

ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Updates for structure indexes

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
XSEarch: a semantic search engine for XML

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29

Quantified Score

Hi-index	0.00

Visualization

Abstract

We present a new framework for indexing, locating and ranking XML documents based on content and structural synopses extracted from the documents. Instead of indexing each single element or term in a document, we extract a structural summary and a small number of data synopses from the document, which are indexed in an efficient way suitable for query evaluation. Our query language is XPath extended with full-text search. The result of query evaluation is a ranked list of document locations that best match the query. We propose a novel aggregated ranking scheme, which is integrated into the query evaluation to score the documents based on those data synopses. Our experimental evaluation shows that our indexing scheme outperforms the standard XML indexing scheme based on inverted lists and our ranking scheme is effective in terms of precision and recall.