Efficient keyword search over virtual XML views

Authors:
Feng Shao;Lin Guo;Chavdar Botev;Anand Bhaskar;Muthiah Chettiar;Fan Yang;Jayavel Shanmugasundaram
Affiliations:
Cornell University, Ithaca, NY;Cornell University, Ithaca, NY;Cornell University, Ithaca, NY;Cornell University, Ithaca, NY;Cornell University, Ithaca, NY;Cornell University, Ithaca, NY;Yahoo! Research, Santa Clara, CA
Venue:
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Year:
2007

Citing 33
Cited 9

Automatic text processing: the transformation, analysis, and retrieval of information by computer

Automatic text processing: the transformation, analysis, and retrieval of information by computer
Combining fuzzy information from multiple systems (extended abstract)

PODS '96 Proceedings of the fifteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Exploring the similarity space

ACM SIGIR Forum
A flexible model for retrieval of SGML documents

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Managing gigabytes (2nd ed.): compressing and indexing documents and images

Managing gigabytes (2nd ed.): compressing and indexing documents and images
On wrapping query languages and efficient XML integration

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
SilkRoute: trading between relations and XML

Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications netowrking
Optimal aggregation algorithms for middleware

PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
On supporting containment queries in relational database management systems

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
XRel: a path-based approach to storage and retrieval of XML documents using relational databases

ACM Transactions on Internet Technology (TOIT)
XIRQL: a query language for information retrieval in XML documents

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Modern Information Retrieval

Modern Information Retrieval
The Index-Based XXL Search Engine for Querying XML Data with Relevance Ranking

EDBT '02 Proceedings of the 8th International Conference on Extending Database Technology: Advances in Database Technology
XPERANTO: Middleware for Publishing Object-Relational Data as XML Documents

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Querying XML Views of Relational Data

Proceedings of the 27th International Conference on Very Large Data Bases
TIMBER: A native XML database

The VLDB Journal — The International Journal on Very Large Data Bases
Efficient filtering of XML documents with XPath expressions

The VLDB Journal — The International Journal on Very Large Data Bases
Query processing over object views of relational data

The VLDB Journal — The International Journal on Very Large Data Bases
Querying structured text in an XML database

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
XRANK: ranked keyword search over XML documents

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Lazy XSL transformations

Proceedings of the 2003 ACM symposium on Document engineering
On the Integration of Structure Indexes and Inverted Lists

ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Rank-aware query optimization

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
ORDPATHs: insert-friendly XML node labels

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Optimizing Top-k Selection Queries over Multimedia Repositories

IEEE Transactions on Knowledge and Data Engineering
GalaTex: a conformant implementation of the XQuery full-text language

WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
Structure and content scoring for XML

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Quark: an efficient XQuery full-text implementation

Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Index structures for matching XML twigs using relational query processors

Data & Knowledge Engineering
Discover: keyword search in relational databases

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Projecting XML documents

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
From tree patterns to generalized tree patterns: on efficient evaluation of XQuery

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Efficient IR-style keyword search over relational databases

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29

Retune: Retrieving and Materializing Tuple Units for Effective Keyword Search over Relational Databases

ER '08 Proceedings of the 27th International Conference on Conceptual Modeling
Efficient keyword search over virtual XML views

The VLDB Journal — The International Journal on Very Large Data Bases
Keyword search on structured and semi-structured data

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
SAIL: Structure-aware indexing for effective and progressive top-k keyword search over XML documents

Information Sciences: an International Journal
Exploit keyword query semantics and structure of data for effective XML keyword search

ADC '10 Proceedings of the Twenty-First Australasian Conference on Database Technologies - Volume 104
XML materialized views and schema evolution in VIREX

Information Sciences: an International Journal
Providing built-in keyword search capabilities in RDBMS

The VLDB Journal — The International Journal on Very Large Data Bases
K-graphs: selecting top-k data sources for XML keyword queries

DEXA'11 Proceedings of the 22nd international conference on Database and expert systems applications - Volume Part I
Exploiting and Maintaining Materialized Views for XML Keyword Queries

ACM Transactions on Internet Technology (TOIT)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Emerging applications such as personalized portals, enterprise search and web integration systems often require keyword search over semi-structured views. However, traditional information retrieval techniques are likely to be expensive in this context because they rely on the assumption that the set of documents being searched is materialized. In this paper, we present a system architecture and algorithm that can efficiently evaluate keyword search queries over virtual (unmaterialized) XML views. An interesting aspect of our approach is that it exploits indices present on the base data and thereby avoids materializing large parts of the view that are not relevant to the query results. Another feature of the algorithm is that by solely using indices, we can still score the results of queries over the virtual view, and the resulting scores are the same as if the view was materialized. Our performance evaluation using the INEX data set in the Quark [5] open-source XML database system indicates that the proposed approach is scalable and efficient.