Efficient LCA based keyword search in XML data

Authors:
Yu Xu;Yannis Papakonstantinou
Affiliations:
Teradata, San Diego, CA;University of California, San Diego, CA
Venue:
EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology
Year:
2008

Citing 17
Cited 37

XIRQL: a query language for information retrieval in XML documents

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Database System Implementation

Database System Implementation
The Index-Based XXL Search Engine for Querying XML Data with Relevance Ranking

EDBT '02 Proceedings of the 8th International Conference on Extending Database Technology: Advances in Database Technology
Querying Semistructured Heterogeneous Information

DOOD '95 Proceedings of the Fourth International Conference on Deductive and Object-Oriented Databases
Querying XML Documents Made Easy: Nearest Concept Queries

Proceedings of the 17th International Conference on Data Engineering
Proximity Search in Databases

VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Adding Relevance to XML

Selected papers from the Third International Workshop WebDB 2000 on The World Wide Web and Databases
XRANK: ranked keyword search over XML documents

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Keyword Searching and Browsing in Databases using BANKS

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
TeXQuery: a full-text search extension to XQuery

Proceedings of the 13th international conference on World Wide Web
Efficient keyword search for smallest LCAs in XML databases

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Bidirectional expansion for keyword search on graph databases

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Multiway SLCA-based keyword search in XML data

Proceedings of the 16th international conference on World Wide Web
Identifying meaningful return information for XML keyword search

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Discover: keyword search in relational databases

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
XSEarch: a semantic search engine for XML

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Schema-free XQuery

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30

Retrieving meaningful relaxed tightest fragments for XML keyword search

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Retrieving valid matches for XML keyword search

Proceedings of the 2009 ACM symposium on Applied Computing
Efficient Data Structure for XML Keyword Search

DASFAA '09 Proceedings of the 14th International Conference on Database Systems for Advanced Applications
Efficient type-ahead search on relational data: a TASTIER approach

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Keyword search on structured and semi-structured data

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
SAIL: Structure-aware indexing for effective and progressive top-k keyword search over XML documents

Information Sciences: an International Journal
Efficient keyword proximity search using a frontier-reduce strategy based on d-distance graph index

IDEAS '09 Proceedings of the 2009 International Database Engineering & Applications Symposium
Finding and ranking compact connected trees for effective keyword proximity search in XML documents

Information Systems
Return specification inference and result clustering for keyword search on XML

ACM Transactions on Database Systems (TODS)
Fast ELCA computation for keyword queries on XML data

Proceedings of the 13th International Conference on Extending Database Technology
Suggestion of promising result types for XML keyword search

Proceedings of the 13th International Conference on Extending Database Technology
LCA-based selection for XML document collections

Proceedings of the 19th international conference on World wide web
Structural consistency: enabling XML keyword search to eliminate spurious results consistently

The VLDB Journal — The International Journal on Very Large Data Bases
Exploit keyword query semantics and structure of data for effective XML keyword search

ADC '10 Proceedings of the Twenty-First Australasian Conference on Database Technologies - Volume 104
An effective 3-in-1 keyword search method over heterogeneous data sources

Information Systems
Using structural information in XML keyword search effectively

ACM Transactions on Database Systems (TODS)
Providing built-in keyword search capabilities in RDBMS

The VLDB Journal — The International Journal on Very Large Data Bases
Relevant answers for XML keyword search: a skyline approach

WISE'10 Proceedings of the 11th international conference on Web information systems engineering
A survey on XML keyword search

APWeb'11 Proceedings of the 13th Asia-Pacific web conference on Web technologies and applications
Identifying relevant matches with NOT semantics over XML documents

DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications - Volume Part I
Improving the performance of identifying contributors for XML keyword search

ACM SIGMOD Record
Adaptive and effective keyword search for XML

PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
Efficient fuzzy full-text type-ahead search

The VLDB Journal — The International Journal on Very Large Data Bases
K-graphs: selecting top-k data sources for XML keyword queries

DEXA'11 Proceedings of the 22nd international conference on Database and expert systems applications - Volume Part I
Processing keyword search on XML: a survey

World Wide Web
Organizational search in email systems

Proceedings of the 50th Annual Southeast Regional Conference
Top-K data source selection for keyword queries over multiple XML data sources

Journal of Information Science
Efficient keyword search on large tree structured datasets

KEYS '12 Proceedings of the Third International Workshop on Keyword Search on Structured Data
Fast result enumeration for keyword queries on XML data

DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications - Volume Part I
Efficiently identifying contributors for XML keyword search

DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications - Volume Part I
Exploiting and Maintaining Materialized Views for XML Keyword Queries

ACM Transactions on Internet Technology (TOIT)
An extended compact TVP index for finding top-k nearest neighbors over XML data tree

WISE'12 Proceedings of the 13th international conference on Web Information Systems Engineering
ELCA evaluation for keyword search on probabilistic XML data

World Wide Web
Top-down keyword query processing on XML data

Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Effectively return query results for keyword search on XML data

WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management
Efficient query processing for XML keyword queries based on the IDList index

The VLDB Journal — The International Journal on Very Large Data Bases
XML keyword search with promising result type recommendations

World Wide Web

Abstract

Keyword search in XML documents based on the notion of lowest common ancestors (LCAs) and its variants has recently gained research interest [10, 14, 22]. In this paper we propose an efficient algorithm, called Indexed Stack, to find answers to keyword queries under XRANK's LCA semantics [10]. The complexity of the Indexed Stack algorithm is O(kd|S1| log |S|), where k is the number of keywords in the query, d is the depth of the tree, and |S1| (|S|) is the number of occurrences of the least (most) frequent keyword in the query. In comparison, the best worst-case complexity of the core algorithms in [10] is O(kd|S|). We analytically and experimentally evaluate the Indexed Stack algorithm and the two core algorithms in [10]. The results show that the Indexed Stack algorithm outperforms the other algorithms by orders of magnitude in both CPU and I/O cost when the query contains at least one low-frequency keyword along with high-frequency keywords. This is important in practice, since keyword frequencies typically vary significantly.
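
The two complexity bounds in the abstract can be made concrete with a small sketch. The code below is not the Indexed Stack algorithm. It only illustrates, under stated assumptions, why probing a sorted list of the frequent keyword once per occurrence of the rare keyword gives a cost that grows with |S1| log |S| rather than |S|. It assumes nodes are labeled with Dewey IDs (tuples such as `(0, 1, 2)`) kept in document order, which the abstract does not state, and the names `lca`, `closest_lca`, `indexed_cost`, and `scan_cost` are hypothetical helpers introduced for illustration.

```python
from bisect import bisect_left
from math import log2


def lca(a, b):
    """LCA of two nodes given as Dewey IDs: their longest common prefix."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def closest_lca(v, sorted_ids):
    """Deepest LCA of v with any node in sorted_ids.

    sorted_ids must be non-empty and sorted in document order (Python's
    tuple ordering on Dewey IDs). Only the two neighbours of v in that
    order need checking, so one probe is a single binary search.
    """
    i = bisect_left(sorted_ids, v)
    candidates = []
    if i < len(sorted_ids):
        candidates.append(lca(v, sorted_ids[i]))      # right neighbour
    if i > 0:
        candidates.append(lca(v, sorted_ids[i - 1]))  # left neighbour
    return max(candidates, key=len)


def indexed_cost(k, d, s1, s):
    """Growth term of the bound O(k d |S1| log |S|), constants ignored."""
    return k * d * s1 * log2(s)


def scan_cost(k, d, s):
    """Growth term of the bound O(k d |S|), constants ignored."""
    return k * d * s


if __name__ == "__main__":
    rare = [(0, 1, 2)]                                       # S1
    frequent = [(0, 0, 1), (0, 1, 0), (0, 1, 3, 1), (0, 2)]  # S
    for v in rare:
        print(v, "->", closest_lca(v, frequent))
    print(indexed_cost(2, 10, 10, 1_000_000), scan_cost(2, 10, 1_000_000))
```

With illustrative values such as k = 2, d = 10, |S1| = 10, and |S| = 1,000,000, the first growth term is several thousand times smaller than the second, which matches the abstract's claim about queries that mix low-frequency and high-frequency keywords. These figures compare asymptotic terms only and are not measurements.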