Efficient query processing for XML keyword queries based on the IDList index

Authors:
Junfeng Zhou;Zhifeng Bao;Wei Wang;Jinjia Zhao;Xiaofeng Meng
Affiliations:
The Key Laboratory for Computer Virtual Technology and System Integration of HeBei Province, School of Information Science and Engineering, Yanshan University, Qinhuangdao, China;Interactive Digital Media Institute, Singapore, Singapore;The University of New South Wales, Kensington, NSW, Australia;The Key Laboratory for Computer Virtual Technology and System Integration of HeBei Province, School of Information Science and Engineering, Yanshan University, Qinhuangdao, China;Renmin University of China, Beijing, China
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2014

Abstract

Keyword search over XML data has attracted considerable research effort over the last decade. A fundamental problem is how to efficiently answer a given keyword query with respect to a given query semantics. We find that the key source of inefficiency in existing methods is that they all suffer heavily from the common-ancestor-repetition problem. In this paper, we propose a novel form of inverted list, namely the IDList; the IDList for keyword $k$ consists of the ordered nodes that directly or indirectly contain $k$. We then show that finding keyword query results under the smallest lowest common ancestor (SLCA) and exclusive lowest common ancestor (ELCA) semantics can be reduced to the ordered set intersection problem, which has been heavily optimized because of its applications in areas such as information retrieval and database systems. We propose several algorithms that exploit set intersection in different directions, with or without additional indexes. We further propose several algorithms based on hash search that simplify the operation of finding the common nodes of all involved IDLists. We have conducted an extensive set of experiments with many state-of-the-art algorithms on several large-scale datasets. The results demonstrate that our proposed methods outperform existing methods by up to two orders of magnitude in many cases.
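
The reduction to ordered set intersection described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm or index layout. It assumes that node IDs are assigned in document (preorder) order, that a hypothetical `parent` map is available, and that each IDList is simply a sorted list of the IDs of nodes containing the keyword directly or through a descendant. Under those assumptions, the common ancestors of all query keywords are the intersection of their IDLists, and a common ancestor is a smallest lowest common ancestor exactly when the next common ancestor in ID order is not one of its children. All names (`build_idlists`, `intersect_sorted`, `slca`, `PARENT`, `KEYWORDS_AT`) and the toy tree are invented for illustration; the exclusive lowest common ancestor semantics is not covered.

```python
from typing import Dict, List, Set

def build_idlists(parent: Dict[int, int],
                  keywords_at: Dict[int, Set[str]]) -> Dict[str, List[int]]:
    """Map each keyword to the sorted IDs of nodes that contain it directly
    or indirectly. Assumes preorder IDs and parent[root] == -1."""
    lists: Dict[str, Set[int]] = {}
    for node, words in keywords_at.items():
        for word in words:
            ids = lists.setdefault(word, set())
            cur = node
            # Walk up to the root; stop early once an ancestor is already
            # recorded, because then all of its ancestors are recorded too.
            while cur != -1 and cur not in ids:
                ids.add(cur)
                cur = parent[cur]
    return {word: sorted(ids) for word, ids in lists.items()}

def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    """Plain merge-based intersection of two sorted ID lists."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def slca(parent: Dict[int, int],
         idlists: Dict[str, List[int]],
         query: List[str]) -> List[int]:
    """Return the SLCA nodes for a keyword query, in document order."""
    if not query or any(k not in idlists for k in query):
        return []
    # Intersect starting from the shortest list to keep intermediates small.
    lists = sorted((idlists[k] for k in query), key=len)
    common = lists[0]
    for other in lists[1:]:
        common = intersect_sorted(common, other)
    result: List[int] = []
    for pos, node in enumerate(common):
        # The set of common ancestors is closed under "parent of", so a
        # common ancestor has a common-ancestor descendant exactly when
        # the next ID in the list is one of its children.
        has_ca_child = pos + 1 < len(common) and parent[common[pos + 1]] == node
        if not has_ca_child:
            result.append(node)
    return result

# Toy document: root 1 with three "paper" subtrees rooted at 2, 5, and 8.
PARENT = {1: -1, 2: 1, 3: 2, 4: 2, 5: 1, 6: 5, 7: 5, 8: 1, 9: 8, 10: 8}
KEYWORDS_AT = {3: {"xml"}, 4: {"zhou"}, 6: {"xml"}, 7: {"wang"},
               9: {"keyword"}, 10: {"zhou"}}
IDLISTS = build_idlists(PARENT, KEYWORDS_AT)

print(slca(PARENT, IDLISTS, ["xml", "zhou"]))
```

In this toy tree, only the subtree rooted at node 2 contains both "xml" and "zhou" below the root, so the query returns node 2. The merge-based intersection is used here for clarity; any ordered set intersection routine could be substituted for `intersect_sorted` without changing the rest of the sketch.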