Optimizing index for taxonomy keyword search

Authors:
Bolin Ding;Haixun Wang;Ruoming Jin;Jiawei Han;Zhongyuan Wang
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL, USA;Microsoft Research Asia, Beijing, China;Kent State University, Kent, OH, USA;University of Illinois at Urbana-Champaign, Urbana, IL, USA;Microsoft Research Asia, Beijing, China
Venue:
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Year:
2012

Citing 23
Cited 1

CYC: a large-scale investment in knowledge infrastructure

Communications of the ACM
On the complexity of the view-selection problem

PODS '99 Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Optimal indexing using near-minimal space

Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Automatic acquisition of hyponyms from large text corpora

COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
Generating query substitutions

Proceedings of the 15th international conference on World Wide Web
Optimizing scoring functions and indexes for proximity search in type-annotated corpora

Proceedings of the 15th international conference on World Wide Web
Three-Level Caching for Efficient Query Processing in Large Web Search Engines

World Wide Web
Semantic taxonomy induction from heterogenous evidence

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Yago: a core of semantic knowledge

Proceedings of the 16th international conference on World Wide Web
Optimizing relevance and revenue in ad search: a query substitution approach

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Context-aware query suggestion by mining click-through and session data

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Query suggestion using hitting time

Proceedings of the 17th ACM conference on Information and knowledge management
Top-k aggregation using intersections of ranked inputs

Proceedings of the Second ACM International Conference on Web Search and Data Mining
A view selection algorithm with performance guarantee

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Deriving a large scale taxonomy from Wikipedia

AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 2
Similarity measures for short segments of text

ECIR'07 Proceedings of the 29th European conference on IR research
Fast set intersection in memory

Proceedings of the VLDB Endowment
The Design of Approximation Algorithms

The Design of Approximation Algorithms
Efficiently encoding term co-occurrences in inverted indexes

Proceedings of the 20th ACM international conference on Information and knowledge management
An ontology-based information retrieval model

ESWC'05 Proceedings of the Second European conference on The Semantic Web: research and Applications
Probase: a probabilistic taxonomy for text understanding

SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Short text conceptualization using a probabilistic knowledgebase

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Volume Three
A note on maximizing a submodular set function subject to a knapsack constraint

Operations Research Letters

Probase: a probabilistic taxonomy for text understanding

SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data

Quantified Score

Hi-index	0.00

Visualization

Abstract

Query substitution is an important problem in information retrieval. Much work focuses on how to find substitutes for any given query. In this paper, we study how to efficiently process a keyword query whose substitutes are defined by a given taxonomy. This problem is challenging because each term in a query can have a large number of substitutes, and the original query can be rewritten into any of their combinations. We propose to build an additional index (besides inverted index) to efficiently process queries. For a query workload, we formulate an optimization problem which chooses the additional index structure, aiming at minimizing the query evaluation cost, under given index space constraints. We show the NP-hardness of the problem, and propose a pseudo-polynomial time algorithm using dynamic programming, as well as an 1 over 4(1-1/e)-approximation algorithm to solve the problem. Experimental results show that, with only 10% additional index space, our approach can greatly reduce the query evaluation cost.