Measure-driven keyword-query expansion

Authors:
Nikos Sarkas;Nilesh Bansal;Gautam Das;Nick Koudas
Affiliations:
University of Toronto;University of Toronto;University of Texas at Arlington;University of Toronto
Venue:
Proceedings of the VLDB Endowment
Year:
2009

Abstract

User-generated content has fueled an explosion in the amount of available textual data. In this context, it is also common for users to express their views and opinions on products, events, etc., either explicitly (through numerical ratings) or implicitly. This wealth of textual information necessitates the development of novel searching and data exploration paradigms. In this paper we propose a new searching model, similar in spirit to faceted search, that enables the progressive refinement of a keyword-query result. However, in contrast to faceted search, which relies on domain-specific and hard-to-extract document attributes, the refinement process is driven by suggesting interesting expansions of the original query with additional search terms. Our query-driven and domain-neutral approach employs surprising word co-occurrence patterns and (optionally) numerical user ratings to identify meaningful top-k query expansions, allowing one to focus on a particularly interesting subset of the original result set. The proposed functionality is supported by a framework that is computationally efficient and has modest storage requirements. Our solution is grounded in convex optimization principles, which allow us to exploit the pruning opportunities offered by the natural top-k formulation of our problem. The performance benefits offered by our solution are verified using both synthetic data and large real data sets composed of blog posts.
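
To make the idea of "surprising" co-occurrence concrete, the following is a minimal sketch and not the framework described in the paper. It assumes a simple independence baseline: a candidate word is scored by the ratio of its observed document frequency inside the query result to the frequency expected if it were independent of the query. The function name `top_k_expansions`, the whitespace tokenization, and the ratio score are all illustrative assumptions; the paper's convex-optimization-based estimation, its pruning, and the use of ratings are not reproduced here.

```python
import heapq
from collections import Counter


def top_k_expansions(docs, query, k=3):
    """Rank candidate expansion words by an observed/expected co-occurrence ratio.

    Hypothetical helper for illustration only: uses an independence baseline,
    not the estimation procedure of the paper.
    """
    query_terms = set(query.lower().split())
    # Each document is reduced to its set of distinct lowercase tokens.
    tokenized = [set(d.lower().split()) for d in docs]
    n = len(tokenized)

    # The keyword-query result: documents containing every query term.
    result = [tokens for tokens in tokenized if query_terms <= tokens]
    if not result:
        return []

    # Document frequencies in the whole collection and in the result set.
    df_all = Counter(w for tokens in tokenized for w in tokens)
    df_res = Counter(w for tokens in result for w in tokens)

    scores = {}
    for word, observed in df_res.items():
        if word in query_terms:
            continue
        # Expected count if the word were independent of the query.
        expected = len(result) * df_all[word] / n
        scores[word] = observed / expected

    # Keep only the k highest-scoring words; ties are broken by the word itself.
    return heapq.nlargest(k, scores.items(), key=lambda kv: (kv[1], kv[0]))
```

A call such as `top_k_expansions(posts, "camera", k=5)` returns `(word, score)` pairs, where a score above 1 means the word appears with the query more often than independence would predict. A plain ratio like this favors rare words, so a practical variant would likely add a minimum-support threshold.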