User-generated content has been fueling an explosion in the amount of available textual data. In this context, it is also common for users to express their views and opinions on products, events, etc., either explicitly (through numerical ratings) or implicitly. This wealth of textual information necessitates the development of novel searching and data exploration paradigms. In this paper we propose a new searching model, similar in spirit to faceted search, that enables the progressive refinement of a keyword-query result. However, in contrast to faceted search, which utilizes domain-specific and hard-to-extract document attributes, the refinement process is driven by suggesting interesting expansions of the original query with additional search terms. Our query-driven and domain-neutral approach employs surprising word co-occurrence patterns and (optionally) numerical user ratings in order to identify meaningful top-k query expansions and allow one to focus on a particularly interesting subset of the original result set. The proposed functionality is supported by a framework that is computationally efficient and has modest storage requirements. Our solution is grounded in convex optimization principles that allow us to exploit the pruning opportunities offered by the natural top-k formulation of our problem. The performance benefits offered by our solution are verified using both synthetic data and large real data sets composed of blog posts.
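To make the idea of a "surprising" expansion concrete, the following is a minimal sketch and not the method of the paper. It assumes that surprise can be approximated by the ratio of the observed co-occurrence frequency of the query and a candidate word to the frequency expected if the two were independent. The function name `top_k_expansions`, the `min_support` threshold, and the toy documents are all hypothetical. The sketch scores every candidate exhaustively and therefore does none of the pruning that the convex optimization formulation is said to enable.

```python
import heapq
from collections import Counter


def top_k_expansions(docs, query, k=3, min_support=2):
    """Rank single-word expansions of `query` by a simple surprise ratio.

    Assumption (not from the paper): surprise = observed co-occurrence
    frequency divided by the frequency expected under independence.
    Returns a list of (score, word) pairs, highest score first.
    """
    query = set(query)
    doc_sets = [set(d) for d in docs]
    n = len(doc_sets)

    # Document frequency of every word in the collection.
    word_df = Counter(w for d in doc_sets for w in d)

    # Result set of the original query: documents containing all query words.
    matching = [d for d in doc_sets if query <= d]
    if not matching:
        return []
    p_query = len(matching) / n

    # How often each candidate word appears inside the result set.
    co_df = Counter(w for d in matching for w in d if w not in query)

    scored = []
    for word, count in co_df.items():
        if count < min_support:
            continue  # skip expansions that match too few documents
        observed = count / n
        expected = p_query * (word_df[word] / n)
        scored.append((observed / expected, word))

    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    toy_docs = [
        ["camera", "battery", "great"],
        ["camera", "battery", "poor"],
        ["camera", "lens", "great"],
        ["phone", "great", "screen"],
        ["phone", "screen", "poor"],
        ["phone", "great"],
    ]
    for score, word in top_k_expansions(toy_docs, ["camera"], k=2):
        print(f"camera + {word}: surprise {score:.2f}")
```

On the toy collection, "battery" ranks above "great" for the query "camera": "great" is just as frequent inside the result set but is common everywhere, so its co-occurrence with the query is not surprising under the independence assumption.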