Efficiently answering top-k typicality queries on large databases

Authors: Ming Hua; Jian Pei; Ada W. C. Fu; Xuemin Lin; Ho-Fung Leung
Affiliations: Simon Fraser University, Canada; Simon Fraser University, Canada; The Chinese University of Hong Kong, China; The University of New South Wales & NICTA, Australia; The Chinese University of Hong Kong, China
Venue: VLDB '07 Proceedings of the 33rd International Conference on Very Large Data Bases
Year: 2007

Abstract

Finding typical instances is an effective approach to understanding and analyzing large data sets. In this paper, we apply the idea of typicality analysis from psychology and cognitive science to database query answering, and study the novel problem of answering top-k typicality queries. We model typicality in large data sets systematically. To answer questions like "Who are the top-k most typical NBA players?", we develop the measure of simple typicality. To answer questions like "Who are the top-k most typical guards, distinguishing guards from other players?", we propose the notion of discriminative typicality. Computing the exact answer to a top-k typicality query requires quadratic time, which is often too costly for online query answering on large databases. We therefore develop a series of approximation methods for different situations. (1) The randomized tournament algorithm has linear complexity, though it provides no theoretical guarantee on the quality of the answers. (2) The direct local typicality approximation using VP-trees provides an approximation quality guarantee. (3) A VP-tree can be used to index a large set of objects; typicality queries can then be answered efficiently, with quality guarantees, by a tournament method based on a Local Typicality Tree data structure. An extensive performance study using two real data sets and a series of synthetic data sets clearly shows that top-k typicality queries are meaningful and that our methods are practical.
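The abstract does not state the formulas, so the following is only a minimal Python sketch under loudly stated assumptions. Simple typicality is assumed to be a Gaussian kernel density estimate of an object with respect to the data set (Euclidean distance, fixed bandwidth `h`). Discriminative typicality is assumed to be the object's typicality in the target class minus its typicality in the rest of the data. The tournament is a simplified stand-in for the randomized tournament: each randomly formed group of `group_size` candidates promotes its most typical member. All function names, the kernel, `h`, and `group_size` are illustrative assumptions, not the paper's definitions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def gaussian_kernel(d: float, h: float) -> float:
    """Gaussian kernel applied to a distance d with bandwidth h (assumed kernel)."""
    return math.exp(-(d * d) / (2.0 * h * h)) / (math.sqrt(2.0 * math.pi) * h)


def typicality(o: Point, data: Sequence[Point], h: float) -> float:
    """Assumed simple typicality: kernel density estimate of o w.r.t. data."""
    return sum(gaussian_kernel(math.dist(o, x), h) for x in data) / len(data)


def discriminative_typicality(o: Point, target: Sequence[Point],
                              rest: Sequence[Point], h: float) -> float:
    """Assumed form: typicality in the target class minus typicality in the rest."""
    return typicality(o, target, h) - typicality(o, rest, h)


def exact_top_k(data: Sequence[Point], k: int, h: float) -> List[int]:
    """Exact answer: scores every object against every object (O(n^2) distances)."""
    scores = [typicality(o, data, h) for o in data]
    order = sorted(range(len(data)), key=lambda i: (-scores[i], i))
    return order[:k]


def tournament_winner(data: Sequence[Point], h: float,
                      group_size: int = 8, seed: int = 0) -> int:
    """Simplified randomized tournament returning the index of one typical object.

    Each round shuffles the surviving candidates, splits them into groups of
    group_size, and promotes the member that is most typical within its own
    group. A round costs O(n * group_size) distances and the candidate set
    shrinks geometrically, so the total cost is linear in n for a fixed group size.
    """
    if not data:
        raise ValueError("data must be non-empty")
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    rng = random.Random(seed)
    candidates = list(range(len(data)))
    while len(candidates) > 1:
        rng.shuffle(candidates)
        winners = []
        for start in range(0, len(candidates), group_size):
            group = candidates[start:start + group_size]
            members = [data[i] for i in group]
            # Most typical member within this group; ties go to the smaller index.
            best = max(group, key=lambda i: (typicality(data[i], members, h), -i))
            winners.append(best)
        candidates = winners
    return candidates[0]
```

For example, on the five 1-D points (0.0,), (0.1,), (-0.1,), (0.05,), (5.0,) with h = 0.5, `exact_top_k` returns [0, 3] and ranks the outlier at 5.0 last. The tournament only compares objects inside small groups, which is what makes it linear; the price is that a group winner is judged against its group rather than the whole data set, so the result carries no quality guarantee, consistent with point (1) above.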