Finding typical instances is an effective way to understand and analyze large data sets. In this paper, we apply the idea of typicality analysis from psychology and cognitive science to database query answering, and study the novel problem of answering top-k typicality queries. We model typicality in large data sets systematically and formulate three types of top-k typicality queries. To answer questions like "Who are the top-k most typical NBA players?", the measure of simple typicality is developed. To answer questions like "Who are the top-k most typical guards distinguishing guards from other players?", the notion of discriminative typicality is proposed. Moreover, to answer questions like "Who are the best k typical guards that together represent the different types of guards?", the notion of representative typicality is used. Computing the exact answer to a top-k typicality query requires quadratic time, which is often too costly for online query answering on large databases. We develop a series of approximation methods for various situations: (1) the randomized tournament algorithm has linear complexity, though it does not provide a theoretical guarantee on the quality of the answers; (2) the direct local typicality approximation using VP-trees provides an approximation quality guarantee; (3) a local typicality tree data structure can be exploited to index a large set of objects, so that typicality queries can be answered efficiently, with quality guarantees, by a tournament method based on the local typicality tree. An extensive performance study using two real data sets and a series of synthetic data sets clearly shows that top-k typicality queries are meaningful and our methods are practical.
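To make the cost trade-off above concrete, here is a minimal Python sketch of an exact quadratic-time top-k computation next to a randomized tournament. The abstract does not give the typicality formula, so the sketch assumes an object's simple typicality is a Gaussian kernel density estimate over its distances to all objects in the set. The function names, the bandwidth `h`, and the parameter `group_size` are hypothetical choices for illustration, not the paper's notation.

```python
import math
import random


def gaussian_kernel(d, h):
    """Gaussian kernel value for distance d and bandwidth h (assumed kernel)."""
    return math.exp(-(d * d) / (2 * h * h)) / (h * math.sqrt(2 * math.pi))


def simple_typicality(o, objects, dist, h):
    """Assumed score: average kernel value between o and every object in the set."""
    return sum(gaussian_kernel(dist(o, x), h) for x in objects) / len(objects)


def exact_top_k(objects, k, dist, h):
    """Exact answer: scores every object against every other, so O(n^2) distances."""
    scored = [(simple_typicality(o, objects, dist, h), o) for o in objects]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [o for _, o in scored[:k]]


def tournament_top_1(objects, dist, h, group_size=10, rng=None):
    """Randomized tournament sketch for a non-empty set (requires group_size >= 2).

    Each round shuffles the candidates, splits them into small groups, and keeps
    the object that is most typical within its own group. With a fixed group
    size the total work is linear in the number of objects, but the winner is
    not guaranteed to be the true most typical object.
    """
    rng = rng or random.Random(0)
    candidates = list(objects)
    while len(candidates) > 1:
        rng.shuffle(candidates)
        winners = []
        for i in range(0, len(candidates), group_size):
            group = candidates[i:i + group_size]
            winners.append(
                max(group, key=lambda o: simple_typicality(o, group, dist, h))
            )
        candidates = winners
    return candidates[0]
```

When `group_size` is at least the size of the data set, the tournament has a single round and returns the exact answer. Smaller groups trade accuracy for speed, which is one way to read the abstract's remark that the tournament carries no theoretical quality guarantee.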