Skyline queries in crowd-enabled databases

Authors:
Christoph Lofi;Kinda El Maarry;Wolf-Tilo Balke
Affiliations:
National Institute of Informatics Tokyo, Japan;Technische Universität Braunschweig, Braunschweig, Germany;Technische Universität Braunschweig, Braunschweig, Germany
Venue:
Proceedings of the 16th International Conference on Extending Database Technology
Year:
2013

Abstract

Skyline queries are a well-established technique for database query personalization and are widely acclaimed for their intuitive query formulation mechanisms. However, when operating on incomplete datasets, skyline queries are severely hampered and often have to resort to highly error-prone heuristics. Unfortunately, incomplete datasets are common, especially when datasets are generated automatically using information extraction or information integration approaches. Here, the recent trend of crowd-enabled databases promises a powerful solution: during query execution, some database operators can be dynamically outsourced to human workers in exchange for monetary compensation, thereby enabling the elicitation of missing values at runtime. However, this powerful feature heavily impacts query response times and (monetary) execution costs. In this paper, we present a hybrid approach that combines dynamic crowd-sourcing with heuristic techniques in order to overcome these limitations. We show that by assessing the individual risk a tuple poses with respect to the overall result quality, crowd-sourcing efforts for eliciting missing values can be narrowly focused on only those tuples that may degrade the expected quality most strongly. This leads to an algorithm for computing skyline sets on incomplete data with maximum result quality, while optimizing crowd-sourcing costs.
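
To make the setting concrete, the following is a minimal Python sketch of a skyline (Pareto-optimal set) over complete tuples, together with a simple best-case check for tuples with missing values. It is not the algorithm from the paper: the function names, the example data, the "larger is better" convention, and the known upper bound `hi` on attribute values are all assumptions made for illustration. The best-case check merely prunes incomplete tuples that cannot be in the skyline whatever their missing values turn out to be; the paper's risk assessment is a different and more refined criterion.

```python
from typing import List, Optional, Tuple

# A row is a tuple of attribute values; None marks a missing value.
# Assumption for this sketch: larger values are better on every attribute.
Row = Tuple[Optional[float], ...]


def dominates(a: Row, b: Row) -> bool:
    """True if complete row a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(rows: List[Row]) -> List[Row]:
    """Naive O(n^2) skyline over complete rows: keep every row no other row dominates."""
    return [
        r for i, r in enumerate(rows)
        if not any(dominates(o, r) for j, o in enumerate(rows) if j != i)
    ]


def crowd_candidates(rows: List[Row], hi: float) -> List[Row]:
    """Incomplete rows that could still be skyline members (hypothetical helper).

    Each missing value is filled with the assumed upper bound hi. If a complete
    row dominates even this best case, the incomplete row can never be in the
    skyline, so asking the crowd for its missing values would be wasted effort.
    """
    complete = [r for r in rows if None not in r]
    candidates = []
    for r in rows:
        if None in r:
            best_case = tuple(hi if v is None else v for v in r)
            if not any(dominates(c, best_case) for c in complete):
                candidates.append(r)
    return candidates


# Hypothetical example data: (quality, value_for_money), both on a 0..10 scale.
hotels: List[Row] = [(9, 2), (5, 5), (4, 10), (4, 4), (3, None), (8, None)]
complete_hotels = [h for h in hotels if None not in h]

complete_skyline = skyline(complete_hotels)
to_ask = crowd_candidates(hotels, hi=10)
```

In this example, `(3, None)` is dominated by `(4, 10)` even in its best case and needs no crowd task, whereas `(8, None)` may or may not belong to the skyline, so only that tuple would be worth the cost of eliciting its missing value.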