The goal of a threshold query is to detect all objects whose score exceeds a given threshold. This type of query is used in many settings, such as data mining, event triggering, and top-k selection. Threshold queries are often performed over distributed data. Given database relations that are distributed over many nodes, an object's score is computed by aggregating each attribute's values across the nodes, applying a given scoring function to the aggregated values, and comparing the result with the threshold. However, joining all the distributed relations at a central database can incur prohibitive bandwidth, CPU, and storage-access overheads. Efficient algorithms that reduce these costs exist only for monotonic aggregation threshold queries and certain specific scoring functions. We present a novel approach for efficiently performing general distributed threshold queries. To the best of our knowledge, this is the first solution to the problem of performing such queries with general scoring functions. We first present a solution for monotonic functions, and then extend it to other functions by representing them as a difference of monotonic functions. Experiments with real-world data demonstrate that the method achieves low communication and access costs.
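The abstract does not spell out the algorithm, so the following is only a minimal sketch of why a difference-of-monotonic representation helps. It assumes the aggregate is a sum over nodes, that local values are nonnegative, and that for each node that has not yet reported an object we know a per-node upper cap on each attribute. Under those assumptions the global vector lies in a box [lo, hi]. If f = g - h with g and h non-decreasing in every coordinate, then on that box f is at most g(hi) - h(lo) and at least g(lo) - h(hi), which is enough to accept or prune an object without fetching the missing values. The names `aggregate_bounds`, `classify`, `g`, `h` and the example scoring function f(x) = (x0 - x1)^2 are hypothetical illustrations, not the paper's method.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
MonotoneFn = Callable[[Vector], float]


def aggregate_bounds(reported: list, unreported_caps: list, dim: int):
    """Box [lo, hi] containing the global (summed) attribute vector.

    Assumption: local values are nonnegative, so a node that has not
    reported contributes at least 0 and at most its cap per attribute.
    """
    lo = [sum(v[i] for v in reported) for i in range(dim)]
    hi = [lo[i] + sum(c[i] for c in unreported_caps) for i in range(dim)]
    return lo, hi


def classify(lo: Vector, hi: Vector, g: MonotoneFn, h: MonotoneFn,
             threshold: float) -> str:
    """Decide whether f = g - h exceeds threshold everywhere on [lo, hi].

    g and h must be non-decreasing in every coordinate on the box,
    so g is largest at hi and h is smallest at lo (and vice versa).
    """
    upper = g(hi) - h(lo)  # largest value f can take on the box
    lower = g(lo) - h(hi)  # smallest value f can take on the box
    if lower > threshold:
        return "above"     # exceeds the threshold whatever the missing values are
    if upper <= threshold:
        return "below"     # safe to prune without fetching more data
    return "unknown"       # bounds too loose; more values are needed


# Hypothetical scoring function f(x) = (x0 - x1)^2, which is not monotonic.
# On the nonnegative orthant it splits as g - h with g, h both non-decreasing:
#   (x0 - x1)^2 = (x0^2 + x1^2) - 2*x0*x1
def g(x: Vector) -> float:
    return x[0] ** 2 + x[1] ** 2


def h(x: Vector) -> float:
    return 2 * x[0] * x[1]


if __name__ == "__main__":
    threshold = 50
    # Hypothetical objects: (vectors already reported by some nodes,
    #                        caps for the nodes that have not reported)
    objects = {
        "A": ([(10, 1), (8, 0)], [(1, 1)]),
        "B": ([(1, 1), (0, 1)], [(1, 1)]),
        "C": ([(5, 0)], [(6, 0), (3, 0)]),
    }
    for name, (reported, caps) in objects.items():
        lo, hi = aggregate_bounds(reported, caps, dim=2)
        print(name, classify(lo, hi, g, h, threshold))
    # prints "A above", "B below", "C unknown", one per line
```

Object B shows why the decomposition matters. Its box is [1, 2] x [2, 3], and evaluating f only at the corners lo = (1, 2) and hi = (2, 3) gives 1 both times, yet f reaches 4 at (1, 3) inside the box, so corner evaluation is not a safe bound for a non-monotonic f. The bound g(hi) - h(lo) = 9 is safe, though loose. Objects classified as "unknown", such as C, would need further values fetched before a decision can be made.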