The management of uncertain, probabilistic data has recently emerged as a useful paradigm for dealing with the inherent unreliability of several real-world application domains, including data cleaning, information integration, and pervasive, multi-sensor computing. Unlike conventional data sets, a set of probabilistic tuples defines a probability distribution over an exponential number of possible worlds (i.e., "grounded", deterministic databases). This "possible-worlds" interpretation allows for clean query semantics but also raises hard computational problems for probabilistic database query processors. To further complicate matters, in many scenarios (e.g., large-scale process and environmental monitoring using multiple sensor modalities), probabilistic data tuples arrive and need to be processed in a streaming fashion; that is, using limited memory and CPU resources and without the benefit of multiple passes over a static probabilistic database. Such probabilistic data streams raise a host of new research challenges for stream-processing engines that, to date, remain largely unaddressed. In this paper, we propose the first space- and time-efficient algorithms for approximating complex aggregate queries (including the number of distinct values and join/self-join sizes) over probabilistic data streams. Under the possible-worlds semantics, each such aggregate defines a probability distribution over the space of possible aggregation results, and our goal is to characterize these distributions through efficient approximations of their key moments (such as expectation and variance). Our algorithms offer strong randomized estimation guarantees while using only space sublinear in the size of the stream(s), and rely on novel, concise streaming sketch synopses that extend conventional sketching ideas to the probabilistic-streams setting. Our experimental results verify the effectiveness of our approach.
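To make the possible-worlds view of an aggregate concrete, the following is a minimal sketch of the expected number of distinct values, not the sketch-based algorithms the paper proposes. It assumes a simplified model that the abstract does not specify: each stream element is a hypothetical `(value, probability)` pair that appears independently of all others. Under that assumption the expectation has a closed form, and a brute-force enumeration of all possible worlds confirms it. Both functions are exact and use linear or exponential resources, so they illustrate only the semantics, not the sublinear-space guarantee.

```python
from collections import defaultdict
from itertools import product


def expected_distinct(stream):
    """Expected distinct count, assuming independent (value, prob) tuples.

    A value is present in a world unless every tuple carrying it is absent,
    so E[distinct] = sum over values v of (1 - prod_i (1 - p_i)).
    """
    absent = defaultdict(lambda: 1.0)  # P(no tuple with this value appears)
    for value, p in stream:
        absent[value] *= 1.0 - p
    return sum(1.0 - q for q in absent.values())


def expected_distinct_bruteforce(stream):
    """Same quantity, computed by enumerating all 2^n possible worlds."""
    total = 0.0
    for world in product([False, True], repeat=len(stream)):
        prob = 1.0
        present = set()
        for included, (value, p) in zip(world, stream):
            prob *= p if included else 1.0 - p
            if included:
                present.add(value)
        total += prob * len(present)  # weight this world's distinct count
    return total


# Hypothetical example stream: value "a" is reported twice with probability 0.5.
stream = [("a", 0.5), ("a", 0.5), ("b", 0.9), ("c", 0.2)]
closed_form = expected_distinct(stream)
enumerated = expected_distinct_bruteforce(stream)
print(closed_form, enumerated)  # both approximately 1.85
```

The enumeration doubles in cost with every additional tuple, which is the computational difficulty the abstract refers to; the closed form avoids it but still stores one number per distinct value, which a streaming synopsis would need to compress further.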