We consider the problem of maintaining frequency counts for items occurring frequently in the union of multiple distributed data streams. Naïve methods of combining approximate frequency counts from multiple nodes tend to result in excessively large data structures that are costly to transfer among nodes. To minimize communication requirements, the degree of precision maintained by each node while counting item frequencies must be managed carefully. We introduce the concept of a precision gradient for managing precision when nodes are arranged in a hierarchical communication structure. We then study the optimization problem of how to set the precision gradient so as to minimize communication, and provide optimal solutions that minimize worst-case communication load over all possible inputs. We then introduce a variant designed to perform well in practice, with input data that does not conform to worst-case characteristics. We verify the effectiveness of our approach empirically using real-world data, and show that our methods incur substantially less communication than naïve approaches while providing the same error guarantees on answers.
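To make the idea of a precision gradient concrete, the following is a minimal Python sketch of one possible reading, not the authors' algorithm. It assumes a two-level hierarchy in which each level k is allowed an error tolerance eps_k, with eps_0 <= eps_1. A leaf drops items whose count is at most eps_0 times its stream length, and a parent, after summing its children's counts, drops items whose merged count is at most (eps_1 - eps_0) times the combined length. Under these assumptions every count reported at the parent undercounts the true frequency by at most eps_1 times the combined stream length. The function names, the tolerance values, and the sample streams are all hypothetical.

```python
from collections import Counter

def leaf_summary(stream, eps0):
    """Summarize one local stream: keep only items with count > eps0 * n.

    Hypothetical helper: a dropped item has an undercount of at most eps0 * n.
    """
    n = len(stream)
    counts = Counter(stream)
    kept = {item: c for item, c in counts.items() if c > eps0 * n}
    return n, kept

def merge_summaries(children, eps_prev, eps_here):
    """Merge child summaries, then prune using the extra tolerance at this level.

    Children already undercount by at most eps_prev * n in total, so pruning
    items with merged count <= (eps_here - eps_prev) * n keeps the overall
    undercount within eps_here * n.
    """
    n = sum(child_n for child_n, _ in children)
    merged = Counter()
    for _, counts in children:
        merged.update(counts)  # adds counts item by item
    slack = (eps_here - eps_prev) * n
    kept = {item: c for item, c in merged.items() if c > slack}
    return n, kept

# Assumed precision gradient for a two-level hierarchy (illustrative values).
gradient = [0.15, 0.3]

leaf_a = leaf_summary(["a"] * 8 + ["b", "c"], gradient[0])
leaf_b = leaf_summary(["a"] * 4 + ["d"] * 4 + ["e"] * 2, gradient[0])
root = merge_summaries([leaf_a, leaf_b], gradient[0], gradient[1])
```

In this sketch the leaves forward only items above their own small threshold, and the root prunes again with the remaining slack, so the summaries sent upward stay small while the final error bound is unchanged. How the tolerance should be split across levels to minimize communication is the optimization problem the abstract describes; the split used here is arbitrary.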