We consider the continuous computation of set correlations over a stream of set-valued attributes, such as Tweets and their hashtags, social annotations of blog posts obtained through RSS, or updates to set-valued attributes of databases. To compute tag correlations in a distributed fashion, all information needed for a given correlation must be available at the node(s) that compute it. Our approach uses a partitioning scheme based on set covers to keep the information flow efficient and replication-lean. We report the results of a preliminary performance evaluation using Tweets obtained through Twitter's streaming API.
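To make the two ingredients of the abstract concrete, the following is a minimal Python sketch, not the method of the paper. It assumes the Jaccard coefficient as the correlation measure and the classic greedy heuristic for set cover. The abstract specifies neither, and the function names `jaccard_correlations` and `greedy_set_cover` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def jaccard_correlations(tagsets):
    """Jaccard coefficient for every tag pair that co-occurs in the stream.

    Assumption: Jaccard stands in for the unspecified "set correlation".
    """
    tag_count = Counter()   # number of tag sets containing each tag
    pair_count = Counter()  # number of tag sets containing both tags
    for tags in tagsets:
        tag_count.update(tags)
        pair_count.update(combinations(sorted(tags), 2))
    return {
        (a, b): n / (tag_count[a] + tag_count[b] - n)
        for (a, b), n in pair_count.items()
    }


def greedy_set_cover(universe, candidates):
    """Greedy heuristic: repeatedly take the candidate set that covers
    the most still-uncovered elements.

    Assumption: this is the textbook approximation, used here only to
    illustrate how a set cover could define partitions.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(candidates, key=lambda s: len(uncovered & s))
        if not uncovered & best:
            raise ValueError("candidates do not cover the universe")
        chosen.append(best)
        uncovered -= best
    return chosen


# Toy stream of hashtag sets (illustrative data, not from the evaluation).
stream = [
    frozenset({"a", "b"}),
    frozenset({"b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"d"}),
]

correlations = jaccard_correlations(stream)

# Use the distinct tag sets as candidate partitions covering all tags.
universe = set().union(*stream)
candidates = list(dict.fromkeys(stream))
partitions = greedy_set_cover(universe, candidates)
```

In this sketch each chosen set could be assigned to one computing node. A pair of tags can only be correlated on a node that receives both tags, which is why the choice of cover affects how much data has to be replicated.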