In this paper, we introduce a novel data structure, ESBF (Extensible and Scalable Bloom Filter), and the algorithm FI-ESBF (Finding frequent Items using ESBF) for estimating the frequent items in data streams. FI-ESBF achieves high precision while using much less memory than the best previously reported algorithm, given the large number of distinct items in the stream. ESBF is an extension of the counting Bloom filter (CBF). It allows the amount of memory used to be adjusted dynamically according to the data distribution and the number of distinct items in the data stream, so prior knowledge of the data distribution and of the number of distinct elements to be stored is not required.
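For readers unfamiliar with the counting Bloom filter that ESBF builds on, the following is a minimal sketch of a plain, fixed-size CBF used for frequency estimation. It is not the ESBF structure or the FI-ESBF algorithm: the dynamic memory adjustment described in the abstract is not shown. The class name `CountingBloomFilter`, the helper `frequent_items`, the parameters `num_counters` and `num_hashes`, and the SHA-256 double-hashing scheme are all assumptions chosen for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Fixed-size counting Bloom filter (illustrative; not ESBF itself)."""

    def __init__(self, num_counters=1024, num_hashes=4):
        self.m = num_counters          # number of counters (assumed value)
        self.k = num_hashes            # hash functions per item (assumed value)
        self.counters = [0] * num_counters

    def _positions(self, item):
        # Derive k counter positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        # Increment each distinct position once per occurrence.
        for pos in set(self._positions(item)):
            self.counters[pos] += 1

    def estimate(self, item):
        # The minimum counter never undercounts; collisions can only inflate it.
        return min(self.counters[pos] for pos in self._positions(item))


def frequent_items(stream, threshold, cbf):
    """Single pass: report items whose estimated count reaches the threshold."""
    candidates = set()
    for item in stream:
        cbf.add(item)
        if cbf.estimate(item) >= threshold:
            candidates.add(item)
    return candidates


if __name__ == "__main__":
    stream = ["a"] * 5 + ["b"] * 2 + ["c"]
    cbf = CountingBloomFilter()
    print(frequent_items(stream, threshold=3, cbf=cbf))
```

Because the counter array here has a fixed size, the overestimation error grows as more distinct items arrive; this is the limitation that motivates adjusting memory dynamically, as the abstract describes for ESBF.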