This paper considers the problem of estimating the most frequent values in a data stream, where an approximate answer is often sufficient. A novel algorithm is presented that approximates the most frequent values by combining counter-based and sketch-based techniques. The algorithm is then applied to finding the most frequent call destinations of individual customers of telecommunications operators. Because this detection must be performed for each individual customer and kept up to date at all times, and the number of customers is huge, fast algorithms with a small memory footprint are critical. The paper presents telecommunications customers' behavior to justify the use of approximate algorithms. Although applied here to telecommunications, the algorithm may well be used in other contexts.
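To make the counter-based side of the problem concrete, the following is a minimal sketch of a classic counter-based summary (the Misra-Gries "Frequent" algorithm), kept separately for each customer. It is not the mixed algorithm the paper proposes, which the abstract does not specify. The names `FrequentSummary`, `record_call`, and the choice `k=4` are assumptions made for illustration only.

```python
from collections import defaultdict


class FrequentSummary:
    """Misra-Gries summary: keeps at most k - 1 counters.

    Illustrative only; not the paper's algorithm. Each stored count
    underestimates the true count by at most n / k, where n is the
    number of items seen so far.
    """

    def __init__(self, k):
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k
        self.counters = {}

    def add(self, item):
        if item in self.counters:
            # Item is already tracked: just count it.
            self.counters[item] += 1
        elif len(self.counters) < self.k - 1:
            # Free slot available: start tracking the item.
            self.counters[item] = 1
        else:
            # No free slot: decrement every counter and drop zeros.
            for key in list(self.counters):
                self.counters[key] -= 1
                if self.counters[key] == 0:
                    del self.counters[key]

    def top(self, n):
        # Highest approximate counts first; ties broken by item name.
        ranked = sorted(self.counters.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]


# One small summary per customer (hypothetical layout, k chosen arbitrarily).
summaries = defaultdict(lambda: FrequentSummary(k=4))


def record_call(customer, destination):
    """Update the customer's summary with one observed call."""
    summaries[customer].add(destination)


if __name__ == "__main__":
    for dest in ["A", "A", "B", "A", "C", "A", "D", "A", "B"]:
        record_call("alice", dest)
    print(summaries["alice"].top(1))
```

Each customer's summary uses a fixed number of counters regardless of how many distinct destinations are called, which is the property that matters when the summary must be maintained for every customer at once.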