Estimating top-k destinations in data streams

Authors:
Nuno Homem; Joao Paulo Carvalho
Affiliations:
TULisbon, Instituto Superior Técnico, INESC-ID, Lisboa, Portugal; TULisbon, Instituto Superior Técnico, INESC-ID, Lisboa, Portugal
Venue:
IPMU'10 Proceedings of the Computational intelligence for knowledge-based systems design, and 13th international conference on Information processing and management of uncertainty
Year:
2010

Abstract

This paper considers the problem of estimating the most frequent values in a data stream, where in many cases an approximate answer is enough. A novel algorithm is presented that approximates the most frequent values using a mixed approach combining counter-based and sketch-based techniques. The algorithm is then used to find the most frequent call destinations of individual customers of telecommunications operators. The difficulty is that this detection must be performed for each individual customer and kept up to date at all times. Fast, small-footprint algorithms are therefore critical given the huge number of customers to check, and approximate answers are enough in most situations. The paper presents telecommunications customers' behavior to justify the use of approximate algorithms. Although it is applied here to telecommunications, the algorithm may well be used in other contexts.
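
To make the idea of a counter-based technique concrete, the following is a minimal sketch of the classic Misra-Gries frequent-items summary. It is not the algorithm proposed in this paper, which also involves a sketch-based component not described in the abstract. The function name `misra_gries`, the parameter `k`, and the sample destination strings are assumptions chosen purely for illustration.

```python
def misra_gries(stream, k):
    """Summarize a stream with at most k - 1 counters (Misra-Gries).

    Illustrative only: this is a standard counter-based technique,
    not the mixed counter/sketch algorithm the paper proposes.
    Any item occurring more than n / k times in a stream of length n
    is guaranteed to survive in the returned dictionary; the stored
    counts are lower bounds on the true frequencies.
    """
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already monitored: count this occurrence.
            counters[item] += 1
        elif len(counters) < k - 1:
            # A free counter is available: start monitoring the item.
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the
            # ones that reach zero. The new item is not stored.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    # Hypothetical call destinations for a single customer.
    calls = ["dest_a"] * 5 + ["dest_b"] * 3 + ["dest_c", "dest_d"]
    print(misra_gries(calls, k=3))
```

In the per-customer setting the abstract describes, one such summary would presumably be kept for each customer, so memory grows with the number of customers times the number of counters rather than with the number of distinct destinations each customer calls.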