A data stream is a massive, unbounded sequence of data elements generated continuously at a rapid rate, so the knowledge embedded in a data stream is likely to change over time. Items that appear frequently in a data stream, called frequent items, often play a more important role than other items in a data stream management system, which makes identifying them a key technique. In a distributed data stream management system, however, the many input streams affect the result to different degrees, so raw frequency alone is unsuitable for finding the important data. To solve this problem, this paper defines the effective data of distributed data streams, a notion that combines item frequency with stream weight. Building on an optimization technique devised to minimize main-memory usage, it then proposes a robust mining algorithm that outputs the effective data with limited space. A sensitivity analysis of the algorithm shows that the output stays within the error bound given by the user. Finally, a series of experiments demonstrates the efficiency of the mining algorithm.
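The abstract does not say exactly how frequency and stream weight are combined, or which summary structure the mining algorithm uses. As a minimal sketch, assume an item's effective count is the sum of the weights of the streams on which it arrives. Under that assumption, a single coordinator can track approximate effective counts with a weighted variant of the Misra-Gries frequent-items summary. This is a swapped-in, well-known technique, not the paper's algorithm. Keeping at most k = ⌈1/ε⌉ counters bounds memory, and every estimate undercounts the true effective count by at most W/(k+1) < εW, where W is the total weight seen so far. Reporting items whose estimate is at least (s - ε)W therefore returns every item whose effective count is at least sW, and no item whose effective count is below (s - ε)W. The class name, method names, and the weighting rule are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Tuple


class WeightedFrequentItems:
    """Weighted Misra-Gries summary: an illustrative sketch only.

    ASSUMPTION (not from the paper): an item's "effective" count is the sum,
    over all of its occurrences, of the weight of the stream it arrived on.
    """

    def __init__(self, stream_weights: Dict[Hashable, float], epsilon: float):
        if not 0 < epsilon < 1:
            raise ValueError("epsilon must be in (0, 1)")
        self.weights = stream_weights
        self.epsilon = epsilon
        # At most k counters are kept. Each decrement step removes the same
        # amount from k + 1 counters, so any item's undercount is at most
        # total_weight / (k + 1), which is below epsilon * total_weight.
        self.k = math.ceil(1 / epsilon)
        self.counters: Dict[Hashable, float] = {}
        self.total_weight = 0.0

    def update(self, stream_id: Hashable, item: Hashable) -> None:
        """Process one arrival of `item` on stream `stream_id`."""
        w = self.weights[stream_id]
        self.total_weight += w
        self.counters[item] = self.counters.get(item, 0.0) + w
        if len(self.counters) > self.k:
            # Subtract the smallest counter from all k + 1 counters and drop
            # those that reach zero; at least one counter is always dropped.
            dec = min(self.counters.values())
            self.counters = {
                x: c - dec for x, c in self.counters.items() if c - dec > 0
            }

    def frequent(self, support: float) -> List[Tuple[Hashable, float]]:
        """Items whose estimated effective count is >= (support - epsilon) * W."""
        threshold = (support - self.epsilon) * self.total_weight
        hits = [(x, c) for x, c in self.counters.items() if c >= threshold]
        return sorted(hits, key=lambda pair: -pair[1])


if __name__ == "__main__":
    summary = WeightedFrequentItems({"A": 1, "B": 3}, epsilon=0.25)
    arrivals = [("A", "x"), ("A", "x"), ("A", "x"),
                ("B", "y"), ("B", "y"), ("A", "z")]
    for sid, item in arrivals:
        summary.update(sid, item)
    print(summary.frequent(0.5))  # [('y', 6.0), ('x', 3.0)]
```

In the usage example, `x` is the most frequent item by raw count (3 occurrences versus 2). However, `y` arrives on the stream with weight 3, so its effective count is 6 and it ranks first, which is the kind of distinction the abstract motivates. In a truly distributed deployment, each site might keep its own summary and ship it to a coordinator. That merge step is not shown here.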