Using a real-time top-k algorithm to mine the most frequent items over multiple streams

Authors:
Ling Wang;Zhao Yang Qu;Tie Hua Zhou;Keun Ho Ryu
Affiliations:
Department of Computer Science and Technology, School of Information Engineering, Northeast Dianli University, Jilin, China;Department of Computer Science and Technology, School of Information Engineering, Northeast Dianli University, Jilin, China;Database/Bioinformatics Laboratory, School of Electrical & Computer Engineering, Chungbuk National University, Chungbuk, Korea;Database/Bioinformatics Laboratory, School of Electrical & Computer Engineering, Chungbuk National University, Chungbuk, Korea
Venue:
ICIC'13 Proceedings of the 9th international conference on Intelligent Computing Theories
Year:
2013

Abstract

Applications such as sensor networks, Internet traffic analysis, location-based services, and health measurements must handle stream data that is unbounded, fast, high-volume, continuous, and often distributed. A synopsis, kept as a list of partial summaries of the unknown item sets, reduces memory usage enough to keep pace with such fast and large incoming data. However, item sets of different sizes lead to different summaries, especially for the Top-k operator, which acts as a partial preprocessing step over the synopsis. We therefore propose a smooth synopsis that dynamically assigns a numerical interval to resolve the item set, so as to maintain a more accurate list of approximate answers from partial Top-k processing. In particular, we propose an algorithm, called SFI, that mines the most frequent items from specific stream resources in a more adaptive and faster way. Our experimental results demonstrate the accuracy and efficiency of these approximation techniques.
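
The abstract does not spell out the smooth synopsis or the SFI algorithm, so the following is only a minimal sketch of the general idea it builds on: keep a bounded-memory summary per stream, then answer an approximate Top-k query from the combined summaries. It substitutes the well-known Misra-Gries frequent-items summary for the authors' method; the function names `misra_gries` and `top_k_over_streams` and the `capacity` parameter are hypothetical and chosen for illustration.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def misra_gries(stream: Iterable[Hashable], capacity: int) -> Counter:
    """Bounded-memory summary holding at most `capacity` counters.

    This is the Misra-Gries summary, used here as a stand-in synopsis;
    it is NOT the smooth synopsis or the SFI algorithm from the paper.
    """
    counters: Counter = Counter()
    for item in stream:
        if item in counters or len(counters) < capacity:
            counters[item] += 1
        else:
            # Summary is full and the item is untracked: decrement every
            # counter and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def top_k_over_streams(
    streams: Iterable[Iterable[Hashable]], capacity: int, k: int
) -> List[Tuple[Hashable, int]]:
    """Sum the per-stream summaries and return the k largest estimates."""
    merged: Counter = Counter()
    for stream in streams:
        merged.update(misra_gries(stream, capacity))
    return merged.most_common(k)


if __name__ == "__main__":
    streams = [
        ["a", "b", "a", "c", "a", "b"],
        ["a", "d", "a", "b", "b", "a"],
    ]
    print(top_k_over_streams(streams, capacity=3, k=2))
```

Each per-stream count in this sketch is a lower bound on the true frequency, so the merged ranking is approximate; a smaller `capacity` saves memory at the cost of looser estimates, which is the accuracy and memory trade-off the abstract refers to.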