In this paper we investigate out-of-core methods for processing large datasets efficiently on state-of-the-art personal computers. A dataset is considered large if it does not fit into main memory. First, an eager method is presented to demonstrate the infeasibility and inefficiency of direct in-memory processing of such volumes of data. Then, two out-of-core extensions are introduced that use secondary storage to overcome the limitations of main memory. The Periodic Partial Result Merging algorithm operates on smaller chunks that fit into main memory and continuously merges partial results on secondary storage. The K-way Merge technique follows a similar principle but separates the processing and merging phases. Both proposed methods proved suitable for processing large datasets efficiently and in a fault-tolerant way. A comparative evaluation of the out-of-core algorithms is given, together with a novel model for estimating their execution time. The accuracy of the model is validated by comparing its estimates against practical measurements.
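The two-phase scheme described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: it assumes the partial result of each chunk is a set of sorted item-frequency counts spilled to disk as a run file, after which a single k-way merge phase combines all runs. All function names, the file format, and the counting task are assumptions made for illustration.

```python
# Hypothetical sketch of the K-way Merge idea: process the dataset in
# memory-sized chunks, spill each chunk's sorted partial result to disk,
# then merge all runs in a separate phase with a k-way merge.
import heapq
import itertools
import os
import tempfile

def process_chunks(items, chunk_size, run_dir):
    """Phase 1: count items per chunk; spill each chunk as a sorted run file."""
    run_paths = []
    it = iter(items)
    while True:
        chunk = list(itertools.islice(it, chunk_size))
        if not chunk:
            break
        counts = {}
        for item in chunk:
            counts[item] = counts.get(item, 0) + 1
        path = os.path.join(run_dir, f"run_{len(run_paths)}.txt")
        with open(path, "w") as f:
            for item in sorted(counts):          # sorted keys -> sorted run
                f.write(f"{item}\t{counts[item]}\n")
        run_paths.append(path)
    return run_paths

def _read_run(f):
    """Yield (key, count) pairs from one sorted run file."""
    for line in f:
        key, cnt = line.split("\t")
        yield key, int(cnt)

def kway_merge(run_paths):
    """Phase 2: k-way merge all sorted runs, summing counts of equal keys."""
    files = [open(p) for p in run_paths]
    try:
        merged = heapq.merge(*(_read_run(f) for f in files))
        result = {}
        for key, cnt in merged:
            result[key] = result.get(key, 0) + cnt
        return result
    finally:
        for f in files:
            f.close()

# Illustrative usage with a toy "dataset" and tiny chunk size:
data = ["b", "a", "b", "c", "a", "b"]
with tempfile.TemporaryDirectory() as d:
    runs = process_chunks(data, chunk_size=2, run_dir=d)
    totals = kway_merge(runs)   # {"a": 2, "b": 3, "c": 1}
```

Because each run is written in sorted order, the merge phase streams through all runs with only one line of each run buffered at a time, which is what keeps the memory footprint bounded regardless of the total dataset size.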