In this paper we investigate out-of-core methods for processing large datasets efficiently on state-of-the-art personal computers. A dataset is considered large if it does not fit into main memory. First, an eager method is presented to demonstrate the infeasibility and inefficiency of direct in-memory processing of such volumes of data. Then, two out-of-core extensions are introduced that use secondary storage to overcome the limitations of main memory. The Periodic Partial Result Merging algorithm operates on smaller chunks that fit into main memory and continuously merges partial results on secondary storage. The K-way Merge technique follows a similar principle but separates the processing and merging phases. Both proposed methods proved suitable for processing large datasets efficiently and in a fault-tolerant way. A comparative evaluation of the out-of-core algorithms is given, together with a novel model for estimating their execution time. The accuracy of the model is validated by comparing its estimates against practical measurements.
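The two-phase scheme described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: it assumes the partial result of each chunk is a set of sorted item-frequency counts spilled to disk as a run file, after which a single k-way merge phase combines all runs. All function names, the file format, and the counting task are assumptions made for illustration.

```python
# Hypothetical sketch of the K-way Merge idea: process the dataset in
# memory-sized chunks, spill each chunk's sorted partial result to disk,
# then merge all runs in a separate phase with a k-way merge.
import heapq
import itertools
import os
import tempfile

def process_chunks(items, chunk_size, run_dir):
    """Phase 1: count items per chunk; spill each chunk as a sorted run file."""
    run_paths = []
    it = iter(items)
    while True:
        chunk = list(itertools.islice(it, chunk_size))
        if not chunk:
            break
        counts = {}
        for item in chunk:
            counts[item] = counts.get(item, 0) + 1
        path = os.path.join(run_dir, f"run_{len(run_paths)}.txt")
        with open(path, "w") as f:
            for item in sorted(counts):          # sorted keys -> sorted run
                f.write(f"{item}\t{counts[item]}\n")
        run_paths.append(path)
    return run_paths

def _read_run(f):
    """Yield (key, count) pairs from one sorted run file."""
    for line in f:
        key, cnt = line.split("\t")
        yield key, int(cnt)

def kway_merge(run_paths):
    """Phase 2: k-way merge all sorted runs, summing counts of equal keys."""
    files = [open(p) for p in run_paths]
    try:
        merged = heapq.merge(*(_read_run(f) for f in files))
        result = {}
        for key, cnt in merged:
            result[key] = result.get(key, 0) + cnt
        return result
    finally:
        for f in files:
            f.close()

# Illustrative usage with a toy "dataset" and tiny chunk size:
data = ["b", "a", "b", "c", "a", "b"]
with tempfile.TemporaryDirectory() as d:
    runs = process_chunks(data, chunk_size=2, run_dir=d)
    totals = kway_merge(runs)   # {"a": 2, "b": 3, "c": 1}
```

Because each run is written in sorted order, the merge phase streams through all runs with only one line of each run buffered at a time, which is what keeps the memory footprint bounded regardless of the total dataset size.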