With the rapid development of the Internet and communication technology, massive amounts of data have accumulated in web-based applications such as deep web applications and web search engines. Growing data volumes pose enormous challenges for data-loading techniques. This paper presents IMIL (Internet Monitoring Information Loader), a real-time data loading system used in RT-IMIS (Real-time Internet Monitoring Information System), which monitors Internet traffic in real time, manages network security, and collects large volumes of real-time Internet information. IMIL consists of an extensible, fault-tolerant hardware architecture; an efficient bulk-loading algorithm based on SQL*Loader and the exchange partition mechanism; optimized parallelism; and guidelines for system tuning. Performance studies show the benefit of these techniques: the loading speed of each cluster increases from 220 million records per day to 1.2 billion records per day, and a top loading speed of 6 TB of data is reached when 10 clusters run in parallel. This framework offers a promising approach for loading other large and complex databases.
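The combination of SQL*Loader and partition exchange mentioned above can be illustrated with a minimal sketch. It is not the IMIL implementation, which the abstract does not describe in detail. It only shows the general pattern: bulk-load a file into a standalone staging table, then swap that table into one partition of the target with a single DDL statement. All table, partition, column, and file names (`flow_log`, `flow_stage`, `p_day1`, `flows.csv`) are hypothetical, and the helper functions are illustrative only. The last helper converts the reported per-cluster rates into records per second.

```python
# Sketch only: every table, partition, column, and file name is hypothetical.

SECONDS_PER_DAY = 86_400


def control_file(data_file: str, staging_table: str, columns: list[str]) -> str:
    """Build a minimal SQL*Loader control file that appends CSV rows to a staging table."""
    cols = ", ".join(columns)
    return (
        "LOAD DATA\n"
        f"INFILE '{data_file}'\n"
        "APPEND\n"
        f"INTO TABLE {staging_table}\n"
        "FIELDS TERMINATED BY ','\n"
        f"({cols})\n"
    )


def exchange_statement(target_table: str, partition: str, staging_table: str) -> str:
    """Build the Oracle DDL that swaps a loaded staging table into one partition."""
    return (
        f"ALTER TABLE {target_table} "
        f"EXCHANGE PARTITION {partition} "
        f"WITH TABLE {staging_table}"
    )


def per_second(records_per_day: float) -> float:
    """Convert a daily record count into an average per-second rate."""
    return records_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print(control_file("flows.csv", "flow_stage", ["src_ip", "dst_ip", "bytes"]))
    print(exchange_statement("flow_log", "p_day1", "flow_stage"))
    # Per-cluster rates reported in the abstract, expressed per second.
    for label, rate in [("before", 220e6), ("after", 1.2e9)]:
        print(f"{label}: {per_second(rate):.0f} records/s")
```

The exchange step is a metadata operation, so the target table stays queryable while the staging table is being filled. The reported per-cluster improvement, from 220 million to 1.2 billion records per day, corresponds to roughly a 5.5-fold increase.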