There is a growing need for very large databases that are not practical to implement with conventional relational database technology. These databases are characterised by huge size and frequent large updates; they do not require traditional database transactions, because the atomicity of bulk updates can be guaranteed outside the database. Given the I/O and CPU resources available on modern computer systems, it is possible to build such huge databases from simple flat files, scanning all the data to answer queries. This paper describes Gecko, a system for tracking the state of every call in a very large billing system, which uses sorted flat files to implement a database of about 60G records occupying 2.6TB. We describe Gecko's architecture, both data and process, and how we interface with the existing legacy MVS systems. We focus on performance issues, particularly job management, I/O management, and data distribution, and on the tools we built. We finish with the important lessons we learned along the way, some tools we developed that would be useful in dealing with legacy systems, a benchmark comparing some alternative system architectures, and an assessment of the scalability of the system.
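The idea of guaranteeing bulk-update atomicity outside the database can be sketched in a few lines: merge a sorted batch of updates with the existing sorted flat file into a temporary file, then atomically rename the result over the old file, so a concurrent scan sees either the old state or the new state, never a mixture. This is a hypothetical minimal illustration of the sorted-flat-file approach, not Gecko's actual code; the tab-separated record format and the functions `read_records` and `apply_bulk_update` are assumptions for the example.

```python
import os
import tempfile


def read_records(path):
    """Yield (key, value) pairs from a sorted, tab-separated flat file."""
    with open(path) as f:
        for line in f:
            key, _, value = line.rstrip("\n").partition("\t")
            yield key, value


def apply_bulk_update(db_path, updates):
    """Merge a batch of (key, value) updates into the sorted flat file.

    The merged output goes to a temporary file in the same directory,
    which is then atomically renamed over the old file (os.replace).
    Readers scanning the file therefore see either the old or the new
    version in full -- atomicity is handled outside the "database".
    """
    updates = iter(sorted(updates))
    u = next(updates, None)
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(db_path)))
    with os.fdopen(fd, "w") as out:
        for key, value in read_records(db_path):
            # Emit any updates that sort before the current record.
            while u is not None and u[0] < key:
                out.write(f"{u[0]}\t{u[1]}\n")
                u = next(updates, None)
            if u is not None and u[0] == key:
                # An update for this key replaces the existing record.
                out.write(f"{u[0]}\t{u[1]}\n")
                u = next(updates, None)
            else:
                out.write(f"{key}\t{value}\n")
        # Emit any remaining updates that sort after the last record.
        while u is not None:
            out.write(f"{u[0]}\t{u[1]}\n")
            u = next(updates, None)
    os.replace(tmp, db_path)  # atomic swap on POSIX filesystems
```

Because both inputs are sorted, the merge is a single sequential pass, which matches the abstract's point: with modern sequential I/O bandwidth, full scans and merges over flat files can replace index maintenance and transactional machinery.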