There is a growing need for very large databases that are not practical to implement with conventional relational database technology. These databases are characterised by huge size and frequent large updates; they do not require traditional database transactions, because the atomicity of bulk updates can be guaranteed outside the database. Given the I/O and CPU resources available on modern computer systems, it is possible to build such huge databases from simple flat files, scanning all the data to answer queries. This paper describes Gecko, a system for tracking the state of every call in a very large billing system, which uses sorted flat files to implement a database of about 60G records occupying 2.6TB. We describe Gecko's architecture, both data and process, and how we interface with the existing legacy MVS systems. We focus on performance issues, particularly job management, I/O management, and data distribution, and on the tools we built. We finish with the important lessons we learned along the way, some tools we developed that would be useful in dealing with legacy systems, a benchmark comparing some alternative system architectures, and an assessment of the scalability of the system.
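The idea of guaranteeing bulk-update atomicity outside the database can be sketched in a few lines: merge a sorted batch of updates with the existing sorted flat file into a temporary file, then atomically rename the result over the old file, so a concurrent scan sees either the old state or the new state, never a mixture. This is a hypothetical minimal illustration of the sorted-flat-file approach, not Gecko's actual code; the tab-separated record format and the functions `read_records` and `apply_bulk_update` are assumptions for the example.

```python
import os
import tempfile


def read_records(path):
    """Yield (key, value) pairs from a sorted, tab-separated flat file."""
    with open(path) as f:
        for line in f:
            key, _, value = line.rstrip("\n").partition("\t")
            yield key, value


def apply_bulk_update(db_path, updates):
    """Merge a batch of (key, value) updates into the sorted flat file.

    The merged output goes to a temporary file in the same directory,
    which is then atomically renamed over the old file (os.replace).
    Readers scanning the file therefore see either the old or the new
    version in full -- atomicity is handled outside the "database".
    """
    updates = iter(sorted(updates))
    u = next(updates, None)
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(db_path)))
    with os.fdopen(fd, "w") as out:
        for key, value in read_records(db_path):
            # Emit any updates that sort before the current record.
            while u is not None and u[0] < key:
                out.write(f"{u[0]}\t{u[1]}\n")
                u = next(updates, None)
            if u is not None and u[0] == key:
                # An update for this key replaces the existing record.
                out.write(f"{u[0]}\t{u[1]}\n")
                u = next(updates, None)
            else:
                out.write(f"{key}\t{value}\n")
        # Emit any remaining updates that sort after the last record.
        while u is not None:
            out.write(f"{u[0]}\t{u[1]}\n")
            u = next(updates, None)
    os.replace(tmp, db_path)  # atomic swap on POSIX filesystems
```

Because both inputs are sorted, the merge is a single sequential pass, which matches the abstract's point: with modern sequential I/O bandwidth, full scans and merges over flat files can replace index maintenance and transactional machinery.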