The Sloan Digital Sky Survey: Drinking from the Fire Hose
Computing in Science and Engineering
The Sloan Digital Sky Survey Data Archive Server
Computing in Science and Engineering
The Catalog Archive Server Database Management System
Computing in Science and Engineering
The sqlLoader Data-Loading Pipeline
Computing in Science and Engineering
Building environmentally sustainable information services: A green IS research agenda
Journal of the American Society for Information Science and Technology
ECC-based anti-phishing protocol for cloud computing services
International Journal of Security and Networks
We report on attempts to put an existing scientific (astronomical) database, the Sloan Digital Sky Survey (SDSS) science archive [1], in the cloud. Based on our experience, it is at present either very frustrating or impossible to migrate an existing, complex SQL Server database to current cloud service offerings such as Amazon (EC2) and Microsoft (SQL Azure). Migrating a large database in excess of a terabyte is certainly impossible, but even with much smaller databases the limitations of cloud services make it very difficult to migrate the data without changing the schema and settings. For example, we were unable to migrate a spatial indexing library and several other user-defined functions and stored procedures. Such changes would invalidate performance comparisons between the cloud and on-premise versions. It is therefore not surprising that our preliminary comparisons show a very large performance discrepancy, about an order of magnitude, between the on-premise version and the Amazon cloud version of the SDSS database. We have not yet investigated the performance tuning that may be possible within the cloud. Although we succeeded in migrating a subset of the SDSS catalog database to Amazon EC2, we were unable to access it in a meaningful way from the outside world. Even though the dataset was advertised as public on the AWS blog, it was not clear how other users or the public would be able to access the data in a meaningful way, if at all. These difficulties suggest that much work and coordination need to occur between cloud service providers and their potential database clients before science databases can be deployed successfully and effectively in the cloud. This is true not just for large scientific databases but for all databases that make extensive use of advanced database management system (DBMS) features for performance and user convenience.