MySQL to NoSQL: data modeling challenges in supporting scalability

Authors:
Aaron Schram; Kenneth M. Anderson
Affiliations:
University of Colorado, Boulder, CO, USA; University of Colorado, Boulder, CO, USA
Venue:
Proceedings of the 3rd annual conference on Systems, programming, and applications: software for humanity
Year:
2012

Abstract

Software systems today seldom reside as isolated systems confined to generating and consuming their own data. Collecting, integrating, and storing large amounts of data from disparate sources has become a necessity for many software engineers, as well as for scientists in research settings. This paper presents the lessons learned when transitioning a large-scale data collection infrastructure from a relational database to a hybrid persistence architecture that makes use of both relational and NoSQL technologies. Our examples are drawn from the software infrastructure we built to collect, store, and analyze vast numbers of status updates from the Twitter micro-blogging service in support of a large interdisciplinary group performing research in the area of crisis informatics. We present both the software architecture and data modeling challenges that we encountered during the transition, as well as the benefits we gained by migrating to the hybrid persistence architecture.