Scientific facilities such as the Advanced Light Source (ALS) and the Joint Genome Institute, and projects such as the Materials Project, have an increasing need to capture, store, and analyze dynamic semi-structured data and metadata. A similar growth of semi-structured data within large Internet service providers has led to the creation of NoSQL data stores for scalable indexing and MapReduce for scalable parallel analysis. MapReduce and NoSQL stores have been applied to scientific data. Hadoop, the most popular open-source implementation of MapReduce, has been evaluated, utilized, and modified to address the needs of different scientific analysis problems. ALS and the Materials Project are using MongoDB, a document-oriented NoSQL store. However, there is a limited understanding of the performance trade-offs of using these two technologies together. In this paper, we evaluate the performance, scalability, and fault tolerance of using MongoDB with Hadoop, with the goal of identifying the right software environment for scientific data analysis.