MapReduce has become increasingly popular as a simple and efficient paradigm for large-scale data processing. One of the main reasons for its popularity is the availability of a production-level open source implementation, Hadoop, written in Java. Because Python is also widely used, however, there is considerable interest in tools that give Python programmers access to the framework. Here we present a Python package that provides an API for both the MapReduce and the distributed file system components of Hadoop, and show its advantages over the other available solutions for programming Hadoop in Python: Jython and Hadoop Streaming.
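For readers unfamiliar with the paradigm, the following is a minimal sketch of a MapReduce word count in plain Python. It does not use Hadoop or the package presented here. The function names (map_words, reduce_counts, run_local) are hypothetical, chosen for illustration, and the single-process sort stands in for the shuffle step that a real framework performs across the cluster.

```python
from itertools import groupby
from operator import itemgetter


def map_words(lines):
    """Map phase: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_counts(pairs):
    """Reduce phase: sum the counts emitted for each word.

    The pairs must arrive sorted by key, as they would after the
    framework's shuffle step.
    """
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> sort (standing in for shuffle) -> reduce locally."""
    return list(reduce_counts(sorted(map_words(lines))))
```

Calling run_local on a list of text lines returns (word, count) pairs sorted by word. In a real deployment the map and reduce functions would run as separate tasks on different nodes, with the framework handling input splitting, sorting, and data movement between them.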