Random sampling with a reservoir
ACM Transactions on Mathematical Software (TOMS)
The input/output complexity of sorting and related problems
Communications of the ACM
Parallel database systems: the future of high performance database systems
Communications of the ACM
Eliminating receive livelock in an interrupt-driven kernel
ACM Transactions on Computer Systems (TOCS)
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Advances in the dataflow computational model
Parallel Computing - Special Anniversary issue
Parallel sorting on a shared-nothing architecture using probabilistic splitting
PDIS '91 Proceedings of the first international conference on Parallel and distributed information systems
A survey of rollback-recovery protocols in message-passing systems
ACM Computing Surveys (CSUR)
HOTOS '01 Proceedings of the Eighth Workshop on Hot Topics in Operating Systems
A large-scale study of failures in high-performance computing systems
DSN '06 Proceedings of the International Conference on Dependable Systems and Networks
Microreboot — A technique for cheap recovery
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Subtleties in tolerating correlated failures in wide-area storage systems
NSDI'06 Proceedings of the 3rd conference on Networked Systems Design & Implementation - Volume 3
Failure trends in a large disk drive population
FAST '07 Proceedings of the 5th USENIX conference on File and Storage Technologies
Understanding disk failure rates: What does an MTTF of 1,000,000 hours mean to you?
ACM Transactions on Storage (TOS)
Practical System Reliability
Bioinformatics
ACM SIGOPS Operating Systems Review
Stateful bulk processing for incremental analytics
Proceedings of the 1st ACM symposium on Cloud computing
Skew-resistant parallel processing of feature-extracting scientific user-defined functions
Proceedings of the 1st ACM symposium on Cloud computing
Improving MapReduce performance in heterogeneous environments
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
HaLoop: efficient iterative data processing on large clusters
Proceedings of the VLDB Endowment
Availability in globally distributed storage systems
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Large-scale incremental processing using distributed transactions and notifications
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
TritonSort: a balanced large-scale sorting system
Proceedings of the 8th USENIX conference on Networked systems design and implementation
SkewTune: mitigating skew in mapreduce applications
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
QuickSAN: a storage area network for fast, distributed, solid state disks
Proceedings of the 40th Annual International Symposium on Computer Architecture
Distributed data management using MapReduce
ACM Computing Surveys (CSUR)
CooMR: cross-task coordination for efficient data management in MapReduce programs
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.00 |
"Big Data" computing increasingly utilizes the MapReduce programming model for scalable processing of large data collections. Many MapReduce jobs are I/O-bound, and so minimizing the number of I/O operations is critical to improving their performance. In this work, we present Themis, a MapReduce implementation that reads and writes data records to disk exactly twice, which is the minimum amount possible for data sets that cannot fit in memory. In order to minimize I/O, Themis makes fundamentally different design decisions from previous MapReduce implementations. Themis performs a wide variety of MapReduce jobs -- including click log analysis, DNA read sequence alignment, and PageRank -- at nearly the speed of TritonSort's record-setting sort performance [29].