A case for redundant arrays of inexpensive disks (RAID)
SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Argon: performance insulation for shared storage servers
FAST '07 Proceedings of the 5th USENIX conference on File and Storage Technologies
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Write off-loading: practical power management for enterprise storage
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Measurement and analysis of large-scale network file system workloads
ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
Tashi: location-aware cluster management
ACDC '09 Proceedings of the 1st workshop on Automated control for datacenters and clouds
DiskReduce: RAID for data-intensive scalable computing
Proceedings of the 4th Annual Workshop on Petascale Data Storage
Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling
Proceedings of the 5th European conference on Computer systems
Cloud analytics: do we really need to reinvent the storage stack?
HotCloud'09 Proceedings of the 2009 conference on Hot topics in cloud computing
Spark: cluster computing with working sets
HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
HaLoop: efficient iterative data processing on large clusters
Proceedings of the VLDB Endowment
Disk-locality in datacenter computing considered irrelevant
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
Design implications for enterprise storage systems via multi-dimensional trace analysis
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
The Case for Evaluating MapReduce Performance Using Workload Suites
MASCOTS '11 Proceedings of the 2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems
On the duality of data-intensive file system design: reconciling HDFS and PVFS
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
PACMan: coordinated memory caching for parallel jobs
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Interactive analytical processing in big data systems: a cross-industry study of MapReduce workloads
Proceedings of the VLDB Endowment
Hi-index | 0.00 |
Distributed file systems built for data analytics and enterprise storage systems have very different functionality requirements. For this reason, enabling analytics on enterprise data commonly introduces a separate analytics storage silo. This generates additional costs, and inefficiencies in data management, e.g., whenever data needs to be archived, copied, or migrated across silos. MixApart uses an integrated data caching and scheduling solution to allow MapReduce computations to analyze data stored on enterprise storage systems. The front-end caching layer enables the local storage performance required by data analytics. The shared storage back-end simplifies data management. We evaluate MixApart using a 100-core Amazon EC2 cluster with micro-benchmarks and production workload traces. Our evaluation shows that MixApart provides (i) up to 28% faster performance than the traditional ingest-then-compute workflows used in enterprise IT analytics, and (ii) comparable performance to an ideal Hadoop setup without data ingest, at similar cluster sizes.