SciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data Formats

Authors:
Yi Wang;Wei Jiang;Gagan Agrawal
Venue:
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Year:
2012

Abstract

Despite the popularity of MapReduce, several obstacles stand in the way of using it to develop scientific data analysis applications. Current MapReduce implementations require that data be loaded into specialized file systems, such as the Hadoop Distributed File System (HDFS), whereas, with the rapidly growing size of scientific datasets, reloading data into another file system or format is not feasible. We present a framework that allows scientific data in different formats to be processed with a MapReduce-like API. Our system, referred to as SciMATE, is based on the MATE system developed at Ohio State. SciMATE is developed as a customizable system that can be adapted to support processing of any scientific data format. We have demonstrated the functionality of our system by creating instances that can process the NetCDF and HDF5 formats as well as flat files. We have also implemented three popular data mining applications and evaluated their execution with each of the three instances of our system.
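
The idea of a format-customizable, MapReduce-like API can be illustrated with a minimal sketch. It is not SciMATE's actual interface: the names `FormatAdapter`, `FlatFileAdapter`, and `run_reduction` are hypothetical, the "flat file" is an in-memory list, and the processing model (accumulate into a per-chunk reduction object, then combine) is an assumption about the general style of MATE-like systems rather than a description of the authors' code.

```python
from typing import Callable, Dict, Iterator, List, Protocol, Sequence


class FormatAdapter(Protocol):
    """Hypothetical interface: one implementation per data format."""

    def read_chunks(self, chunk_size: int) -> Iterator[List[float]]:
        ...


class FlatFileAdapter:
    """Stand-in for a flat-file reader; values are held in memory for the demo."""

    def __init__(self, values: Sequence[float]) -> None:
        self._values = list(values)

    def read_chunks(self, chunk_size: int) -> Iterator[List[float]]:
        # Hand the data out in fixed-size chunks, as a real reader would per block.
        for start in range(0, len(self._values), chunk_size):
            yield self._values[start:start + chunk_size]


Histogram = Dict[int, int]


def run_reduction(
    adapter: FormatAdapter,
    chunk_size: int,
    make_object: Callable[[], Histogram],
    accumulate: Callable[[Histogram, float], None],
    combine: Callable[[Histogram, Histogram], None],
) -> Histogram:
    """Process each chunk into its own reduction object, then merge them."""
    partials: List[Histogram] = []
    for chunk in adapter.read_chunks(chunk_size):
        obj = make_object()
        for record in chunk:
            accumulate(obj, record)  # local reduction within one chunk
        partials.append(obj)

    result = make_object()
    for obj in partials:
        combine(result, obj)  # global combination across chunks
    return result


def make_hist() -> Histogram:
    return {}


def add_value(hist: Histogram, value: float) -> None:
    bucket = int(value // 10)  # bucket width of 10, chosen for the demo
    hist[bucket] = hist.get(bucket, 0) + 1


def merge_hist(dst: Histogram, src: Histogram) -> None:
    for bucket, count in src.items():
        dst[bucket] = dst.get(bucket, 0) + count


adapter = FlatFileAdapter([1, 5, 12, 25, 27, 31])
histogram = run_reduction(adapter, 2, make_hist, add_value, merge_hist)
```

In this sketch the application code (`add_value`, `merge_hist`) never touches the storage format. Supporting another format such as NetCDF or HDF5 would, under these assumptions, only mean supplying another adapter with the same `read_chunks` method.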