Instance-Based Learning Algorithms
Machine Learning
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Efficient Privacy-Preserving k-Nearest Neighbor Search
ICDCS '08 Proceedings of the 2008 The 28th International Conference on Distributed Computing Systems
MapReduce for Data Intensive Scientific Analyses
ESCIENCE '08 Proceedings of the 2008 Fourth IEEE International Conference on eScience
Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science
A Map-Reduce System with an Alternate API for Multi-core Environments
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
MRAP: a novel MapReduce-based framework to support HPC analytics applications with access patterns
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Twister: a runtime for iterative MapReduce
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Parallel accessing massive NetCDF data based on mapreduce
WISM'10 Proceedings of the 2010 international conference on Web information systems and mining
Ex-MATE: Data Intensive Computing with Large Reduction Objects and Its Application to Graph Mining
CCGRID '11 Proceedings of the 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
SciHadoop: array-based query processing in Hadoop
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
On the duality of data-intensive file system design: reconciling HDFS and PVFS
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
FastQuery: A Parallel Indexing System for Scientific Data
CLUSTER '11 Proceedings of the 2011 IEEE International Conference on Cluster Computing
Data-Intensive Science in the US DOE: Case Studies and Future Challenges
Computing in Science and Engineering
Future Generation Computer Systems
SIDR: structure-aware intelligent data routing in Hadoop
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.00 |
Despite the popularity of MapReduce, there are several obstacles to applying it for developing scientific data analysis applications. Current MapReduce implementations require that data be loaded into specialized file systems, like the Hadoop Distributed File System (HDFS), whereas with rapidly growing size of scientific datasets, reloading data in another file system or format is not feasible. We present a framework that allows scientific data in different formats to be processed with a MapReduce like API. Our system is referred to as SciMATE, and is based on the MATE system developed at Ohio State. SciMATE is developed as a customizable system, which can be adapted to support processing on any of the scientific data formats. We have demonstrated the functionality of our system by creating instances that can be processing NetCDF and HDF5 formats as well as flat-files. We have also implemented three popular data mining applications and have evaluated their execution with each of the three instances of our system.