Advanced SQL 1999: Understanding Object-Relational, and Other Advanced Features
Advanced SQL 1999: Understanding Object-Relational, and Other Advanced Features
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Proceedings of the VLDB Endowment
Hive: a warehousing solution over a map-reduce framework
Proceedings of the VLDB Endowment
HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads
Proceedings of the VLDB Endowment
Assigning tasks for efficiency in Hadoop: extended abstract
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Hi-index | 0.00 |
Big data is the tar sands of the data world: vast reserves of raw gritty data whose valuable information content can only be extracted at great cost. MapReduce is a popular parallel programming paradigm well suited to the programmatic extraction and analysis of information from these unstructured Big Data reserves. The Apache Hadoop implementation of MapReduce has become an important player in this market due to its ability to exploit large networks of inexpensive servers. The increasing importance of unstructured data has led to the interest in MapReduce and its Apache Hadoop implementation, which has led to the interest of data processing vendors in supporting this programming style. Oracle RDBMS has had support for the MapReduce paradigm for many years through the mechanism of user defined pipelined table functions and aggregation objects. However, such support has not been Hadoop source compatible. Native Hadoop programs needed to be rewritten before becoming usable in this framework. The ability to run Hadoop programs inside the Oracle database provides a versatile solution to database users, allowing them use programming skills they may already possess and to exploit the growing Hadoop eco-system. In this paper, we describe a prototype of Oracle In-Database Hadoop that supports the running of native Hadoop applications written in Java. This implementation executes Hadoop applications using the efficient parallel capabilities of the Oracle database and a subset of the Apache Hadoop infrastructure. This system's target audience includes both SQL and Hadoop users. We discuss the architecture and design, and in particular, demonstrate how MapReduce functionalities are seamlessly integrated within SQL queries. We also share our experience in building such a system within Oracle database and follow-on topics that we think are promising areas for exploration.