MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a Map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a Reduce function that merges all intermediate values associated with the same intermediate key. Many real-world tasks are expressible in this model. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines.

The MapReduce run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.

Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: thousands of MapReduce programs have been implemented and several thousand MapReduce jobs are executed on Google's clusters every day.

In this talk I'll describe the basic programming model, discuss our experience using it in a variety of domains, and talk about the implications of programming models like MapReduce as one paradigm to simplify development of parallel software for multi-core microprocessors.
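The Map and Reduce functions described in the abstract can be illustrated with a small single-process sketch in Python. This is not Google's implementation: the driver `map_reduce`, the word-count functions, and the sample documents are hypothetical names chosen for illustration, and the sketch leaves out everything the run-time system does (partitioning, scheduling, failure handling, inter-machine communication).

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(
    inputs: Iterable[Tuple[str, str]],
    map_fn: Callable[[str, str], Iterator[Tuple[str, int]]],
    reduce_fn: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Hypothetical in-memory driver: map, group by intermediate key, reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    # Map phase: each input key/value pair yields intermediate key/value pairs.
    for key, value in inputs:
        for intermediate_key, intermediate_value in map_fn(key, value):
            # Grouping step: collect all values that share an intermediate key.
            groups[intermediate_key].append(intermediate_value)
    # Reduce phase: merge the values associated with each intermediate key.
    return {k: reduce_fn(k, values) for k, values in groups.items()}


def word_count_map(doc_name: str, text: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in the document."""
    for word in text.split():
        yield word.lower(), 1


def word_count_reduce(word: str, counts: List[int]) -> int:
    """Sum the partial counts emitted for one word."""
    return sum(counts)


# Illustrative input: (document name, document contents) pairs.
documents = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog and the fox"),
]

counts = map_reduce(documents, word_count_map, word_count_reduce)
print(counts)
```

Because the map calls are independent of one another, and so are the reduce calls for different keys, a real run-time system could spread both phases across many machines; here they simply run in a loop.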