Hadoop: The Definitive Guide helps you harness the power of your data. Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters.

Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:

- Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce
- Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence
- Discover common pitfalls and advanced features for writing real-world MapReduce programs
- Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
- Use Pig, a high-level query language for large-scale data processing
- Take advantage of HBase, Hadoop's database for structured and semi-structured data
- Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems

If you have lots of data, whether it's gigabytes or petabytes, Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject.

"Now you have the opportunity to learn about Hadoop from a master, not only of the technology, but also of common sense and plain talk." -- Doug Cutting, Hadoop Founder, Yahoo!