Hadoop: The Definitive Guide

Authors:
Tom White
Venue:
Hadoop: The Definitive Guide
Year:
2010

Cited by (45):

Distributed k-core decomposition

Proceedings of the 30th annual ACM SIGACT-SIGOPS symposium on Principles of distributed computing
Social-data storage-systems

Databases and Social Networks
Efficient duplicate detection on cloud using a new signature scheme

WAIM'11 Proceedings of the 12th international conference on Web-age information management
No one (cluster) size fits all: automatic cluster sizing for data-intensive analytics

Proceedings of the 2nd ACM Symposium on Cloud Computing
Evaluating the suitability of mapreduce for surface temperature analysis codes

Proceedings of the second international workshop on Data intensive computing in the clouds
Parallel data processing with MapReduce: a survey

ACM SIGMOD Record
An architecture for Concordia

Proceedings of the Seventh Annual Workshop on Cyber Security and Information Intelligence Research
Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers

Foundations and Trends® in Machine Learning
Performance engineering for cloud computing

EPEW'11 Proceedings of the 8th European conference on Computer Performance Engineering
Mr. LDA: a flexible large scale topic modeling package using variational inference in MapReduce

Proceedings of the 21st international conference on World Wide Web
Flexible and efficient distributed resolution of large entities

FoIKS'12 Proceedings of the 7th international conference on Foundations of Information and Knowledge Systems
A service-oriented taxonomical spectrum, cloudy challenges and opportunities of cloud computing

International Journal of Communication Systems
Scalable subspace logistic regression models for high dimensional data

APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications
Distributed simulated annealing with mapreduce

EvoApplications'12 Proceedings of the 2012 European conference on Applications of Evolutionary Computation
Improving the diagnosis of mild hypertrophic cardiomyopathy with MapReduce

Proceedings of the third international workshop on MapReduce and its Applications
Finding and exploring memes in social media

Proceedings of the 23rd ACM conference on Hypertext and social media
Designing good MapReduce algorithms

XRDS: Crossroads, The ACM Magazine for Students - Big Data
FutureGrid education: using case studies to develop a curriculum for communicating parallel and distributed computing concepts

Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment: Bridging from the eXtreme to the campus and beyond
Scalable random forests for massive data

PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I
Predicting execution bottlenecks in map-reduce clusters

HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Computing
MapReduce-based similarity join for metric spaces

Proceedings of the 1st International Workshop on Cloud Intelligence
Only aggressive elephants are fast elephants

Proceedings of the VLDB Endowment
Remote sensing image data storage and search method based on pyramid model in cloud

RSKT'12 Proceedings of the 7th international conference on Rough Sets and Knowledge Technology
An optimized approach for storing and accessing small files on cloud storage

Journal of Network and Computer Applications
PATTY: a taxonomy of relational patterns with semantic types

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Distributed adaptive routing for big-data applications running on data center networks

Proceedings of the eighth ACM/IEEE symposium on Architectures for networking and communications systems
An open-source toolkit for mining Wikipedia

Artificial Intelligence
Inexact subgraph isomorphism in MapReduce

Journal of Parallel and Distributed Computing
Exploiting geospatial and chronological characteristics in data streams to enable efficient storage and retrievals

Future Generation Computer Systems
Cloud integrated web platform for marine monitoring using GIS and remote sensing: application to oil spill detection through SAR images

UCAmI'12 Proceedings of the 6th international conference on Ubiquitous Computing and Ambient Intelligence
Estimating Beijing's travel delays at intersections with floating car data

Proceedings of the 5th ACM SIGSPATIAL International Workshop on Computational Transportation Science
The big data ecosystem at LinkedIn

Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
A throughput optimal algorithm for map task scheduling in mapreduce with data locality

ACM SIGMETRICS Performance Evaluation Review
A survey of web archive search architectures

Proceedings of the 22nd international conference on World Wide Web companion
Upper and lower bounds on the cost of a map-reduce computation

Proceedings of the VLDB Endowment
Job scheduling for optimizing data locality in Hadoop clusters

Proceedings of the 20th European MPI Users' Group Meeting
Mammoth: autonomic data processing framework for scientific state-transition applications

Proceedings of the 2013 ACM Cloud and Autonomic Computing Conference
Gunther: search-based auto-tuning of mapreduce

Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Optimization strategies for A/B testing on HADOOP

Proceedings of the VLDB Endowment
Rapid processing of remote sensing images based on cloud computing

Future Generation Computer Systems
Challenges to error diagnosis in hadoop ecosystems

LISA'13 Proceedings of the 27th international conference on Large Installation System Administration
Implementation of data affinity-based distributed parallel processing on a distributed key value store

Proceedings of the 8th International Conference on Ubiquitous Information Management and Communication
Parallel processing of large graphs

Future Generation Computer Systems
A Measurement Study of Data-Intensive Network Traffic Patterns in a Private Cloud

UCC '13 Proceedings of the 2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing
A comparison of parallel large-scale knowledge acquisition using rough set theory on different MapReduce runtime systems

International Journal of Approximate Reasoning

Abstract

Discover how Apache Hadoop can unleash the power of your data. This comprehensive resource shows you how to build and maintain reliable, scalable, distributed systems with the Hadoop framework -- an open source implementation of MapReduce, the algorithm on which Google built its empire. Programmers will find details for analyzing datasets of any size, and administrators will learn how to set up and run Hadoop clusters.

This revised edition covers recent changes to Hadoop, including new features such as Hive, Sqoop, and Avro. It also provides illuminating case studies that illustrate how Hadoop is used to solve specific problems. Looking to get the most out of your data? This is your book.

Use the Hadoop Distributed File System (HDFS) for storing large datasets, then run distributed computations over those datasets with MapReduce
Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence
Discover common pitfalls and advanced features for writing real-world MapReduce programs
Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
Use Pig, a high-level query language for large-scale data processing
Analyze datasets with Hive, Hadoop's data warehousing system
Take advantage of HBase, Hadoop's database for structured and semi-structured data
Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems

"Now you have the opportunity to learn about Hadoop from a master -- not only of the technology, but also of common sense and plain talk." --Doug Cutting, Cloudera