Distributed k-core decomposition
Proceedings of the 30th annual ACM SIGACT-SIGOPS symposium on Principles of distributed computing
Databases and Social Networks
Efficient duplicate detection on cloud using a new signature scheme
WAIM'11 Proceedings of the 12th international conference on Web-age information management
No one (cluster) size fits all: automatic cluster sizing for data-intensive analytics
Proceedings of the 2nd ACM Symposium on Cloud Computing
Evaluating the suitability of mapreduce for surface temperature analysis codes
Proceedings of the second international workshop on Data intensive computing in the clouds
Parallel data processing with MapReduce: a survey
ACM SIGMOD Record
Proceedings of the Seventh Annual Workshop on Cyber Security and Information Intelligence Research
Foundations and Trends® in Machine Learning
Performance engineering for cloud computing
EPEW'11 Proceedings of the 8th European conference on Computer Performance Engineering
Mr. LDA: a flexible large scale topic modeling package using variational inference in MapReduce
Proceedings of the 21st international conference on World Wide Web
Flexible and efficient distributed resolution of large entities
FoIKS'12 Proceedings of the 7th international conference on Foundations of Information and Knowledge Systems
A service-oriented taxonomical spectrum, cloudy challenges and opportunities of cloud computing
International Journal of Communication Systems
Scalable subspace logistic regression models for high dimensional data
APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications
Distributed simulated annealing with mapreduce
EvoApplications'12 Proceedings of the 2012 European conference on Applications of Evolutionary Computation
Improving the diagnosis of mild hypertrophic cardiomyopathy with MapReduce
Proceedings of the third international workshop on MapReduce and its Applications
Finding and exploring memes in social media
Proceedings of the 23rd ACM conference on Hypertext and social media
Designing good MapReduce algorithms
XRDS: Crossroads, The ACM Magazine for Students - Big Data
Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment: Bridging from the eXtreme to the campus and beyond
Scalable random forests for massive data
PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I
Predicting execution bottlenecks in map-reduce clusters
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Computing
MapReduce-based similarity join for metric spaces
Proceedings of the 1st International Workshop on Cloud Intelligence
Only aggressive elephants are fast elephants
Proceedings of the VLDB Endowment
Remote sensing image data storage and search method based on pyramid model in cloud
RSKT'12 Proceedings of the 7th international conference on Rough Sets and Knowledge Technology
An optimized approach for storing and accessing small files on cloud storage
Journal of Network and Computer Applications
PATTY: a taxonomy of relational patterns with semantic types
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Distributed adaptive routing for big-data applications running on data center networks
Proceedings of the eighth ACM/IEEE symposium on Architectures for networking and communications systems
An open-source toolkit for mining Wikipedia
Artificial Intelligence
Inexact subgraph isomorphism in MapReduce
Journal of Parallel and Distributed Computing
Future Generation Computer Systems
UCAmI'12 Proceedings of the 6th international conference on Ubiquitous Computing and Ambient Intelligence
Estimating Beijing's travel delays at intersections with floating car data
Proceedings of the 5th ACM SIGSPATIAL International Workshop on Computational Transportation Science
The big data ecosystem at LinkedIn
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
A throughput optimal algorithm for map task scheduling in mapreduce with data locality
ACM SIGMETRICS Performance Evaluation Review
A survey of web archive search architectures
Proceedings of the 22nd international conference on World Wide Web companion
Upper and lower bounds on the cost of a map-reduce computation
Proceedings of the VLDB Endowment
Job scheduling for optimizing data locality in Hadoop clusters
Proceedings of the 20th European MPI Users' Group Meeting
Mammoth: autonomic data processing framework for scientific state-transition applications
Proceedings of the 2013 ACM Cloud and Autonomic Computing Conference
Gunther: search-based auto-tuning of mapreduce
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Optimization strategies for A/B testing on HADOOP
Proceedings of the VLDB Endowment
Rapid processing of remote sensing images based on cloud computing
Future Generation Computer Systems
Challenges to error diagnosis in hadoop ecosystems
LISA'13 Proceedings of the 27th international conference on Large Installation System Administration
Proceedings of the 8th International Conference on Ubiquitous Information Management and Communication
Parallel processing of large graphs
Future Generation Computer Systems
A Measurement Study of Data-Intensive Network Traffic Patterns in a Private Cloud
UCC '13 Proceedings of the 2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing
International Journal of Approximate Reasoning
Discover how Apache Hadoop can unleash the power of your data. This comprehensive resource shows you how to build and maintain reliable, scalable, distributed systems with the Hadoop framework -- an open source implementation of MapReduce, the algorithm on which Google built its empire. Programmers will find details for analyzing datasets of any size, and administrators will learn how to set up and run Hadoop clusters.
This revised edition covers recent changes to Hadoop, including new features such as Hive, Sqoop, and Avro. It also provides illuminating case studies that illustrate how Hadoop is used to solve specific problems. Looking to get the most out of your data? This is your book.
Use the Hadoop Distributed File System (HDFS) for storing large datasets, then run distributed computations over those datasets with MapReduce
Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence
Discover common pitfalls and advanced features for writing real-world MapReduce programs
Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
Use Pig, a high-level query language for large-scale data processing
Analyze datasets with Hive, Hadoop's data warehousing system
Take advantage of HBase, Hadoop's database for structured and semi-structured data
Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems
"Now you have the opportunity to learn about Hadoop from a master -- not only of the technology, but also of common sense and plain talk." --Doug Cutting, Cloudera