The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size. We describe the architecture of HDFS and report on experience using HDFS to manage 25 petabytes of enterprise data at Yahoo!.
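The idea of co-locating storage and computation can be illustrated with a minimal sketch. It is not HDFS code: the function names `place_blocks` and `pick_task_server`, the round-robin placement, and the replication factor of 3 are assumptions made here for illustration only, and the abstract does not describe the actual placement or scheduling policy.

```python
def place_blocks(num_blocks, servers, replication=3):
    """Assign each block to `replication` distinct servers.

    Round-robin placement is an illustrative assumption,
    not the policy HDFS uses.
    """
    if replication > len(servers):
        raise ValueError("replication cannot exceed the number of servers")
    placement = {}
    for block in range(num_blocks):
        start = block % len(servers)
        # Consecutive servers (wrapping around) hold the replicas.
        placement[block] = [
            servers[(start + i) % len(servers)] for i in range(replication)
        ]
    return placement


def pick_task_server(block, placement, busy):
    """Return an idle server that already stores `block`, or None.

    Running the task where the data lives avoids moving the block
    over the network.
    """
    for server in placement[block]:
        if server not in busy:
            return server
    return None


servers = ["s0", "s1", "s2", "s3"]
placement = place_blocks(4, servers)
local = pick_task_server(0, placement, busy={"s0"})
```

In this sketch, adding a server to `servers` increases both the storage capacity and the number of places where a task can run next to its data, which is the sense in which the resource grows with demand.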