Practical prefetching via data compression
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
A study of integrated prefetching and caching strategies
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Informed prefetching and caching
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
ACM Transactions on Computer Systems (TOCS)
Automatic compiler-inserted I/O prefetching for out-of-core applications
OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
A trace-driven comparison of algorithms for parallel prefetching and caching
OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
Informed multi-process prefetching and caching
SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Automatic I/O hint generation through speculative execution
OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
A cost-benefit scheme for high performance predictive prefetching
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Near-Optimal Parallel Prefetching and Caching
SIAM Journal on Computing
ARIMA time series modeling and forecasting for adaptive I/O prefetching
ICS '01 Proceedings of the 15th international conference on Supercomputing
Optimal prefetching and caching for parallel I/O sytems
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Practical prefetching techniques for parallel file systems
PDIS '91 Proceedings of the first international conference on Parallel and distributed information systems
Integrated prefetching and caching in single and parallel disk systems
Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures
The Multics Input/Output system
SOSP '71 Proceedings of the third ACM symposium on Operating systems principles
Fido: A Cache That Learns To Fetch
Fido: A Cache That Learns To Fetch
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance
Proceedings of the 31st annual international symposium on Computer architecture
Power Efficient Processor Architecture and The Cell Processor
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
The Impact of Performance Asymmetry in Emerging Multicore Architectures
Proceedings of the 32nd annual international symposium on Computer Architecture
Processor Power Reduction Via Single-ISA Heterogeneous Multi-Core Architectures
IEEE Computer Architecture Letters
Introduction to the cell multiprocessor
IBM Journal of Research and Development - POWER5 and packaging
Network I/O Acceleration in Heterogeneous Multicore Processors
HOTI '06 Proceedings of the 14th IEEE Symposium on High-Performance Interconnects
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
CellSs: a programming model for the cell BE architecture
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Reducing file system latency using a predictive approach
USTC'94 Proceedings of the USENIX Summer 1994 Technical Conference on USENIX Summer 1994 Technical Conference - Volume 1
Implementation and performance of application-controlled file caching
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
FlexiCache: a flexible interface for customizing Linux file system buffer cache replacement policies
FAST '07 Proceedings of the 5th USENIX conference on File and Storage Technologies
The Performance Impact of Kernel Prefetching on Buffer Cache Replacement Algorithms
IEEE Transactions on Computers
A Flexible Heterogeneous Multi-Core Architecture
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
An Effective Strategy for Porting C++ Applications on Cell
ICPP '07 Proceedings of the 2007 International Conference on Parallel Processing
CellSort: high performance sorting on the cell processor
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Cell broadband engine architecture and its first implementation: a performance view
IBM Journal of Research and Development
Vectorized data processing on the cell broadband engine
DaMoN '07 Proceedings of the 3rd international workshop on Data management on new hardware
Balancing productivity and performance on the cell broadband engine
CLUSTER '07 Proceedings of the 2007 IEEE International Conference on Cluster Computing
FFTC: fastest Fourier transform for the IBM cell broadband engine
HiPC'07 Proceedings of the 14th international conference on High performance computing
Vector stream processing for effective application of heterogeneous parallelism
Proceedings of the 2009 ACM symposium on Applied Computing
Supporting MapReduce on large-scale asymmetric multi-core clusters
ACM SIGOPS Operating Systems Review
Carbon nanotube coated high-throughput neurointerfaces in assistive environments
Proceedings of the 2nd International Conference on PErvasive Technologies Related to Assistive Environments
Designing Accelerator-Based Distributed Systems for High Performance
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
A Capabilities-Aware Programming Model for Asymmetric High-End Systems
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
A capabilities-aware framework for using computational accelerators in data-intensive computing
Journal of Parallel and Distributed Computing
Reusable software components for accelerator-based clusters
Journal of Systems and Software
Scalable heterogeneous parallelism for atmospheric modeling and simulation
The Journal of Supercomputing
Analysis of task offloading for accelerators
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Hi-index | 0.00 |
Recent advent of the asymmetric multi-core processors such as Cell Broadband Engine (Cell/BE) has popularized the use of heterogeneous architectures. A growing body of research is exploring the use of such architectures, especially in High-End Computing, for supporting scientific applications. However, prior research has focused on use of the available Cell/BE operating systems and runtime environments for supporting compute-intensive jobs. Data and I/O intensive workloads have largely been ignored in this domain. In this paper, we take the first steps in supporting I/O intensive workloads on the Cell/BE and deriving guidelines for optimizing the execution of I/O workloads on heterogeneous architectures. We explore various performance enhancing techniques for such workloads on an actual Cell/BE system. Among the techniques we explore, an asynchronous prefetching-based approach, which uses the PowerPC core of the Cell/BE for file prefetching and decentralized DMAs from the synergistic processing cores (SPE's), improves the performance for I/O workloads that include an encryption/decryption component by 22.2%, compared to I/O performed naïvely from the SPE's. Our evaluation shows promising results and lays the foundation for developing more efficient I/O support libraries for multi-core asymmetric architectures.