We have conducted a detailed study to understand the potential of hybrid CPU/FPGA high-performance computers for improving the performance of data-intensive scientific applications. In particular, we have focused on a proteomics application (Polygraph) that is representative of many types of computational analysis applications in the life sciences: it extracts useful information from a large body of experimentally collected data, identifying observed peptide spectra collected from a mass spectrometer against a well-known protein database. Our preliminary analysis of Polygraph found that more than half (51%) of the computation time was spent in three routines. We have implemented an FPGA version of the most computationally intensive routine (20% of the time) on a Cray XD-1 system and measured the overall speedup achieved in comparison to an optimized software version of the routine running on the Cray XD-1's native Opteron processors. We achieved computational speedups of up to 9.16; when data movement costs are included, the overall speedup of the routine drops to 1.78. We discuss the design and implementation strategies that led to these results, as well as the advantages and limitations we found on the Cray XD-1 platform. We also address the advantages and limitations of current development environments and discuss relevant issues we encountered as "users" of the hybrid CPU/FPGA programming model.
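The gap between the 9.16 computational speedup and the 1.78 overall speedup, and what either figure could mean for Polygraph as a whole, can be illustrated with simple arithmetic. The sketch below is illustrative only and does not reproduce any measured result: it assumes that the 20% and 51% figures are fractions of total Polygraph runtime, that the rest of the application runs unchanged, and that FPGA data movement does not overlap with FPGA computation. The function names are hypothetical; `amdahl_speedup` is a standard application of Amdahl's law.

```python
# Illustrative arithmetic only: these projections are NOT measured results.
# Assumptions: the 20% and 51% figures are fractions of total Polygraph
# runtime, the remaining code runs unchanged, and FPGA data movement does
# not overlap with FPGA computation.


def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when `fraction` of runtime is sped up by `local_speedup` (Amdahl's law)."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


def data_movement_share(kernel_speedup: float, routine_speedup: float) -> float:
    """Share of the accelerated routine's time not spent in FPGA computation.

    All times are normalized to the software routine's time (= 1.0).
    """
    total = 1.0 / routine_speedup   # FPGA routine time including data movement
    compute = 1.0 / kernel_speedup  # FPGA computation time alone
    return (total - compute) / total


if __name__ == "__main__":
    print(f"data movement share: {data_movement_share(9.16, 1.78):.1%}")     # 80.6%
    print(f"app speedup, routine at 1.78x: {amdahl_speedup(0.20, 1.78):.3f}")  # 1.096
    print(f"app speedup, routine at 9.16x: {amdahl_speedup(0.20, 9.16):.3f}")  # 1.217
    print(f"upper bound, 20% removed: {1 / (1 - 0.20):.3f}")                 # 1.250
    print(f"upper bound, 51% removed: {1 / (1 - 0.51):.3f}")                 # 2.041
```

Under these assumptions, roughly four fifths of the accelerated routine's time would go to data movement rather than computation, and even an infinitely fast implementation of the single routine would bound whole-application speedup at 1.25, which is why the three routines accounting for 51% of the time matter.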