Productivity and performance are always viewed as the two sides of parallel programming languages. X10 is a new object-oriented parallel language that targets both high productivity and high performance. To help the development of X10, we characterize its performance in bioinformatics using a fundamental application, Smith-Waterman (SW) sequence database search. We implement the SW application in X10 on a multi-core shared-memory architecture. By comparing it with three SW implementations in C++, we make the following suggestions for X10 and its compiler. (1) The X10 compiler should improve its array access implementation in kernel loops to avoid redundant checks and inefficient offset computation. Array access in the latest version of X10 is much slower than in C++, which results in poor single-core performance of SW in X10. (2) X10 should support the use of SIMD instructions. With 128-bit SSE instructions, SW in X10 can achieve an 8.7- to 17.7-fold speedup. Note that many applications can benefit dramatically from the SIMD architectures of modern processors.
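To make the kernel loop discussed above concrete, here is a minimal scalar sketch of a Smith-Waterman scoring kernel in C++. It is not the implementation evaluated in the paper: the function name `sw_score`, the linear gap model, and the scoring values (match, mismatch, gap) are assumptions chosen for illustration. It shows why array access cost matters, since every cell update reads three neighbouring cells and writes one.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: local alignment score with a linear gap penalty.
// The default scoring values are illustrative assumptions, not the paper's.
int sw_score(const std::string& query, const std::string& subject,
             int match = 2, int mismatch = -1, int gap = -1) {
    const std::size_t cols = subject.size() + 1;
    // Only the previous and current rows of the DP matrix are kept.
    std::vector<int> prev(cols, 0), curr(cols, 0);
    int best = 0;
    for (std::size_t i = 1; i <= query.size(); ++i) {
        curr[0] = 0;
        for (std::size_t j = 1; j <= subject.size(); ++j) {
            // Kernel: three array reads and one write per cell.
            const int sub = (query[i - 1] == subject[j - 1]) ? match : mismatch;
            const int h = std::max({0,
                                    prev[j - 1] + sub,  // diagonal: align both
                                    prev[j] + gap,      // up: gap in subject
                                    curr[j - 1] + gap}); // left: gap in query
            curr[j] = h;
            best = std::max(best, h);
        }
        std::swap(prev, curr);
    }
    return best;
}
```

In a database search, a function like this would be called once per database sequence. Per-access bounds checks or costly offset computation inside the inner loop are multiplied by the product of the two sequence lengths, and the same inner loop is the natural target for SIMD vectorization.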