ACM Transactions on Programming Languages and Systems (TOPLAS)
Algorithms for scalable synchronization on shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
Tolerating latency through software-controlled prefetching in shared-memory multiprocessors
Journal of Parallel and Distributed Computing - Special issue on shared-memory multiprocessors
Runtime compilation techniques for data partitioning and communication schedule reuse
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Automatic Compiler-Inserted Prefetching for Pointer-Based Applications
IEEE Transactions on Computers - Special issue on cache memory and related problems
A scalable approach to thread-level speculation
Proceedings of the 27th annual international symposium on Computer architecture
Compiler analysis of irregular memory accesses
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Speculative precomputation: long-range prefetching of delinquent loads
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Dynamic hot data stream prefetching for general-purpose programs
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Dynamic speculative precomputation
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Run-time and compile-time support for adaptive irregular problems
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Hybrid technology multithreaded architecture
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Hardware for Speculative Run-Time Parallelization in Distributed Shared-Memory Multiprocessors
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
TiNy Threads: A Thread Virtual Machine for the Cyclops64 Cellular Architecture
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 14 - Volume 15
Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Future Execution: A Hardware Prefetching Technique for Chip Multiprocessors
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Dynamic Helper Threaded Prefetching on the Sun UltraSPARC CMP Processor
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Landing openMP on cyclops-64: an efficient mapping of openMP to a many-core system-on-a-chip
Proceedings of the 3rd conference on Computing frontiers
Efficient emulation of hardware prefetchers via event-driven helper threading
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Designing Multithreaded Algorithms for Breadth-First Search and st-connectivity on the Cray MTA-2
ICPP '06 Proceedings of the 2006 International Conference on Parallel Processing
Parallel Algorithms for Evaluating Centrality Indices in Real-world Networks
ICPP '06 Proceedings of the 2006 International Conference on Parallel Processing
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Proceedings of the 34th annual international symposium on Computer architecture
Executing irregular scientific applications on stream architectures
Proceedings of the 21st annual international conference on Supercomputing
Accelerating and Adapting Precomputation Threads for Effcient Prefetching
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Languages and Compilers for Parallel Computing
JSSPP'10 Proceedings of the 15th international conference on Job scheduling strategies for parallel processing
The Combinatorial BLAS: design, implementation, and applications
International Journal of High Performance Computing Applications
Betweenness centrality: algorithms and implementations
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Understanding parallelism in graph traversal on multi-core clusters
Computer Science - Research and Development
Hi-index | 0.00 |
This paper presents a joint study of application and architecture to improve the performance and scalability of an irregular application--computing betweenness centrality--on a many-core architecture IBM Cyclops64. The characteristics of unstructured parallelism, dynamically non-contiguous memory access, and low arithmetic intensity in betweenness centrality pose an obstacle to an efficient mapping of parallel algorithms on such many-core architectures. By identifying several key architectural features, we propose and evaluate efficient strategies for achieving scalability on a massive multi-threading many-core architecture. We demonstrate several optimization strategies including multi-grain parallelism, just-in-time locality with explicit memory hierarchy and non-preemptive thread execution, and fine-grain data synchronization. Comparing with a conventional parallel algorithm, we get 4X-50X improvement in performance and 16X improvement in scalability on a 128-cores IBM Cyclops64 simulator.