The pivot mechanism is introduced and described. It controls the dynamic migration of data pages between neighboring memory modules during program execution, with the aim of improving the performance and programmability of multiprocessors with distributed global memory. The programmer or compiler is relieved of the data allocation task; moreover, because data allocation is modified dynamically to minimize communication traffic, algorithms with varying and unpredictable data access patterns can run efficiently. Flexible data migration serves the dual purpose of making algorithms less machine-specific and enabling the efficient execution of algorithms for which no good static allocation exists. Simulation results are presented for a mesh-connected multiprocessor performing a matrix multiplication.
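To make the idea of migrating pages between neighboring memory modules concrete, the following is a minimal sketch and not the pivot mechanism itself, whose policy the abstract does not specify. It assumes a 2D mesh in which an access costs the Manhattan distance between the processor and the module holding the page, and it uses a hypothetical threshold rule: once the accesses pulling a page along one axis outnumber the opposing ones by `threshold`, the page moves one hop to the neighboring module in that direction. The names `Page`, `access`, `total_cost`, and `threshold` are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """A data page located at memory module (row, col) of a 2D mesh."""
    row: int
    col: int
    # Net pull along each mesh axis accumulated since the last migration
    # on that axis (assumed bookkeeping, not taken from the paper).
    pull_row: int = 0
    pull_col: int = 0


def sign(x: int) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def access(page: Page, proc_row: int, proc_col: int, threshold: int = 3) -> int:
    """Record one access from processor (proc_row, proc_col).

    Returns the hop count (Manhattan distance) paid by this access.
    After the access is charged, the page moves one hop to a neighboring
    module if the net pull along an axis has reached the threshold.
    This threshold rule is an assumed policy used only for illustration.
    """
    cost = abs(proc_row - page.row) + abs(proc_col - page.col)
    page.pull_row += sign(proc_row - page.row)
    page.pull_col += sign(proc_col - page.col)
    if abs(page.pull_row) >= threshold:
        page.row += sign(page.pull_row)   # migrate one hop along the row axis
        page.pull_row = 0
    elif abs(page.pull_col) >= threshold:
        page.col += sign(page.pull_col)   # migrate one hop along the column axis
        page.pull_col = 0
    return cost


def total_cost(start, accesses, threshold=3):
    """Replay a list of (row, col) accesses against one page.

    Returns (total hop count, final page location).
    """
    page = Page(*start)
    hops = sum(access(page, r, c, threshold) for r, c in accesses)
    return hops, (page.row, page.col)
```

With a page starting at module (0, 0) and nine consecutive accesses from processor (2, 0), the sketch pays 9 hops and leaves the page at (2, 0). Setting `threshold` high enough that the page never moves models a static allocation, which pays 18 hops for the same trace. A page only ever moves toward a requesting processor, so it stays inside the mesh as long as the processors do.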