From algorithm parallelism to instruction-level parallelism: an encode-decode chain using prefix-sum
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
An interaction of coherence protocols and memory consistency models in DSM systems
ACM SIGOPS Operating Systems Review
Characterizing Distributed Shared Memory Performance: A Case Study of the Convex SPP1000
IEEE Transactions on Parallel and Distributed Systems
Performance characterization of a Quad Pentium Pro SMP using OLTP workloads
Proceedings of the 25th annual international symposium on Computer architecture
Proceedings of the 25th annual international symposium on Computer architecture
Communications of the ACM
Effects of Multithreading on Cache Performance
IEEE Transactions on Computers - Special issue on cache memory and related problems
IEEE Transactions on Computers - Special issue on cache memory and related problems
Hierarchical fuzzy configuration of implementation strategies
Proceedings of the 1999 ACM symposium on Applied computing
Compile/run-time support for threaded MPI execution on multiprogrammed shared memory machines
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Performance prediction of large parallel applications using parallel simulations
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
A system-level specification framework for I/O architectures
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Optimal replacements in caches with two miss costs
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Recursive array layouts and fast parallel matrix multiplication
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
IEEE Transactions on Parallel and Distributed Systems
A case for user-level dynamic page migration
Proceedings of the 14th international conference on Supercomputing
A Transformation Approach to Derive Efficient Parallel Implementations
IEEE Transactions on Software Engineering - Special issue on architecture-independent languages and software tools for parallel processing
A no-busy-wait balanced tree parallel algorithmic paradigm
Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
Location Consistency-A New Memory Model and Cache Consistency Protocol
IEEE Transactions on Computers
System architecture directions for networked sensors
ACM SIGPLAN Notices
Data Locality Exploitation in the Decomposition of Regular Domain Problems
IEEE Transactions on Parallel and Distributed Systems
Performance Metrics for Embedded Parallel Pipelines
IEEE Transactions on Parallel and Distributed Systems
Modulo scheduling for a fully-distributed clustered VLIW architecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Program transformation and runtime support for threaded MPI execution on shared-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
An efficient architecture model for systematic design of application-specific multiprocessor SoC
Proceedings of the conference on Design, automation and test in Europe
Hardware prediction for data coherency of scientific codes on DSM
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Is data distribution necessary in OpenMP?
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Exploiting Network Locality for CC-NUMA Multiprocessors
The Journal of Supercomputing
A routing algorithm for the pyramid structures
Proceedings of the 2001 ACM symposium on Applied computing
Fundamental limitations on the use of prefetching and stream buffers for scientific applications
Proceedings of the 2001 ACM symposium on Applied computing
Time Stamp Algorithms for Runtime Parallelization of DOACROSS Loops with Dynamic Dependences
IEEE Transactions on Parallel and Distributed Systems
The trade-off between implicit and explicit data distribution in shared-memory programming paradigms
ICS '01 Proceedings of the 15th international conference on Supercomputing
Optimizing threaded MPI execution on SMP clusters
ICS '01 Proceedings of the 15th international conference on Supercomputing
Scheduling on hierarchical clusters using malleable tasks
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Proceedings of the 2001 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
System architecture directions for networked sensors
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Proceedings of the 38th annual Design Automation Conference
Statistical scalability analysis of communication operations in distributed applications
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Experiments with list ranking for explicit multi-threaded (XMT) instruction parallelism
Journal of Experimental Algorithmics (JEA)
An optimal memory allocation for application-specific multiprocessor system-on-chip
Proceedings of the 14th international symposium on Systems synthesis
MPI-LAPI: An Efficient Implementation of MPI for IBM RS/6000 SP Systems
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - System Level Design
A Proposal for a Heterogeneous Cluster ScaLAPACK (Dense Linear Solvers)
IEEE Transactions on Computers
Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research
Four-Ary Tree-Based Barrier Synchronization for 2D Meshes without Nonmember Involvement
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
IEEE Transactions on Computers
ADir_pNB: A Cost-Effective Way to Implement Full Map Directory-Based Cache Coherence Protocols
IEEE Transactions on Computers
Reducing coherence overhead of barrier synchronization in software DSMs
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Full-system timing-first simulation
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Characterizing the d-TLB behavior of SPEC CPU2000 benchmarks
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Automatic generation of embedded memory wrapper for multiprocessor SoC
Proceedings of the 39th annual Design Automation Conference
Component-based design approach for multicore SoCs
Proceedings of the 39th annual Design Automation Conference
Performance characteristics of the SPEC OMP2001 benchmarks
ACM SIGARCH Computer Architecture News - Special Issue: PACT 2001 workshops
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Addressing blocking and scalability in critical channel traversing
Proceedings of the sixteenth workshop on Parallel and distributed simulation
A lightweight idempotent messaging protocol for faulty networks
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Specifying and Verifying a Broadcast and a Multicast Snooping Cache Coherence Protocol
IEEE Transactions on Parallel and Distributed Systems
Unifying memory and processor wrapper architecture in multiprocessor SoC design
Proceedings of the 15th international symposium on System Synthesis
System-level modeling of a network switch SoC
Proceedings of the 15th international symposium on System Synthesis
The Sun Fireplane system interconnect
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Shared State for Distributed Interactive Data Mining Applications
Distributed and Parallel Databases - Special issue: Parallel and distributed data mining
An Application-Driven Study of Multicast Communication for Write Invalidation
The Journal of Supercomputing
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Performance-steered design of software architectures for embedded multicore systems
Software—Practice & Experience
International Journal of Parallel Programming
Rapid Hardware Prototyping on RPM-2
IEEE Design & Test
The Sun Fireplane Interconnect
IEEE Micro
Impact of Virtual Channels and Adaptive Routing on Application Performance
IEEE Transactions on Parallel and Distributed Systems
Recursive Array Layouts and Fast Matrix Multiplication
IEEE Transactions on Parallel and Distributed Systems
Optimal Remapping in Dynamic Bulk Synchronous Computations via a Stochastic Control Approach
IEEE Transactions on Parallel and Distributed Systems
Parallel ADI solver based on processor scheduling
Applied Mathematics and Computation
A survey of processors with explicit multithreading
ACM Computing Surveys (CSUR)
Parallel simulation of chip-multiprocessor architectures
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Three parallel programming paradigms: comparisons on an archetypal PDE computation
Progress in computer research
A Parallel Hybrid Heuristic for the TSP
Proceedings of the EvoWorkshops on Applications of Evolutionary Computing
Trustless Grid Computing in ConCert
GRID '02 Proceedings of the Third International Workshop on Grid Computing
Simulating Parallel Architectures with BSPlab
HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking
Application-Controlled Coherence Protocols for Scope Consistent Software DSMs
HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
A Feasibility Study of Hierarchical Multithreading
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
A Realistic Model and an Efficient Heuristic for Scheduling with Heterogeneous Processors
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
A New Clustering Algorithm for Large Communication Delays
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Optimal Remapping in Dynamic Bulk Synchronous Computations via a Stochastic Control Approach
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
A Novel Approach to Reduce L2 Miss Latency in Shared-Memory Multiprocessors
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Radar Signal Processing Using Pipelined Optical Hypercube Interconnects
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Performance Analysis of a CC-NUMA Operating System
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Scheduling Parallel Applications Using Malleable Tasks on Clusters
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Performance Improvement for Applications on Parallel Computers
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Optimization of Parallel Algorithms on Cluster of SMP's
PARA '02 Proceedings of the 6th International Conference on Applied Parallel Computing Advanced Scientific Computing
A Queuing Model of a Multi-threaded Architecture: A Case Study
PaCT '99 Proceedings of the 5th International Conference on Parallel Computing Technologies
FEM Computations on Clusters Using Different Models of Parallel Programming
PPAM '01 Proceedings of the th International Conference on Parallel Processing and Applied Mathematics-Revised Papers
The Use of Prediction for Accelerating Upgrade Misses in cc-NUMA Multiprocessors
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Formal Reasoning about Hardware and Software Memory Models
ICFEM '02 Proceedings of the 4th International Conference on Formal Engineering Methods: Formal Methods and Software Engineering
Benchmarks and Standards for the Evaluation of Parallel Job Schedulers
IPPS/SPDP '99/JSSPP '99 Proceedings of the Job Scheduling Strategies for Parallel Processing
IPDPS '00/JSSPP '00 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
Impact of PE Mapping on Cray T3E Message-Passing Performance
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Implementing Snoop-Coherence Protocol for Future SMP Architectures
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Message Passing Evaluation and Analysis on Cray T3E and SGI Origin 2000 Systems
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Cache Injection: A Novel Technique for Tolerating Memory Latency in Bus-Based SMPs
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Parallel Computation: MM +/- X
Informatics - 10 Years Back. 10 Years Ahead.
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Efficient synchronization for nonuniform communication architectures
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
An empirical performance evaluation of scalable scientific applications
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
High-performance implementation and analysis of the linkmap program
Computers and Biomedical Research
Metacomputing: technology and applications
Highly parallel computations
The performance advantage of applying compression to the memory system
Proceedings of the 2002 workshop on Memory system performance
A complexity effective communication model for behavioral modeling of signal processing applications
Proceedings of the 40th annual Design Automation Conference
Overview of high performance computers
Handbook of massive data sets
User-controllable coherence for high performance shared memory multiprocessors
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Compactly representing parallel program executions
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
ARMI: an adaptive, platform independent communication library
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Cost-Sensitive Cache Replacement Algorithms
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Hierarchical Backoff Locks for Nonuniform Communication Architectures
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
The Thread-Based Protocol Engines for CC-NUMA Multiprocessors
ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
Permutation routing in double-loop networks: design and empirical evaluation
Journal of Systems Architecture: the EUROMICRO Journal
The Jrpm system for dynamically parallelizing Java programs
Proceedings of the 30th annual international symposium on Computer architecture
Adaptive and efficient abortable mutual exclusion
Proceedings of the twenty-second annual symposium on Principles of distributed computing
On packet switched networks for on-chip communication
Networks on chip
A parallel computer as a NOC region
Networks on chip
Software for multiprocessor networks on chip
Networks on chip
Sourcebook of parallel computing
Contention-Aware Communication Schedule for High-Speed Communication
Cluster Computing
Proceedings of the 1st IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Task-level timing models for guaranteed performance in multiprocessor networks-on-chip
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
An alternative routing algorithm for the pyramid structures
Proceedings of the 2003 ACM symposium on Applied computing
An optimal broadcasting schema for multidimensional mesh structures
Proceedings of the 2003 ACM symposium on Applied computing
Network-on-Chip Modeling for System-Level Multiprocessor Simulation
RTSS '03 Proceedings of the 24th IEEE International Real-Time Systems Symposium
Journal of Parallel and Distributed Computing - Special section best papers from the 2002 international parallel and distributed processing symposium
Scheduling divisible workloads on heterogeneous platforms
Parallel Computing - Parallel matrix algorithms and applications (PMAA '02)
On Task Scheduling Accuracy: Evaluation Methodology and Results
The Journal of Supercomputing
Receiving message prediction method
Parallel Computing - Special issue: Parallel and distributed scientific and engineering computing
Exploiting Processor Workload Heterogeneity for Reducing Energy Consumption in Chip Multiprocessors
Proceedings of the conference on Design, automation and test in Europe - Volume 2
Supporting Cache Coherence in Heterogeneous Multiprocessor Systems
Proceedings of the conference on Design, automation and test in Europe - Volume 2
Proceedings of the conference on Design, automation and test in Europe - Volume 2
Proceedings of the conference on Design, automation and test in Europe - Volume 2
Jacobian-free Newton-Krylov methods: a survey of approaches and applications
Journal of Computational Physics
Mixed level modelling and simulation of large scale HW/SW systems
High performance scientific and engineering computing
Opportunities and challenges in application-tuned circuits and architectures based on nanodevices
Proceedings of the 1st conference on Computing frontiers
A Keyword-Based Semantic Prefetching Approach in Internet News Services
IEEE Transactions on Knowledge and Data Engineering
Packetization and routing analysis of on-chip multiprocessor networks
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Networks on chip
Tuning data replication for improving behavior of MPSoC applications
Proceedings of the 14th ACM Great Lakes symposium on VLSI
FIFO power optimization for on-chip networks
Proceedings of the 14th ACM Great Lakes symposium on VLSI
Proceedings of the 41st annual Design Automation Conference
Design and Optimization of Large Size and Low Overhead Off-Chip Caches
IEEE Transactions on Computers
SMTp: An Architecture for Next-generation Scalable Multi-threading
Proceedings of the 31st annual international symposium on Computer architecture
Parallelism versus memory allocation in pipelined router forwarding engines
Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
Bi-criteria algorithm for scheduling jobs on cluster platforms
Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
IEEE Transactions on Parallel and Distributed Systems
Formalization and strictness of simulation event orderings
Proceedings of the eighteenth workshop on Parallel and distributed simulation
Cluster-Based Partial-Order Reduction
Automated Software Engineering
Performance and modularity benefits of message-driven execution
Journal of Parallel and Distributed Computing
Packetized On-Chip Interconnect Communication Analysis for MPSoC
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Data cache management on EPIC architecture: optimizing memory access for image processing
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications, systems and architecture
Impact of Java Memory Model on Out-of-Order Multiprocessors
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
An architecture and compiler for scalable on-chip communication
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
IEEE Transactions on Parallel and Distributed Systems
A Two-Level Directory Architecture for Highly Scalable cc-NUMA Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
The UCSC Kestrel Parallel Processor
IEEE Transactions on Parallel and Distributed Systems
Radio-wave propagation prediction using ray-tracing techniques on a network of workstations (NOW)
Journal of Parallel and Distributed Computing
NoC Synthesis Flow for Customized Domain Specific Multiprocessor Systems-on-Chip
IEEE Transactions on Parallel and Distributed Systems
Predictable Embedding of Large Data Structures in Multiprocessor Networks-on-Chip
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Flexible Hardware/Software Support for Message Passing on a Distributed Shared Memory Architecture
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
A Power-Aware GALS Architecture for Real-Time Algorithm-Specific Tasks
ISQED '05 Proceedings of the 6th International Symposium on Quality of Electronic Design
Predicting and Evaluating Distributed Communication Performance
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
ParADE: An OpenMP Programming Environment for SMP Cluster Systems
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Distributed Shared Arrays: An Integration of Message Passing and Multithreading on SMP Clusters
The Journal of Supercomputing
Optimizing Array-Intensive Applications for On-Chip Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Exploiting Barriers to Optimize Power Consumption of CMPs
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Message Passing for Linux Clusters with Gigabit Ethernet Mesh Connections
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 9 - Volume 10
A localizing directory coherence protocol
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Cache organizations for clustered microarchitectures
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Exploring the energy efficiency of cache coherence protocols in single-chip multi-processors
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
Seamless Hardware-Software Integration in Reconfigurable Computing Systems
IEEE Design & Test
GAARP: A Power-Aware GALS Architecture for Real-Time Algorithm-Specific Tasks
IEEE Transactions on Computers
Journal of Parallel and Distributed Computing
Cache coherence support for non-shared bus architecture on heterogeneous MPSoCs
Proceedings of the 42nd annual Design Automation Conference
Optimizing Replication, Communication, and Capacity Allocation in CMPs
Proceedings of the 32nd annual international symposium on Computer Architecture
Near-Optimal Worst-Case Throughput Routing for Two-Dimensional Mesh Networks
Proceedings of the 32nd annual international symposium on Computer Architecture
Communication Contention in Task Scheduling
IEEE Transactions on Parallel and Distributed Systems
Scheduling malleable tasks with precedence constraints
Proceedings of the seventeenth annual ACM symposium on Parallelism in algorithms and architectures
Encyclopedia of Computer Science
Encyclopedia of Computer Science
Prediction of communication delay in torus networks under multiple time-scale correlated traffic
Performance Evaluation - Performance modelling and evaluation of high-performance parallel and distributed systems
Globalized Newton-Krylov-Schwarz Algorithms and Software for Parallel Implicit CFD
International Journal of High Performance Computing Applications
WBTK: a New Set of Microbenchmarks to Explore Memory System Performance for Scientific Computing
International Journal of High Performance Computing Applications
Optimizing inter-processor data locality on embedded chip multiprocessors
Proceedings of the 5th ACM international conference on Embedded software
Characterization of L3 cache behavior of SPECjAppServer2002 and TPC-C
Proceedings of the 19th annual international conference on Supercomputing
affinity-on-next-touch: increasing the performance of an industrial PDE solver on a cc-NUMA system
Proceedings of the 19th annual international conference on Supercomputing
Characterization of TCC on Chip-Multiprocessors
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Adaptive Parallel Job Scheduling with Flexible Coscheduling
IEEE Transactions on Parallel and Distributed Systems
Cherry-MP: Correctly Integrating Checkpointed Early Resource Recycling in Chip Multiprocessors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
A chip prototyping substrate: the flexible architecture for simulation and testing (FAST)
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Fast synchronization for chip multiprocessors
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Using Dominators to Extract Observable Protocol Contexts
SEFM '05 Proceedings of the Third IEEE International Conference on Software Engineering and Formal Methods
Toward a Realistic Task Scheduling Model
IEEE Transactions on Parallel and Distributed Systems
Power-performance considerations of parallel computing on chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
International Journal of Parallel Programming
Development of process execution rules for workload balancing on agents
Data & Knowledge Engineering - Special issue: Business process management
A modern course on parallel and distributed processing
Journal of Computing Sciences in Colleges
Cache Replacement Algorithms with Nonuniform Miss Costs
IEEE Transactions on Computers
Preemptable Malleable Task Scheduling Problem
IEEE Transactions on Computers
Dynamic partitioning of processing and memory resources in embedded MPSoC architectures
Proceedings of the conference on Design, automation and test in Europe: Proceedings
A survey of research and practices of Network-on-chip
ACM Computing Surveys (CSUR)
SODA: A Low-power Architecture For Software Radio
Proceedings of the 33rd annual international symposium on Computer Architecture
Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Interconnect-Aware Coherence Protocols for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
An optimal schedule for Gaussian elimination on an MIMD architecture
Journal of Computational and Applied Mathematics
Performance guarantees by simulation of process
SCOPES '05 Proceedings of the 2005 workshop on Software and compilers for embedded systems
Optimizing bus energy consumption of on-chip multiprocessors using frequent values
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Parallel, distributed and network-based processing
Reducing energy consumption of multiprocessor SoC architectures by exploiting memory bank locality
ACM Transactions on Design Automation of Electronic Systems (TODAES)
The fat-stack and universal routing in interconnection networks
Journal of Parallel and Distributed Computing - Special issue: 18th International parallel and distributed processing symposium
Programming models and HW-SW interfaces abstraction for multi-processor SoC
Proceedings of the 43rd annual Design Automation Conference
Proceedings of the 43rd annual Design Automation Conference
Routing direction determination in regular networks based on configurable circuits
Future Generation Computer Systems
Cache coherence tradeoffs in shared-memory MPSoCs
ACM Transactions on Embedded Computing Systems (TECS)
An approximation algorithm for scheduling malleable tasks under general precedence constraints
ACM Transactions on Algorithms (TALG)
Efficient allocation of distributed object-oriented tasks to a pre-defined scheduled system
International Journal of Computers and Applications
Future networking for scalable I/O
PDCN'06 Proceedings of the 24th IASTED international conference on Parallel and distributed computing and networks
Effects of processing delay on function-parallel firewalls
PDCN'06 Proceedings of the 24th IASTED international conference on Parallel and distributed computing and networks
Exploiting Fine-Grained Data Parallelism with Chip Multiprocessors and Fast Barriers
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Leveraging Optical Technology in Future Bus-based Chip Multiprocessors
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Engineering Applications of Artificial Intelligence
Reducing snoop-energy in shared bus-based MPSoCs by filtering useless broadcasts
Proceedings of the 17th ACM Great Lakes symposium on VLSI
A design kit for a fully working shared memory multiprocessor on FPGA
Proceedings of the 17th ACM Great Lakes symposium on VLSI
Synthetic traffic generation as a tool for dynamic interconnect evaluation
Proceedings of the 2007 international workshop on System level interconnect prediction
A taxonomy of parallel techniques for intrusion detection
ACM-SE 45 Proceedings of the 45th annual southeast regional conference
The rise and fall of High Performance Fortran: an historical object lesson
Proceedings of the third ACM SIGPLAN conference on History of programming languages
A transparent runtime data distribution engine for OpenMP
Scientific Programming
Improving locality for ODE solvers by program transformations
Scientific Programming
Proceedings of the 4th international conference on Computing frontiers
Efficient self-tuning spin-locks using competitive analysis
Journal of Systems and Software
Comparing memory systems for chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
The Power of Priority: NoC Based Distributed Cache Coherency
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
Microprocessors in the era of terascale integration
Proceedings of the conference on Design, automation and test in Europe
Performance aware secure code partitioning
Proceedings of the conference on Design, automation and test in Europe
A Pragmatic Analysis Of Scheduling Environments On New Computing Platforms
International Journal of High Performance Computing Applications
WCAE '98 Proceedings of the 1998 workshop on Computer architecture education
WCAE '00 Proceedings of the 2000 workshop on Computer architecture education
Animations of important concepts in parallel computer architecture
WCAE '07 Proceedings of the 2007 workshop on Computer architecture education
Implicitly parallel programming models for thousand-core microprocessors
Proceedings of the 44th annual Design Automation Conference
Multiprocessor resource allocation for throughput-constrained synchronous dataflow graphs
Proceedings of the 44th annual Design Automation Conference
Designer-controlled generation of parallel and flexible heterogeneous MPSoC specification
Proceedings of the 44th annual Design Automation Conference
Distributed texture memory in a multi-GPU environment
GH '06 Proceedings of the 21st ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
MOCDEX: multiprocessor on chip multiobjective design space exploration with direct execution
EURASIP Journal on Embedded Systems
Scheduling multiple independent hard-real-time jobs on a heterogeneous multiprocessor
EMSOFT '07 Proceedings of the 7th ACM & IEEE international conference on Embedded software
Aggregating changes to efficiently check consistency
Ninth international workshop on Principles of software evolution: in conjunction with the 6th ESEC/FSE joint meeting
The AMD Opteron Northbridge Architecture
IEEE Micro
Adaptive Allocation of Independent Tasks to Maximize Throughput
IEEE Transactions on Parallel and Distributed Systems
The impact of wrong-path memory references in cache-coherent multiprocessor systems
Journal of Parallel and Distributed Computing
Broadcast filtering-aware task assignment techniques for low-power MPSoCs
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Wikis: collaborative learning for CS education
Proceedings of the 39th SIGCSE technical symposium on Computer science education
Quality of service (QoS) in internet cache coherence
International Journal of High Performance Computing and Networking
Simulink based hardware-software codesign flow for heterogeneous MPSoC
Proceedings of the 2007 Summer Computer Simulation Conference
Speeding-up multiprocessors running DBMS workloads through coherence protocols
International Journal of High Performance Computing and Networking
HPM: a hierarchical model for parallel computations
International Journal of High Performance Computing and Networking
Minimal broadcasting schemas for the mesh structures
International Journal of High Performance Computing and Networking
Communication between nested loop programs via circular buffers in an embedded multiprocessor system
SCOPES '08 Proceedings of the 11th international workshop on Software & compilers for embedded systems
FPGA-based prototype of a PRAM-on-chip processor
Proceedings of the 5th conference on Computing frontiers
Processor virtualization for secure mobile terminals
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Journal of Systems Architecture: the EUROMICRO Journal
Case study of gate-level logic simulation on an extremely fine-grained chip multiprocessor
Journal of Embedded Computing - Issues in embedded single-chip multicore architectures
Using supplier locality in power-aware interconnects and caches in chip multiprocessors
Journal of Systems Architecture: the EUROMICRO Journal
Multi MicroBlaze system for parallel computing
ICC'05 Proceedings of the 9th International Conference on Circuits
Server-based data push architecture for multi-processor environments
Journal of Computer Science and Technology
Parallel programming of multi-processor SoC: a HW-SW interface perspective
International Journal of Parallel Programming - Special Issue on Multiprocessor-based embedded systems
Platform-based software design flow for heterogeneous MPSoC
ACM Transactions on Embedded Computing Systems (TECS)
Concurrent CS: preparing students for a multicore world
Proceedings of the 13th annual conference on Innovation and technology in computer science education
Cache aware mapping of streaming applications on a multiprocessor system-on-chip
Proceedings of the conference on Design, automation and test in Europe
MCjammer: adaptive verification for multi-core designs
Proceedings of the conference on Design, automation and test in Europe
Proposal for LDPC Code Design System Using Multi-Objective Optimization and FPGA-Based Emulation
ICES '08 Proceedings of the 8th international conference on Evolvable Systems: From Biology to Hardware
Exploiting SIMD Parallelism with the CGiS Compiler Framework
Languages and Compilers for Parallel Computing
Algorithm-system scalability of heterogeneous computing
Journal of Parallel and Distributed Computing
Journal of Parallel and Distributed Computing
Speculative DMA for architecturally visible storage in instruction set extensions
CODES+ISSS '08 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
CoMPSoC: A template for composable and predictable multi-processor system on chips
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Multiword atomic read/write registers on multiprocessor systems
Journal of Experimental Algorithmics (JEA)
Comparative evaluation of memory models for chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
A comparative evaluation of hybrid distributed shared-memory systems
Journal of Systems Architecture: the EUROMICRO Journal
Embedded DSP Processor Design: Application Specific Instruction Set Processors
Embedded DSP Processor Design: Application Specific Instruction Set Processors
Integration, the VLSI Journal
Parallel bioinspired algorithms for NP complete graph problems
Journal of Parallel and Distributed Computing
Preliminary results on nb-feb, a synchronization primitive for parallel programming
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
An Approach for Enhancing Inter-processor Data Locality on Chip Multiprocessors
Transactions on High-Performance Embedded Architectures and Compilers I
Accelerating critical section execution with asymmetric multi-core architectures
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Java memory model aware software validation
Proceedings of the 8th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering
The design of parallel solid voxelization based on multi-processor pipeline by program slicing
ICCOMP'08 Proceedings of the 12th WSEAS international conference on Computers
MEDYM: match-early with dynamic multicast for content-based publish-subscribe networks
Proceedings of the ACM/IFIP/USENIX 2005 International Conference on Middleware
Broadcast filtering: Snoop energy reduction in shared bus-based low-power MPSoCs
Journal of Systems Architecture: the EUROMICRO Journal
Attention driven visual processing for an interactive dialog robot
Proceedings of the 2009 ACM symposium on Applied Computing
Boolean circuit programming: A new paradigm to design parallel algorithms
Journal of Discrete Algorithms
A strategy for parallel sorting algorithms evaluation based on MPI technology
AIKED'09 Proceedings of the 8th WSEAS international conference on Artificial intelligence, knowledge engineering and data bases
A memory system design framework: creating smart memories
Proceedings of the 36th annual international symposium on Computer architecture
Type-Directed Compilation for Multicore Programming
Electronic Notes in Theoretical Computer Science (ENTCS)
What the parallel-processing community has (failed) to offer the multi/many-core generation
Journal of Parallel and Distributed Computing
Dealing with Traffic-Area Trade-Off in Direct Coherence Protocols for Many-Core CMPs
APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
A tuneable software cache coherence protocol for heterogeneous MPSoCs
CODES+ISSS '09 Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis
MPTLsim: a simulator for X86 multicore processors
Proceedings of the 46th Annual Design Automation Conference
A multi-core based parallel streaming mechanism for concurrent video-on-demand applications
IEEE Communications Letters
Reusability-aware cache memory sharing for chip multiprocessors with private L2 caches
Journal of Systems Architecture: the EUROMICRO Journal
Inferno: streamlining verification with inferred semantics
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Experience with building a commodity Intel-based ccNUMA system
IBM Journal of Research and Development
Expert Systems with Applications: An International Journal
Optimizing shared cache behavior of chip multiprocessors
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Computers and Electrical Engineering
Reevaluating Amdahl's law in the multicore era
Journal of Parallel and Distributed Computing
Optimizing RapidIO architectures for onboard processing
ACM Transactions on Embedded Computing Systems (TECS)
Routing direction determination in regular networks based on configurable circuits
Future Generation Computer Systems
Hardware-based synchronization support for shared accesses in multicore architectures
ACST '08 Proceedings of the Fourth IASTED International Conference on Advances in Computer Science and Technology
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Mesh-of-trees and alternative interconnection networks for single-chip parallelism
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Mitigating the impact of variability on chip-multiprocessor power and performance
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Accurately evaluating application performance in simulated hybrid multi-tasking systems
Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays
A power-efficient all-optical on-chip interconnect using wavelength-based oblivious routing
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Fault-tolerant communication over micronmesh NOC with micron message-passing protocol
SOC'09 Proceedings of the 11th international conference on System-on-chip
International Journal of Reconfigurable Computing
Algorithms for memory hierarchies: advanced lectures
Algorithms for memory hierarchies: advanced lectures
The triangular pyramid: Routing and topological properties
Information Sciences: an International Journal
Rapid parameterized model checking of snoopy cache coherence protocols
TACAS'03 Proceedings of the 9th international conference on Tools and algorithms for the construction and analysis of systems
A new technique for data privatization in user-level threads and its use in parallel applications
Proceedings of the 2010 ACM Symposium on Applied Computing
Epitaxial surface growth with local interaction, parallel and non-parallel simulations
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
Direct coherence: bringing together performance and scalability in shared-memory multiprocessors
HiPC'07 Proceedings of the 14th international conference on High performance computing
A predictable communication assist
Proceedings of the 7th ACM international conference on Computing frontiers
Fault-tolerant cache coherence protocols for CMPs: evaluation and trade-offs
HiPC'08 Proceedings of the 15th international conference on High performance computing
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Expert Systems with Applications: An International Journal
SAMOS'09 Proceedings of the 9th international conference on Systems, architectures, modeling and simulation
Journal of Systems Architecture: the EUROMICRO Journal
An intra-chip free-space optical interconnect
Proceedings of the 37th annual international symposium on Computer architecture
A Low-Latency and Memory-Efficient On-chip Network
NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
Proceedings of the 1st International Conference and Exhibition on Computing for Geospatial Research & Application
Direct pore-level modeling of incompressible fluid flow in porous media
Journal of Computational Physics
Network interface design based on mutual interface definition
International Journal of High Performance Systems Architecture
Replication-aware leakage management in chip multiprocessors with private L2 cache
Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design
Distributed acceleration of mobile radio network optimisation algorithms
WTS'10 Proceedings of the 9th conference on Wireless telecommunications symposium
SWEL: hardware cache coherence protocols to map shared data onto shared caches
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Using simple abstraction to reinvent computing for parallelism
Communications of the ACM
Skewed pipelining for parallel Simulink simulations
Proceedings of the Conference on Design, Automation and Test in Europe
CASPAR: hardware patching for multi-core processors
Proceedings of the Conference on Design, Automation and Test in Europe
Revisiting Cramer's rule for solving dense linear systems
SpringSim '10 Proceedings of the 2010 Spring Simulation Multiconference
Exploiting compression opportunities to improve SpMxV performance on shared memory systems
ACM Transactions on Architecture and Code Optimization (TACO)
Information Sciences: an International Journal
Contention-aware scheduling with task duplication
Journal of Parallel and Distributed Computing
EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
Increasing the adaptivity of routing algorithms for k-ary n-cubes
EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Algorithm engineering: bridging the gap between algorithm theory and practice
Algorithm engineering: bridging the gap between algorithm theory and practice
Architectural support for thread communications in multi-core processors
Parallel Computing
DeepComp: towards a balanced system design for high performance computer systems
Frontiers of Computer Science in China
Accelerating Machine-Learning Algorithms on FPGAs using Pattern-Based Decomposition
Journal of Signal Processing Systems
Performance comparison of some shared memory organizations for 2D mesh-like NOCs
Microprocessors & Microsystems
Scheduling with uncertainties on new computing platforms
Computational Optimization and Applications
DNCOCO'06 Proceedings of the 5th WSEAS international conference on Data networks, communications and computers
EFFEX: an embedded processor for computer vision based feature extraction
Proceedings of the 48th Design Automation Conference
Parallel discrete molecular dynamics simulation with speculation and in-order commitment
Journal of Computational Physics
ADSC: application-driven storage control for energy efficiency
ICT-GLOW'11 Proceedings of the First international conference on Information and communication on technology for the fight against global warming
How to measure useful, sustained performance
State of the Practice Reports
Analytical derivation of traffic patterns in cache-coherent shared-memory systems
Microprocessors & Microsystems
Dynamic parallelization of hydrological model simulations
Environmental Modelling & Software
Scheduling malleable tasks with precedence constraints
Journal of Computer and System Sciences
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Bahurupi: A polymorphic heterogeneous multi-core architecture
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Improving CUDA™ C/C++ encoding readability to foster parallel application development
ACM SIGSOFT Software Engineering Notes
Stochastic allocation and scheduling for conditional task graphs in MPSoCs
CP'06 Proceedings of the 12th international conference on Principles and Practice of Constraint Programming
An independent function-parallel firewall architecture for high-speed networks (short paper)
ICICS'06 Proceedings of the 8th international conference on Information and Communications Security
Toward an application support layer: numerical computation in Unified Parallel C
PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
The potential of on-chip multiprocessing for QCD machines
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
Memory subsystem characterization in a 16-core snoop-based chip-multiprocessor architecture
HPCC'05 Proceedings of the First international conference on High Performance Computing and Communications
An approximation algorithm for scheduling malleable tasks under general precedence constraints
ISAAC'05 Proceedings of the 16th international conference on Algorithms and Computation
Read/Write based fast-path transformation for FCFS mutual exclusion
SOFSEM'05 Proceedings of the 31st international conference on Theory and Practice of Computer Science
A novel lightweight directory architecture for scalable shared-memory multiprocessors
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Empirically efficient verification for a class of infinite-state systems
TACAS'05 Proceedings of the 11th international conference on Tools and Algorithms for the Construction and Analysis of Systems
AGC: adaptive global clock in software transactional memory
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Better speedups using simpler parallel programming for graph connectivity and biconnectivity
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Techniques for the parallelization of unstructured grid applications on multi-GPU systems
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
ODBASE'06/OTM'06 Proceedings of the 2006 Confederated international conference on On the Move to Meaningful Internet Systems: CoopIS, DOA, GADA, and ODBASE - Volume Part II
Balancing Programmability and Silicon Efficiency of Heterogeneous Multicore Architectures
ACM Transactions on Embedded Computing Systems (TECS)
An approach for semiautomatic locality optimizations using OpenMP
PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2
The Journal of Supercomputing
SCF: A Framework for Task-Level Coordination in Reconfigurable, Heterogeneous Systems
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Hardware support for enforcing isolation in lock-based parallel programs
Proceedings of the 26th ACM international conference on Supercomputing
Enhancing effective throughput for transmission line-based bus
Proceedings of the 39th Annual International Symposium on Computer Architecture
Proceedings of the ACM SIGCOMM 2012 conference on Applications, technologies, architectures, and protocols for computer communication
Predicting memcached throughput using simulation and modeling
Proceedings of the 2012 Symposium on Theory of Modeling and Simulation - DEVS Integrative M&S Symposium
Thread vulnerability in parallel applications
Journal of Parallel and Distributed Computing
Virtual computing: the emperor's new clothes?
Software Service and Application Engineering
XPoint cache: scaling existing bus-based coherence protocols for 2D and 3D many-core systems
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
ACM SIGCOMM Computer Communication Review - Special october issue SIGCOMM '12
Scenario-based design flow for mapping streaming applications onto on-chip many-core systems
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Parallelizing the ZSWEEP algorithm for distributed-shared memory architectures
VG'01 Proceedings of the 2001 Eurographics conference on Volume Graphics
The Journal of Supercomputing
Extending the BT NAS parallel benchmark to exascale computing
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Support for fine-grained synchronization in shared-memory multiprocessors
PaCT'07 Proceedings of the 9th international conference on Parallel Computing Technologies
Optimizing software runtime systems for speculative parallelization
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Maintaining consistency in software transactional memory through dynamic versioning tuning
ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II
Preserving the original MPI semantics in a virtualized processor environment
Science of Computer Programming
Improving performance of software transactional memory through contention locality
The Journal of Supercomputing
Synchronizing code execution on ultra-low-power embedded multi-channel signal analysis platforms
Proceedings of the Conference on Design, Automation and Test in Europe
Proceedings of the 50th Annual Design Automation Conference
Exploiting process variability in voltage/frequency control
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Maximum-throughput mapping of SDFGs on multi-core SoC platforms
Journal of Parallel and Distributed Computing
VGTS: variable granularity transactional snoop
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
The reuse cache: downsizing the shared last-level cache
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
High-performance fractal coherence
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
ACM Transactions on Embedded Computing Systems (TECS)
Programming a Multicore Architecture without Coherency and Atomic Operations
Proceedings of Programming Models and Applications on Multicores and Manycores
A queueing theoretic approach for performance evaluation of low-power multi-core embedded systems
Journal of Parallel and Distributed Computing
Removal of Conflicts in Hardware Transactional Memory Systems
International Journal of Parallel Programming
From the Publisher: This book outlines a set of issues that are critical to all of parallel architecture across modern designs: communication latency, communication bandwidth, and coordination of cooperative work. It describes the set of techniques available in hardware and in software to address each issue and explores how the various techniques interact.