With the increasing miniaturization of transistors, wire delays are becoming a dominant factor in microprocessor performance. To address this issue, a number of emerging architectures contain replicated processing units with software-exposed communication between one unit and another (e.g., Raw, SmartMemories, TRIPS). However, for their use to be widespread, it will be necessary to develop compiler technology that enables a portable, high-level language to execute efficiently across a range of wire-exposed architectures.

In this paper, we describe our compiler for StreamIt: a high-level, architecture-independent language for streaming applications. We focus on our backend for the Raw processor. Though StreamIt exposes the parallelism and communication patterns of stream programs, some analysis is needed to adapt a stream program to a software-exposed processor. We describe a partitioning algorithm that employs fission and fusion transformations to adjust the granularity of a stream graph, a layout algorithm that maps a stream graph to a given network topology, and a scheduling strategy that generates a fine-grained static communication pattern for each computational element.

We have implemented a fully functional compiler that parallelizes StreamIt applications for Raw, including several load-balancing transformations. Using the cycle-accurate Raw simulator, we demonstrate that the StreamIt compiler can automatically map a high-level stream abstraction to Raw without losing performance. We consider this work to be a first step towards a portable programming model for communication-exposed architectures.
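
To make the idea of adjusting granularity with fusion and fission concrete, the following is a toy sketch only. It is not the partitioning algorithm of the paper, whose details the abstract does not give. It assumes a linear pipeline in which each filter is represented solely by a hypothetical per-filter work estimate, that any adjacent pair may be fused, and that any filter is stateless and can be fissed into two equal halves. The function names `fuse_cheapest_pair`, `fiss_heaviest`, and `partition` are illustrative and do not come from the StreamIt compiler.

```python
def fuse_cheapest_pair(work):
    """Fuse the adjacent pair of filters whose combined work is smallest."""
    i = min(range(len(work) - 1), key=lambda k: work[k] + work[k + 1])
    return work[:i] + [work[i] + work[i + 1]] + work[i + 2:]


def fiss_heaviest(work):
    """Split the heaviest filter into two equal halves.

    Assumption: the filter is stateless, so its work divides evenly.
    A real stream graph would place the halves in parallel; this flat
    list tracks only the resulting load per partition.
    """
    i = max(range(len(work)), key=lambda k: work[k])
    half = work[i] / 2
    return work[:i] + [half, half] + work[i + 1:]


def partition(work, tiles):
    """Greedily adjust a pipeline until it has exactly `tiles` partitions."""
    if tiles < 1 or not work:
        raise ValueError("need at least one tile and one filter")
    work = list(work)
    while len(work) > tiles:   # too fine-grained: fuse
        work = fuse_cheapest_pair(work)
    while len(work) < tiles:   # too coarse-grained: fiss
        work = fiss_heaviest(work)
    return work


if __name__ == "__main__":
    # Four filters onto two tiles, then two filters onto four tiles.
    print(partition([1, 2, 3, 10], 2))
    print(partition([8, 2], 4))
```

The greedy choice here (fuse the cheapest neighbours, split the most expensive filter) is just one plausible load-balancing heuristic; it ignores communication cost, layout, and scheduling, which the abstract treats as separate compiler phases.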