The need to process multimedia data places large computational demands on portable/embedded devices. These multimedia functions share common characteristics: they are computationally intensive and data-streaming, performing the same operation(s) on many data elements. The Reconfigurable Streaming Vector Processor (RSVP™) is a vector coprocessor architecture that accelerates streaming data operations. Programming the RSVP architecture involves describing the shape and location of vector streams in memory and describing computations as data-flow graphs. These descriptions are intuitive and independent of each other, making the RSVP architecture easy to program. They are also machine independent, allowing binary-compatible implementations with varying cost-performance tradeoffs. This paper presents the RSVP architecture and programming model, a programming case study, and our first implementation. Our results show significant speedups on streaming data functions. Speedups for kernels and applications range from 2 to over 20 times that of an ARM9 host processor alone.
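
The split between stream descriptions and data-flow graphs can be illustrated with a small software model. This is only a sketch under stated assumptions, not the RSVP programming interface: the names `StreamDescriptor`, `run_kernel`, and `saturating_add`, and the (base, stride, count) descriptor fields, are hypothetical and chosen for illustration. The point it tries to show is that the description of where data lives and the description of what is computed can be written independently of each other.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class StreamDescriptor:
    """Hypothetical descriptor for the shape and location of a vector stream."""
    base: int    # index of the first element in memory
    stride: int  # distance between consecutive elements
    count: int   # number of elements in the stream

    def addresses(self) -> List[int]:
        """Return the memory indices the stream touches, in order."""
        return [self.base + i * self.stride for i in range(self.count)]


def run_kernel(
    memory: List[int],
    inputs: Sequence[StreamDescriptor],
    output: StreamDescriptor,
    graph: Callable[..., int],
) -> None:
    """Apply the same per-element computation to every element of the streams."""
    # Gather each input stream according to its descriptor.
    streams = [[memory[addr] for addr in d.addresses()] for d in inputs]
    # Evaluate the graph once per element and scatter to the output stream.
    for addr, operands in zip(output.addresses(), zip(*streams)):
        memory[addr] = graph(*operands)


def saturating_add(x: int, y: int) -> int:
    """A tiny stand-in for a data-flow graph: add, then clamp to 8 bits."""
    return min(x + y, 255)


# Assumed memory layout for illustration only.
memory = [10, 0, 20, 0, 30, 0, 250, 5, 100, 0, 0, 0]
a = StreamDescriptor(base=0, stride=2, count=3)    # elements 10, 20, 30
b = StreamDescriptor(base=6, stride=1, count=3)    # elements 250, 5, 100
out = StreamDescriptor(base=9, stride=1, count=3)  # destination

run_kernel(memory, [a, b], out, saturating_add)
print(memory[9:12])  # [255, 25, 130]
```

In this model, changing the stride of `a` or relocating `out` requires no change to `saturating_add`, and swapping in a different graph requires no change to the descriptors, which mirrors the independence the abstract describes.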