For mobile intelligent robot applications, an 81.6 GOPS object recognition processor is implemented. The chip architecture and hardware features are chosen based on an analysis of the target application. The proposed processor is designed to support both task-level and data-level parallelism: ten processing elements are integrated for task-level parallelism, and single-instruction, multiple-data (SIMD) instructions are added to exploit data-level parallelism. A Memory-Centric network-on-chip (NoC) is proposed to support efficient pipelined task execution across the ten processing elements. It also provides coherence and consistency schemes tailored to the 1-to-N and M-to-1 data transactions in a task-level pipeline. For further performance gain, a visual image processing memory is also implemented. The chip is fabricated in a 0.18-µm CMOS technology and computes the key-point localization stage of SIFT object recognition twice as fast as a 2.3 GHz Core 2 Duo processor.