Advanced compiler design and implementation
Advanced compiler design and implementation
A scalable and flexible data synchronization scheme for embedded HW-SW shared-memory systems
Proceedings of the 14th international symposium on Systems synthesis
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
An Architectural Overview of the Programmable Multimedia Processor, TM-1
COMPCON '96 Proceedings of the 41st IEEE International Computer Conference
Computer Architecture: A Quantitative Approach
Computer Architecture: A Quantitative Approach
Cache aware optimization of stream programs
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Proceedings of the 43rd annual Design Automation Conference
Efficient computation of buffer capacities for cyclo-static dataflow graphs
Proceedings of the 44th annual Design Automation Conference
Decoupling of Computation and Communication with a Communication Assist
DSD '07 Proceedings of the 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools
Latency Minimization for Synchronous Data Flow Graphs
DSD '07 Proceedings of the 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools
IEEE Transactions on Signal Processing
Hard-real-time scheduling of data-dependent tasks in embedded streaming applications
EMSOFT '11 Proceedings of the ninth ACM international conference on Embedded software
Cache-conscious scheduling of streaming applications
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Hi-index | 0.00 |
Efficient use of the memory hierarchy is critical for achieving high performance in a multiprocessor system-on-chip. An external memory that is shared between processors is a bottleneck in current and future systems. Cache misses and a large cache miss penalty contribute to a low processor utilisation. In this paper, we describe a novel cache optimisation technique to reduce instruction and data cache misses for streaming applications. The instruction and data locality are improved by executing a task multiple times before moving to the next task. Furthermore, we introduce a dataflow model that is used to trade-off the number of cache misses against end-to-end latency and memory usage. For our industrial application, which is a Digital Radio Mondiale receiver, the number of cache misses is reduced with a factor 4.2.