As the number of devices available per chip continues to increase, the computational potential of future computer architectures grows as well. While this is a clear benefit, future chips will also likely suffer from more faulty devices and increased power consumption, and they will be difficult to program if the current trend of adding ever more parallel cores continues. However, recent advances in neuroscientific understanding make parallel computing devices modeled after the human neocortex a plausible, attractive, fault-tolerant, and energy-efficient alternative. In this paper we describe a GPGPU extension to an intelligent model based on the mammalian neocortex. The GPGPU is a readily available architecture that maps well onto the parallel cortical architecture inspired by the basic building blocks of the human brain. Using NVIDIA's CUDA framework, we achieve up to a 273x speedup over our unoptimized serial C++ implementation. We also consider two inefficiencies inherent in our initial design: the overhead of repeated kernel launches and poor utilization of GPGPU resources. We propose a software work-queue structure to address the former, and pipelining of the cortical architecture during the training phase to address the latter. Finally, drawing on our experience extending the model to the GPU, we speculate on the hardware required to simulate the computational abilities of mammalian brains.
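The work-queue idea can be made concrete with a persistent-kernel sketch. The CUDA fragment below is illustrative only, not the paper's implementation: the Task struct, the process_node stub, and work_queue_kernel are hypothetical names, and the per-node computation is reduced to a placeholder. The point it demonstrates is that a single kernel launch can drain an arbitrarily long queue of small tasks, replacing per-task kernel-launch overhead with one atomicAdd per task.

#include <cstdio>
#include <cuda_runtime.h>

/* Hypothetical work item: the index of a cortical node to update. */
struct Task { int node; };

/* Placeholder for the per-node computation (e.g., one column update). */
__device__ void process_node(int node, float* activations) {
    activations[node] += 1.0f;
}

/* Persistent kernel: each thread repeatedly claims the next task from a
   global counter instead of returning to the host between tasks, so a
   single launch services the entire queue. */
__global__ void work_queue_kernel(const Task* tasks, int num_tasks,
                                  int* next_task, float* activations) {
    while (true) {
        int t = atomicAdd(next_task, 1);   /* claim one task */
        if (t >= num_tasks) return;        /* queue drained */
        process_node(tasks[t].node, activations);
    }
}

int main() {
    const int num_tasks = 1 << 16;
    Task* d_tasks; int* d_next; float* d_act;
    cudaMalloc(&d_tasks, num_tasks * sizeof(Task));
    cudaMalloc(&d_next, sizeof(int));
    cudaMalloc(&d_act, num_tasks * sizeof(float));
    cudaMemset(d_next, 0, sizeof(int));
    cudaMemset(d_act, 0, num_tasks * sizeof(float));

    /* Enqueue one task per node on the host. */
    Task* h_tasks = new Task[num_tasks];
    for (int i = 0; i < num_tasks; ++i) h_tasks[i].node = i;
    cudaMemcpy(d_tasks, h_tasks, num_tasks * sizeof(Task),
               cudaMemcpyHostToDevice);

    /* One launch drains the whole queue; the grid is sized to fill the
       device rather than to match the number of tasks. */
    work_queue_kernel<<<64, 128>>>(d_tasks, num_tasks, d_next, d_act);
    cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(cudaGetLastError()));

    delete[] h_tasks;
    cudaFree(d_tasks); cudaFree(d_next); cudaFree(d_act);
    return 0;
}

Because each thread claims its own task, load balancing is automatic; a block-granular variant (thread 0 claims a task and the whole block cooperates on it) would suit tasks with internal parallelism, such as updating an entire cortical column at once.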