The multivariate Gaussian distribution is often used to model correlations between stochastic time-series, and can be used to explore the effect of these correlations across N time-series in Monte-Carlo simulations. However, generating each random correlated vector of length N is an O(N^2) process, which quickly becomes a computational bottleneck in software simulations. This article presents an efficient method for generating vectors in parallel hardware, using N parallel pipelined components to generate a new vector every N cycles. The method maps well to the embedded block RAMs and multipliers in contemporary FPGAs, particularly as extensive testing shows that the limited bit-width arithmetic does not reduce the statistical quality of the generated vectors. An implementation of the architecture on Virtex-4 devices achieves a 500MHz clock-rate, and can support vector lengths up to 512 in the largest devices. The combination of a high clock-rate and parallelism provides a significant performance advantage over conventional processors: an xc4vsx55 device at 500MHz provides a 200 times speedup over a 2.6GHz Opteron using an AMD optimised BLAS package. In a case study in Delta-Gamma Value-at-Risk, an RC2000 accelerator card using an xc4vsx55 at 400MHz is 26 times faster than a quad 2.6GHz Opteron SMP.
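To make the O(N^2) cost concrete, the following is a minimal software sketch of correlated vector generation. It assumes the common approach of factorising the covariance matrix with a Cholesky decomposition and multiplying the factor by a vector of independent standard normal samples; the abstract does not say which factorisation the article uses, so this is an illustration of the general technique rather than the hardware architecture itself. The names `cholesky` and `sample_correlated`, and the 2x2 covariance matrix, are hypothetical.

```python
import math
import random


def cholesky(cov):
    """Return lower-triangular L such that L * L^T equals cov.

    Assumes cov is a symmetric positive-definite list of lists.
    """
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = cov[i][i] - s
                if d <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def sample_correlated(L, rng):
    """Draw one zero-mean correlated vector x = L z, with z i.i.d. standard normal.

    The triangular matrix-vector product needs about N^2 / 2 multiply-adds
    per vector, which is the O(N^2) per-vector cost noted in the abstract.
    """
    n = len(L)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


# Illustrative target covariance: unit variances, correlation 0.8.
cov = [[1.0, 0.8], [0.8, 1.0]]
L = cholesky(cov)
rng = random.Random(1)
samples = [sample_correlated(L, rng) for _ in range(20000)]
```

The factorisation is done once per covariance matrix, so the recurring cost is the matrix-vector product in `sample_correlated`. That product is the part which N parallel multiply-accumulate units could in principle spread across N cycles.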