Typical implementations of dynamically typed languages treat floating-point numbers, or flonums, in a "boxed" form, since those numbers do not fit in a natural machine word once a few bits of the word are reserved for type tags. Naïve implementations allocate every flonum in the heap and thus incur large overhead on numerically intensive computations. Compile-time type inference could eliminate boxing of some flonums, but it would be costly for highly dynamic scripting languages, in which the compiler runs every time a script is executed. We suggest two modified stack machine architectures that avoid heap allocation for most intermediate flonums and can be retrofitted relatively easily to existing stack-based VMs. The basic idea is to provide an arena for intermediate flonums that works as part of an extended stack or as a nursery. As in typical VMs, flonums are tagged pointers to native floating-point numbers, but when a new flonum is pushed onto the VM's stack, it points to a native floating-point number placed in the arena. Heap allocation occurs only when the flonum pointer needs to be moved to the heap. The two architectures differ in their strategies for managing the arena. We implemented and evaluated both strategies in the Scheme implementation "Gauche." Both strategies showed 30%-140% speedups on numerically intensive benchmarks, eliminating 99.8% of heap allocations of intermediate flonums, with little penalty on non-numerical benchmarks. Profiling showed that the speed improvement came from the elimination of flonum allocation and garbage collection.
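The abstract only outlines the arena idea, so the following C sketch illustrates one plausible reading of it: intermediate flonums live as tagged pointers into a scratch arena, and a heap box is created only when a flonum escapes. All names (FlonumArena, vm_push_flonum, flonum_escape_to_heap, and so on) are hypothetical and do not reflect Gauche's actual internals; this is a minimal illustration under those assumptions, not the paper's implementation.

```c
/* Sketch of an arena for intermediate flonums in a stack-based VM.
 * Hypothetical names; not Gauche's real code. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define ARENA_SIZE 256          /* number of scratch flonum slots */
#define TAG_FLONUM 0x1          /* low-bit tag marking a flonum pointer */

typedef uintptr_t Obj;          /* VM value: tagged pointer or immediate */

typedef struct {
    double slots[ARENA_SIZE];
    size_t top;                 /* next free slot; reset at safe points */
} FlonumArena;

static FlonumArena arena = { .top = 0 };

/* Simplified VM value stack. */
static Obj vm_stack[1024];
static size_t vm_sp = 0;

static Obj tag(double *p)   { return (Obj)p | TAG_FLONUM; }
static double *untag(Obj v) { return (double *)(v & ~(Obj)TAG_FLONUM); }

/* Push a freshly computed flonum: store the double in the arena and push
 * a tagged pointer to it.  No heap allocation happens here. */
static void vm_push_flonum(double d)
{
    if (arena.top == ARENA_SIZE) {
        /* A real VM would flush or box live flonums here; the sketch bails. */
        fprintf(stderr, "flonum arena overflow\n");
        exit(1);
    }
    double *slot = &arena.slots[arena.top++];
    *slot = d;
    vm_stack[vm_sp++] = tag(slot);
}

static double vm_pop_flonum(void)
{
    return *untag(vm_stack[--vm_sp]);
}

/* Called only when a flonum escapes to the heap (e.g. stored in a vector
 * or captured by a closure): copy it out of the arena into heap storage. */
static Obj flonum_escape_to_heap(Obj v)
{
    double *heap = malloc(sizeof(double));   /* stands in for a GC allocation */
    *heap = *untag(v);
    return tag(heap);
}

/* At a safe point the arena's intermediates are dead, so the whole arena
 * is reclaimed by resetting a single index. */
static void arena_reset(void) { arena.top = 0; }

int main(void)
{
    /* (+ 1.5 2.25) computed entirely with arena-resident intermediates. */
    vm_push_flonum(1.5);
    vm_push_flonum(2.25);
    double b = vm_pop_flonum(), a = vm_pop_flonum();
    vm_push_flonum(a + b);

    /* Only the value that escapes gets a heap box. */
    Obj boxed = flonum_escape_to_heap(vm_stack[vm_sp - 1]);
    printf("result = %g\n", *untag(boxed));

    free(untag(boxed));
    arena_reset();
    return 0;
}
```

The sketch shows why the scheme can eliminate most allocations: intermediates that never leave the stack are reclaimed wholesale by resetting the arena index, and only escaping values pay the cost of a heap box. The two architectures described in the abstract would differ in when and how that reset and escape-time copying happen.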