We present a high-performance algorithm for multiplying sparse distributed polynomials on a multicore processor. Each core uses a heap of pointers to multiply parts of the polynomials in its local cache. Intermediate results are written to buffers in shared cache, and the cores take turns combining them to form the result. A cooperative approach balances the load and improves scalability, and the combined cache of the cores produces a superlinear speedup in practice. We present benchmarks comparing our parallel routine to a sequential version and to the routines of other computer algebra systems.
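The per-core kernel the abstract describes, a heap of pointers that merges term-by-term products in sorted order, can be sketched sequentially as below. This is an illustrative reconstruction of the general heap-based technique (Johnson's method), not the paper's implementation; the representation of a polynomial as a list of `(exponent, coefficient)` pairs sorted by decreasing exponent is an assumption for the sketch.

```python
import heapq

def mul_sparse(f, g):
    """Multiply two sparse univariate polynomials, each given as a list of
    (exponent, coefficient) pairs sorted by decreasing exponent.

    One heap entry per term of f points at the next unconsumed term of g,
    so the heap holds at most len(f) entries regardless of how many of the
    len(f)*len(g) products are eventually formed.
    """
    if not f or not g:
        return []
    # Heap entries are (-exponent, i, j) for the product f[i]*g[j];
    # exponents are negated so heapq's min-heap pops the largest first.
    heap = [(-(fi + g[0][0]), i, 0) for i, (fi, _) in enumerate(f)]
    heapq.heapify(heap)
    result = []
    while heap:
        e = heap[0][0]
        c = 0
        # Pop and sum every pending product with this exponent.
        while heap and heap[0][0] == e:
            _, i, j = heapq.heappop(heap)
            c += f[i][1] * g[j][1]
            if j + 1 < len(g):
                # Advance this f-term's pointer to the next term of g.
                heapq.heappush(heap, (-(f[i][0] + g[j + 1][0]), i, j + 1))
        if c != 0:  # like terms may cancel
            result.append((-e, c))
    return result
```

For example, `mul_sparse([(1, 1), (0, 1)], [(1, 1), (0, -1)])` computes (x + 1)(x - 1) and returns `[(2, 1), (0, -1)]`, i.e. x^2 - 1. In the parallel algorithm, each core would run such a kernel on its slice of the input and write its partial result to a shared buffer for merging.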