We report efficient implementation techniques for FFT-based dense multivariate polynomial arithmetic over finite fields, targeting multi-cores. We have extended a preliminary study dedicated to polynomial multiplication into a complete set of efficient parallel routines in Cilk++ for polynomial arithmetic, including normal form computation. Since bivariate multiplication applied to balanced data is a good kernel for these routines, we provide an in-depth study of the performance and cut-off criteria of our different implementations of this operation. We also show that optimized parallel multiplication not only improves the performance of higher-level algorithms such as normal form computation, but that this composition is necessary for parallel normal form computation to reach peak performance on the variety of problems we have tested.
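As a point of reference for the serial kernel underlying such routines, the following is a minimal sketch (not the authors' Cilk++ code) of dense univariate polynomial multiplication over a finite field via the number-theoretic transform; the choice of prime `P` and primitive root `G` here is an illustrative assumption, not taken from the paper.

```python
# Sketch: FFT-based (number-theoretic transform) dense polynomial
# multiplication over Z/PZ. P = 7 * 2^26 + 1 is an NTT-friendly prime
# chosen for illustration; 3 is a primitive root modulo P.
P = 469762049
G = 3

def ntt(a, invert=False):
    """In-place iterative Cooley-Tukey transform; len(a) must be a power of 2."""
    n = len(a)
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly passes over doubling block lengths.
    length = 2
    while length <= n:
        w = pow(G, (P - 1) // length, P)
        if invert:
            w = pow(w, P - 2, P)  # modular inverse via Fermat's little theorem
        for start in range(0, n, length):
            wn = 1
            for k in range(start, start + length // 2):
                u = a[k]
                v = a[k + length // 2] * wn % P
                a[k] = (u + v) % P
                a[k + length // 2] = (u - v) % P
                wn = wn * w % P
        length <<= 1
    if invert:
        n_inv = pow(n, P - 2, P)
        for i in range(n):
            a[i] = a[i] * n_inv % P

def poly_mul(f, g):
    """Product of two dense coefficient vectors over Z/PZ."""
    n = 1
    while n < len(f) + len(g) - 1:
        n <<= 1
    fa = f + [0] * (n - len(f))
    gb = g + [0] * (n - len(g))
    ntt(fa)
    ntt(gb)
    c = [x * y % P for x, y in zip(fa, gb)]
    ntt(c, invert=True)
    return c[:len(f) + len(g) - 1]
```

For example, `poly_mul([1, 2], [3, 4])` returns `[3, 10, 8]`, the coefficients of (1 + 2x)(3 + 4x). In the multivariate setting studied in the paper, balanced bivariate multiplication built on such transforms serves as the parallel kernel; the point-wise product and independent row/column transforms are the natural places to spawn parallel work.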