We report efficient implementation techniques for FFT-based dense multivariate polynomial arithmetic over finite fields, targeting multi-cores. We have extended a preliminary study dedicated to polynomial multiplication into a complete set of efficient parallel routines in Cilk++ for polynomial arithmetic, including normal form computation. Since bivariate multiplication applied to balanced data is a good kernel for these routines, we provide an in-depth study of the performance and cut-off criteria of our different implementations of this operation. We also show that optimized parallel multiplication not only improves the performance of higher-level algorithms such as normal form computation, but that this composition is necessary for parallel normal form computation to reach peak performance on the variety of problems we have tested.
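As a point of reference for the serial kernel underlying such routines, the following is a minimal sketch (not the authors' Cilk++ code) of dense univariate polynomial multiplication over a finite field via the number-theoretic transform; the choice of prime `P` and primitive root `G` here is an illustrative assumption, not taken from the paper.

```python
# Sketch: FFT-based (number-theoretic transform) dense polynomial
# multiplication over Z/PZ. P = 7 * 2^26 + 1 is an NTT-friendly prime
# chosen for illustration; 3 is a primitive root modulo P.
P = 469762049
G = 3

def ntt(a, invert=False):
    """In-place iterative Cooley-Tukey transform; len(a) must be a power of 2."""
    n = len(a)
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly passes over doubling block lengths.
    length = 2
    while length <= n:
        w = pow(G, (P - 1) // length, P)
        if invert:
            w = pow(w, P - 2, P)  # modular inverse via Fermat's little theorem
        for start in range(0, n, length):
            wn = 1
            for k in range(start, start + length // 2):
                u = a[k]
                v = a[k + length // 2] * wn % P
                a[k] = (u + v) % P
                a[k + length // 2] = (u - v) % P
                wn = wn * w % P
        length <<= 1
    if invert:
        n_inv = pow(n, P - 2, P)
        for i in range(n):
            a[i] = a[i] * n_inv % P

def poly_mul(f, g):
    """Product of two dense coefficient vectors over Z/PZ."""
    n = 1
    while n < len(f) + len(g) - 1:
        n <<= 1
    fa = f + [0] * (n - len(f))
    gb = g + [0] * (n - len(g))
    ntt(fa)
    ntt(gb)
    c = [x * y % P for x, y in zip(fa, gb)]
    ntt(c, invert=True)
    return c[:len(f) + len(g) - 1]
```

For example, `poly_mul([1, 2], [3, 4])` returns `[3, 10, 8]`, the coefficients of (1 + 2x)(3 + 4x). In the multivariate setting studied in the paper, balanced bivariate multiplication built on such transforms serves as the parallel kernel; the point-wise product and independent row/column transforms are the natural places to spawn parallel work.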