A High-Performance, Pipelined, FPGA-Based Genetic Algorithm Machine

  • Authors:
  • Barry Shackleford;Greg Snider;Richard J. Carter;Etsuko Okushi;Mitsuhiro Yasuda;Katsuhiko Seo;Hiroto Yasuura

  • Affiliations:
  • Hewlett-Packard Laboratories, 1501 Page Mill Road, Palo Alto, CA 94304 U.S.A. barry_shackleford@hpl.hp.com;Hewlett-Packard Laboratories, 1501 Page Mill Road, Palo Alto, CA 94304 U.S.A. greg_snider@hpl.hp.com;Hewlett-Packard Laboratories, 1501 Page Mill Road, Palo Alto, CA 94304 U.S.A. dick_carter@hpl.hp.com;Mitsubishi Electric Corporation, 5-5-1, Ofuna, Kamakura, Kanagawa 247-8501 Japan etsuko@dsec.hq.melco.co.jp;Mitsubishi Electric Corporation, 5-5-1, Ofuna, Kamakura, Kanagawa 247-8501 Japan yasuda@dsec.hq.melco.co.jp;Mitsubishi Electric Corporation, 5-5-1, Ofuna, Kamakura, Kanagawa 247-8501 Japan seo@dsec.hq.melco.co.jp;Kyushu University, Kasuga-shi 816 Japan yasuura@c.csce.kyushu-u.ac.jp

  • Venue:
  • Genetic Programming and Evolvable Machines
  • Year:
  • 2001

Abstract

Accelerating a genetic algorithm (GA) by implementing it in a reconfigurable field-programmable gate array (FPGA) is described. The implemented GA features: random parent selection, which conserves selection circuitry; a steady-state memory model, which conserves chip area; and survival of fitter child chromosomes over their less-fit parent chromosomes, which promotes evolution. A net child chromosome generation rate of one per clock cycle is obtained by pipelining the parent selection, crossover, mutation, and fitness evaluation functions. Complex fitness functions can be further pipelined to maintain a high-speed clock cycle. Fitness functions with a pipeline initiation interval greater than one can be implemented in multiple copies to maintain a net evaluated-chromosome throughput of one per clock cycle. Two prototypes are described. The first prototype (c. 1996 technology) is a multiple-FPGA chip implementation, running at a 1 MHz clock rate, that solves a 94-row × 520-column set covering problem 2,200× faster than a 100 MHz workstation running the same algorithm in C. The second prototype (Xilinx XCV300) is a single-FPGA chip implementation, running at a 66 MHz clock rate, that solves a 36-residue protein folding problem in a 2-D lattice 320× faster than a 366 MHz Pentium II. The current largest FPGA (Xilinx XCV3200E) has circuitry available for the implementation of 30 fitness function units, which would yield an acceleration of 9,600× (30 × 320×) for the 36-residue protein folding problem.
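
The three algorithmic features named in the abstract (random parent selection, a steady-state population, and a child surviving only if it is fitter than a parent) can be illustrated with a short software sketch. This is not the authors' hardware design. The function name `steady_state_ga`, the single-point crossover, the per-bit mutation rate, the rule of replacing the less-fit of the two parents, and the bit-counting fitness used in the usage example are all assumptions made for illustration; the abstract does not specify them.

```python
import random


def steady_state_ga(fitness, chrom_len, pop_size=64, steps=20000,
                    mutation_rate=0.01, seed=0):
    """Hypothetical software sketch of a steady-state GA (chrom_len >= 2)."""
    rng = random.Random(seed)
    # One fixed population, updated in place (steady-state memory model).
    pop = [[rng.randint(0, 1) for _ in range(chrom_len)]
           for _ in range(pop_size)]
    fit = [fitness(c) for c in pop]

    for _ in range(steps):
        # Random parent selection: no fitness-based selection logic.
        i, j = rng.randrange(pop_size), rng.randrange(pop_size)

        # Single-point crossover (assumed operator).
        cut = rng.randrange(1, chrom_len)
        child = pop[i][:cut] + pop[j][cut:]

        # Per-bit mutation (assumed operator and rate).
        child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]

        # The child survives only if it beats the less-fit parent (assumed rule).
        f = fitness(child)
        worse = i if fit[i] <= fit[j] else j
        if f > fit[worse]:
            pop[worse] = child
            fit[worse] = f

    best = max(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```

As a usage example, `steady_state_ga(sum, 32)` maximizes the number of 1 bits in a 32-bit chromosome. Because a chromosome is overwritten only by a strictly fitter child, the best fitness in the population never decreases. In the hardware described above, the four steps in the loop body are separate pipeline stages, so a new child is produced every clock cycle rather than one per loop iteration.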