Fast exact multiplication by the Hessian

  • Authors:
  • Barak A. Pearlmutter

  • Affiliations:
  • -

  • Venue:
  • Neural Computation
  • Year:
  • 1994

Abstract

Just storing the Hessian H (the matrix of second derivatives ∂²E/∂wi∂wj of the error E with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like H is to compute its product with various vectors, we derive a technique that directly calculates Hv, where v is an arbitrary vector. To calculate Hv, we first define a differential operator Rv{f(w)} = (∂/∂r) f(w + rv)|r=0, note that Rv{∇w} = Hv and Rv{w} = v, and then apply Rv{·} to the equations used to compute ∇w. The result is an exact and numerically stable procedure for computing Hv, which takes about as much computation, and is about as local, as a gradient evaluation. We then apply the technique to a one-pass gradient calculation algorithm (backpropagation), a relaxation gradient calculation algorithm (recurrent backpropagation), and two stochastic gradient calculation algorithms (Boltzmann machines and weight perturbation). Finally, we show that this technique can be used at the heart of many iterative techniques for computing various properties of H, obviating any need to calculate the full Hessian.
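The Rv{·} operator described above is the directional derivative of the gradient along v, so it can be realized today by pushing a forward-mode tangent through a reverse-mode gradient computation. The following is a minimal sketch of that idea, not code from the paper: it assumes JAX, and the names loss, hvp, w, and v are illustrative.

```python
# Illustrative sketch (not from the paper): Hessian-vector product via
# forward-over-reverse differentiation, i.e. Rv{grad E} = (d/dr) grad E(w + r v)|_{r=0} = H v.
import jax
import jax.numpy as jnp

def loss(w):
    # Toy error function E(w); stands in for a network's training error.
    return jnp.sum(jnp.sin(w) * w ** 2)

def hvp(f, w, v):
    # Push the tangent v through the gradient computation with jax.jvp.
    # Cost is roughly one extra gradient evaluation; H is never formed.
    _, tangent = jax.jvp(jax.grad(f), (w,), (v,))
    return tangent

w = jnp.array([0.3, -1.2, 2.0])
v = jnp.array([1.0, 0.0, -0.5])

print(hvp(loss, w, v))               # exact H v without storing H
print(jax.hessian(loss)(w) @ v)      # same result, materializing H for comparison
```

Such an Hv routine is the building block the abstract alludes to: plugged into power iteration, Lanczos, or conjugate-gradient-style solvers, it lets one estimate eigenvalues, condition numbers, or Newton-type steps of H without ever materializing the full matrix.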