Fitting algebraic curves to noisy data

  • Authors:
  • Sanjeev Arora;Subhash Khot

  • Affiliations:
  • Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ;Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ

  • Venue:
  • Journal of Computer and System Sciences - STOC 2002
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

We introduce the following problem which is motivated by applications in vision and pattern detection: We are given pairs of datapoints (x1,y1), (x2,y2),...,(xm, ym) ∈ [- 1, 1] × [- 1, 1], a noise parameter δ 0, a degree bound d, and a threshold ρ 0. We desire an algorithm that enlists every degree d polynomial h such that |h(xi) - yi| ≤ δ for at least δ fraction of the indices i. If δ = 0, this is just the list decoding problem that has been popular in complexity theory and for which Sudan gave a poly(m,d) time algorithm. However, for δ 0, the problem as stated becomes ill-posed and one needs a careful reformulation (see the Introduction). We prove a few basic results about this (reformulated) problem. We show that the problem has no polynomial-time algorithm (our counterexample works for ρ = 0.5). This is shown by exhibiting an instance of the problem where the number of solutions is as large as exp(d0.5-ε) and every pair of solutions is far from each other in ℓ∞ norm. On the algorithmic side, we give a rigorous analysis of a brute force algorithm that runs in exponential time. Also, in surprising contrast to our lowerbound, we give a polynomial-time algorithm for learning the polynomials assuming the data is generated using a mixture model in which the mixing weights are "nondegenerate."