Learning and Design of Principal Curves

  • Authors:
  • Balázs Kégl; Adam Krzyzak; Tamás Linder; Kenneth Zeger

  • Affiliations:
  • Queen's Univ., Kingston, Ont., Canada; Concordia Univ., Montreal, Canada; Queen's Univ., Kingston, Ont., Canada; Univ. of California, San Diego, La Jolla

  • Venue:
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Year:
  • 2000

Abstract

Principal curves have been defined as "self-consistent" smooth curves which pass through the "middle" of a d-dimensional probability distribution or data cloud. They give a summary of the data and also serve as an efficient feature extraction tool. We take a new approach by defining principal curves as continuous curves of a given length which minimize the expected squared distance between the curve and points of the space randomly chosen according to a given distribution. The new definition makes it possible to theoretically analyze principal curve learning from training data, and it also leads to a new practical construction. Our theoretical learning scheme chooses a curve from a class of polygonal lines with $k$ segments and with a given total length to minimize the average squared distance over $n$ training points drawn independently. Convergence properties of this learning scheme are analyzed and a practical version of this theoretical algorithm is implemented. In each iteration of the algorithm, a new vertex is added to the polygonal line and the positions of the vertices are updated so that they minimize a penalized squared distance criterion. Simulation results demonstrate that the new algorithm compares favorably with previous methods, both in terms of performance and computational complexity, and is more robust to varying data models.
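
The abstract describes the construction only at a high level. The sketch below is a minimal, illustrative Python implementation of the general idea (a polygonal line grown one vertex at a time, with vertex positions updated under a squared-distance criterion plus a smoothness penalty); it is not the authors' exact algorithm, and the function name, penalty weight, and neighbor-averaging smoothness term are assumptions made for this sketch.

```python
import numpy as np

def fit_polygonal_line(points, n_vertices=15, n_iters=20, penalty=0.1):
    """Fit a polygonal line to a 2-D data cloud (illustrative sketch only)."""
    # Initialize with a two-vertex segment along the first principal component.
    mean = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - mean, full_matrices=False)
    direction = vt[0]
    span = (points - mean) @ direction
    vertices = np.array([mean + span.min() * direction,
                         mean + span.max() * direction])

    while True:
        for _ in range(n_iters):
            # Assign every data point to its nearest vertex.
            dists = np.linalg.norm(points[:, None, :] - vertices[None, :, :], axis=2)
            assign = dists.argmin(axis=1)
            # Move each vertex toward the mean of its assigned points, pulled
            # toward the average of its neighbors (a crude smoothness penalty,
            # standing in for the paper's penalized squared-distance criterion).
            updated = vertices.copy()
            for j in range(len(vertices)):
                pts = points[assign == j]
                target = pts.mean(axis=0) if len(pts) else vertices[j]
                left = vertices[max(j - 1, 0)]
                right = vertices[min(j + 1, len(vertices) - 1)]
                updated[j] = (target + penalty * 0.5 * (left + right)) / (1.0 + penalty)
            vertices = updated
        if len(vertices) >= n_vertices:
            break
        # Grow the curve: split the longest segment at its midpoint.
        seg_len = np.linalg.norm(np.diff(vertices, axis=0), axis=1)
        k = seg_len.argmax()
        midpoint = 0.5 * (vertices[k] + vertices[k + 1])
        vertices = np.insert(vertices, k + 1, midpoint[None, :], axis=0)
    return vertices

if __name__ == "__main__":
    # Toy example: noisy half-circle in the plane.
    rng = np.random.default_rng(1)
    t = rng.uniform(0.0, np.pi, 500)
    data = np.c_[np.cos(t), np.sin(t)] + 0.05 * rng.normal(size=(500, 2))
    print(fit_polygonal_line(data))
```

In this toy setup the fitted vertices trace the half-circle through the middle of the noisy cloud; the number of vertices plays the role of the model-complexity parameter that the paper's polygonal-line class controls through the number of segments and the total length.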