Spectral methods for data analysis

  • Authors:
  • Frank McSherry; Anna Karlin

  • Venue:
  • Spectral methods for data analysis
  • Year:
  • 2004

Abstract

“Spectral methods” broadly describes the class of algorithms that cast their input as a matrix and apply linear-algebraic techniques, typically involving the eigenvectors or singular vectors of that matrix. Spectral techniques have had much success in a variety of data analysis domains, from text classification [26] to web site ranking [59, 47]. However, little rigorous analysis has been applied to these algorithms, and we are left without a firm understanding of why they work as well as they do. In this thesis, we study the application of spectral techniques to data mining, looking specifically at those problems on which spectral techniques have performed well. We cast each problem into a common mathematical framework, giving a unified theoretical justification for the empirical success of spectral techniques in these domains. Specifically, we present models that justify the prior empirical success of spectral algorithms for tasks such as object classification, web site ranking, and graph partitioning, and we give new algorithms that use these techniques for as yet underdeveloped data mining tasks such as collaborative filtering. We then use the understanding gained from this common framework to unify several spectral results in the random graph literature. Finally, we study several approaches to extending the practical applicability of these methods: computational acceleration, support for incremental calculation, and deployment in a completely decentralized environment.