Latent semantic indexing: a probabilistic analysis
PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7
Authoritative sources in a hyperlinked environment
Journal of the ACM (JACM)
STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing
Recommendation Systems: A Probabilistic Analysis
FOCS '98 Proceedings of the 39th Annual Symposium on Foundations of Computer Science
In this talk we survey recent work and work in progress on data mining and web search. We discuss Latent Semantic Analysis and give conditions under which it is robust. We also consider collaborative filtering and show how spectral techniques give it a rigorous and robust justification. We then turn to web search and show how both Google and Kleinberg's algorithm are robust under a model of web generation, and how this model can be reasonably extended. Finally, we give an algorithm that provably returns the correct result in this extended model. The results surveyed are joint work with Azar, Karlin, McSherry and Saia [2], and Achlioptas, Karlin and McSherry [1].
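For readers unfamiliar with the spectral view of web search, Kleinberg's algorithm is usually described as computing hub and authority scores by repeated multiplication with the link matrix. The following is a minimal sketch of that standard formulation only, not of the robust algorithm the talk describes; the function name `hits`, the edge-list input format and the fixed iteration count are assumptions made for illustration.

```python
import math


def _normalize(vec):
    """Scale a vector to unit Euclidean length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def hits(edges, num_nodes, iterations=50):
    """Power iteration for hub and authority scores on a directed graph.

    edges: list of (source, target) pairs, node ids in range(num_nodes).
    Returns (hubs, auths), each a list of length num_nodes.
    """
    hubs = [1.0] * num_nodes
    auths = [1.0] * num_nodes
    for _ in range(iterations):
        # Authority of a page: sum of hub scores of the pages linking to it.
        new_auths = [0.0] * num_nodes
        for src, dst in edges:
            new_auths[dst] += hubs[src]
        # Hub score of a page: sum of authority scores of the pages it links to.
        new_hubs = [0.0] * num_nodes
        for src, dst in edges:
            new_hubs[src] += new_auths[dst]
        # Normalize so the scores stay bounded across iterations.
        auths = _normalize(new_auths)
        hubs = _normalize(new_hubs)
    return hubs, auths
```

Calling `hits([(0, 2), (1, 2), (0, 3)], 4)` on this small hypothetical graph ranks page 2 as the strongest authority, since two pages link to it, and page 0 as the strongest hub, since it links to both authorities.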