Extreme data mining

  • Authors: Sridhar Ramaswamy
  • Affiliations: Google Inc., Mountain View, CA, USA
  • Venue: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
  • Year: 2008

Abstract

At Google, the quality and speed of statistical data mining algorithms directly affect the usefulness of our search results and the relevance of our targeted advertising. One of the things that makes planet-wide, high-throughput, 24/7 data mining so interesting is that every part of the software stack is involved. This talk will walk up the stack, from the physical machines in warehouse-sized data centers, through networking and secondary storage abstractions, to the distributed numerical methods and the high-throughput training and serving algorithms needed to support online log processing and machine learning. We will also discuss the significant infrastructure and algorithmic impacts of batch versus online training: both data mining modes play essential roles at Google.