Extreme data mining

Authors:
Sridhar Ramaswamy
Affiliations:
Google Inc., Mountain View, CA, USA
Venue:
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Year:
2008

Abstract

At Google, the quality and speed of statistical data mining algorithms directly affect the usefulness of our search results and the relevance of our targeted advertising. One of the things that makes planet-wide, high-throughput, 24/7 data mining so interesting is that all parts of the software stack are involved. This talk will walk up the stack, from the physical machines in warehouse-sized data centers, through networking and secondary storage abstractions, to the distributed numerical methods and high-throughput training and serving algorithms needed to support online log processing and machine learning. We will also discuss the significant infrastructure and algorithmic impacts of batch versus online training: both data mining modes have essential roles at Google.