Analyzing analytics

Authors:
Rajesh Bordawekar;Bob Blainey;Chidanand Apte
Affiliations:
IBM Watson Research Center, Yorktown Heights, NY;IBM Toronto Software Lab, Markham, Ontario;IBM Watson Research Center, Yorktown Heights, NY
Venue:
ACM SIGMOD Record
Year:
2014

Abstract

Many organizations today face the challenge of processing and distilling information from huge and growing collections of data. Such organizations are increasingly deploying sophisticated mathematical algorithms to model the behavior of their business processes, to discover correlations in the data, to predict trends, and ultimately to drive decisions that optimize their operations. These techniques are known collectively as analytics and draw upon multiple disciplines, including statistics, quantitative analysis, data mining, and machine learning. In this survey paper, we identify some of the key techniques employed in analytics, both to serve as an introduction for the non-specialist and to explore opportunities for greater optimization through parallelization and acceleration on commodity and specialized multi-core processors. We are interested in isolating and documenting repeated patterns in analytical algorithms, data structures, and data types, and in understanding how these could be most effectively mapped onto parallel infrastructure. To this end, we focus on analytical models that can be executed using different algorithms. For most major model types, we study implementations of key algorithms to determine common computational and runtime patterns. We then use this information to characterize and recommend suitable parallelization strategies for these algorithms, specifically when used in data management workloads.