Many organizations today face the challenge of processing and distilling information from huge and growing collections of data. Such organizations increasingly deploy sophisticated mathematical algorithms to model the behavior of their business processes, discover correlations in the data, predict trends, and ultimately drive decisions that optimize their operations. These techniques are known collectively as analytics, and they draw on multiple disciplines, including statistics, quantitative analysis, data mining, and machine learning. In this survey paper, we identify some of the key techniques employed in analytics, both to serve as an introduction for the non-specialist and to explore opportunities for further optimization through parallelization and acceleration on commodity and specialized multi-core processors. We are interested in isolating and documenting recurring patterns in analytical algorithms, data structures, and data types, and in understanding how these could be most effectively mapped onto parallel infrastructure. To this end, we focus on analytical models that can be executed using different algorithms. For most major model types, we study implementations of key algorithms to determine common computational and runtime patterns. We then use this information to characterize and recommend suitable parallelization strategies for these algorithms, specifically when they are used in data management workloads.
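The abstract refers to recurring computational patterns and parallelization strategies without naming them. As one illustration, and not necessarily a pattern or method taken from the paper itself, many iterative analytics algorithms such as k-means can be written in a "summation form." In this form, each worker computes partial statistics over its own data partition, and the partial results are then combined with an associative merge. The sketch below assumes in-memory partitions of numeric tuples. The function names `partial_stats`, `merge`, and `kmeans_step` are hypothetical. Threads are used only to show the structure; in CPython, CPU-bound pure-Python code would need processes or native code to achieve a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Stats = Tuple[List[List[float]], List[int]]  # (per-cluster coordinate sums, per-cluster counts)


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def partial_stats(partition: Sequence[Point], centroids: Sequence[Point]) -> Stats:
    """Map step (hypothetical helper): sums and counts per cluster for one partition."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in partition:
        j = nearest(point, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def merge(a: Stats, b: Stats) -> Stats:
    """Reduce step: element-wise addition, which is associative and commutative."""
    sums = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    counts = [x + y for x, y in zip(a[1], b[1])]
    return sums, counts


def kmeans_step(
    partitions: Sequence[Sequence[Point]],
    centroids: Sequence[Point],
    workers: int = 4,
) -> List[Point]:
    """One k-means iteration: map over partitions concurrently, then reduce.

    Assumes at least one partition. A cluster that receives no points
    keeps its previous centroid.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda part: partial_stats(part, centroids), partitions))
    sums, counts = reduce(merge, partials)
    return [
        tuple(s / n for s in row) if n else tuple(c)
        for row, n, c in zip(sums, counts, centroids)
    ]


if __name__ == "__main__":
    parts = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
    print(kmeans_step(parts, [(0.0, 0.0), (10.0, 10.0)]))
```

Because `merge` is associative and commutative, the updated centroids do not depend on how the data is split into partitions (up to floating-point rounding). This property is what makes the pattern straightforward to map onto multi-core or distributed infrastructure.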