A grid-based approach for enterprise-scale data mining

Authors:
Ramesh Natarajan; Radu Sion; Thomas Phan
Affiliations:
IBM Thomas J. Watson Research Center, Yorktown Heights, NY; Department of Computer Science, State University of New York, Stony Brook, NY; IBM Almaden Research Center, San Jose, CA
Venue:
Future Generation Computer Systems - Special section: Data mining in grid computing environments
Year:
2007

Abstract

We describe a grid-based approach for enterprise-scale data mining that uses parallel database technology for data storage and on-demand compute servers for parallelism in the statistical computations. The approach targets data mining in highly automated vertical business applications, where the data is stored on one or more relational database systems, and either an independent set of high-performance compute servers or a network of low-cost commodity processors is used to improve application performance and overall workload management. The goal of this paper is to describe an algorithmic decomposition of data mining kernels between the data storage and compute grids that makes it possible to exploit the parallelism on each grid in a simple way while minimizing the data transfer between them. The approach is compatible with existing standards for data mining task specification and results reporting, so larger applications that use these data mining algorithms do not have to be modified to benefit from it.
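
The abstract does not spell out the decomposition, so the following is only an illustrative sketch of one way such a split could look, not the authors' method. It assumes the data side reduces each partition to a few summary sums through a single SQL aggregate query, and the compute side fits a model from the merged sums, so that only a handful of numbers cross between the two grids. SQLite stands in for the parallel database, simple least-squares line fitting stands in for a data mining kernel, and every name (`sufficient_stats`, `merge_stats`, `fit_line`, the `obs` table) is hypothetical.

```python
import sqlite3


def sufficient_stats(conn, table, x_col, y_col):
    """Data side: one aggregate query reduces a partition to five numbers."""
    # Identifiers are interpolated directly; acceptable only for this trusted sketch.
    sql = (
        f"SELECT COUNT(*), SUM({x_col}), SUM({y_col}), "
        f"SUM({x_col} * {x_col}), SUM({x_col} * {y_col}) FROM {table}"
    )
    return conn.execute(sql).fetchone()


def merge_stats(parts):
    """Combine per-partition statistics by element-wise addition."""
    return tuple(sum(values) for values in zip(*parts))


def fit_line(stats):
    """Compute side: least-squares slope and intercept from the merged sums."""
    n, sx, sy, sxx, sxy = stats
    denom = n * sxx - sx * sx
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


def make_partition(rows):
    """Hypothetical helper: an in-memory database holding one data partition."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE obs (x REAL, y REAL)")
    conn.executemany("INSERT INTO obs VALUES (?, ?)", rows)
    return conn


def demo():
    # Two partitions of points that lie on the line y = 2x + 1.
    p1 = make_partition([(0.0, 1.0), (1.0, 3.0)])
    p2 = make_partition([(2.0, 5.0), (3.0, 7.0)])
    parts = [sufficient_stats(p, "obs", "x", "y") for p in (p1, p2)]
    return fit_line(merge_stats(parts))
```

In this sketch the raw rows never leave the database partitions; only the five sums per partition are transferred, which is the kind of reduction in data movement the abstract aims for. Because the per-partition sums are simply added together, the partitions can be queried independently and in any order.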