Data mining using high performance data clouds: experimental studies using Sector and Sphere

Authors:
Robert Grossman; Yunhong Gu
Affiliations:
University of Illinois at Chicago and Open Data Group, Chicago, IL, USA; University of Illinois at Chicago, Chicago, IL, USA
Venue:
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2008


Abstract

We describe the design and implementation of a high performance cloud that we have used to archive, analyze and mine large distributed data sets. By a cloud, we mean an infrastructure that provides resources and/or services over the Internet. A storage cloud provides storage services, while a compute cloud provides compute services. We describe the design of the Sector storage cloud and how it provides the storage services required by the Sphere compute cloud. We also describe the programming paradigm supported by the Sphere compute cloud. Sector and Sphere are designed for analyzing large data sets using computer clusters connected with wide area high performance networks (for example, 10+ Gb/s). We describe a distributed data mining application that we have developed using Sector and Sphere. Finally, we describe some experimental studies comparing Sector/Sphere to Hadoop.