Processing large-scale multi-dimensional data in parallel and distributed environments

Authors:
Michael Beynon;Chialin Chang;Umit Catalyurek;Tahsin Kurc;Alan Sussman;Henrique Andrade;Renato Ferreira;Joel Saltz
Affiliations:
Department of Computer Science, University of Maryland, College Park, MD;Department of Computer Science, University of Maryland, College Park, MD;Department of Biomedical Informatics, The Ohio State University, Columbus, OH;Department of Biomedical Informatics, The Ohio State University, Columbus, OH;Department of Computer Science, University of Maryland, College Park, MD;Department of Computer Science, University of Maryland, College Park, MD;Department of Computer Science, University of Maryland, College Park, MD;Department of Biomedical Informatics, The Ohio State University, Columbus, OH
Venue:
Parallel Computing - Parallel data-intensive algorithms and applications
Year:
2002

Abstract

Data analysis is an important step in understanding and solving a scientific problem. It involves extracting the data of interest from the raw data available in a dataset and processing it into a data product. In many areas of science and engineering, however, a scientist's ability to analyze information is increasingly hindered by dataset sizes. The sheer volume of data in scientific datasets makes it difficult to access the data of interest efficiently and to manage potentially heterogeneous system resources to process it. Subsetting and aggregation are common operations in a wide range of data-intensive applications. We argue that common runtime and programming support can be developed for applications that query and manipulate large datasets. This paper presents a compendium of frameworks and methods we have developed to support efficient execution of subsetting and aggregation operations in applications that query and manipulate large, multi-dimensional datasets in parallel and distributed computing environments.
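
To illustrate the two operations the abstract names, here is a minimal Python sketch of subsetting (a multi-dimensional range query) followed by aggregation into a coarser output grid. It is not the authors' frameworks or any of their APIs: the function names `subset`, `aggregate`, and `merge`, the per-cell maximum used as the aggregation, and the sample data are all assumptions made for illustration. The sketch assumes the dataset is split into chunks that could each be handled by a different processor, and that the aggregation is associative and commutative so partial results can be combined in any order.

```python
def subset(items, lo, hi):
    """Range query: keep items whose coordinates lie inside the box [lo, hi]."""
    return [
        (coord, value)
        for coord, value in items
        if all(l <= c <= h for c, l, h in zip(coord, lo, hi))
    ]


def aggregate(items, cell_size):
    """Map each item to an output grid cell and keep the per-cell maximum."""
    out = {}
    for coord, value in items:
        cell = tuple(int(c // cell_size) for c in coord)
        out[cell] = max(out.get(cell, value), value)
    return out


def merge(partials):
    """Combine partial results; valid because max is associative and commutative."""
    result = {}
    for partial in partials:
        for cell, value in partial.items():
            result[cell] = max(result.get(cell, value), value)
    return result


# Hypothetical 2-D dataset, already split into two chunks (one per "processor").
chunks = [
    [((0.5, 0.5), 3.0), ((1.5, 0.2), 7.0)],
    [((0.7, 0.9), 5.0), ((9.0, 9.0), 100.0), ((2.5, 2.5), 1.0)],
]

# Each chunk is subset and aggregated independently, then the partials are merged.
lo, hi = (0.0, 0.0), (3.0, 3.0)
partials = [aggregate(subset(chunk, lo, hi), cell_size=1.0) for chunk in chunks]
result = merge(partials)
print(result)
```

In this sketch the per-chunk loop is sequential; in a parallel or distributed setting each chunk's partial result could be computed where the chunk is stored, with only the much smaller partial results exchanged during the merge.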