Astronomical data processing in EXTASCID

Authors:
Yu Cheng; Florin Rusu
Affiliations:
UC Merced, Merced, CA; UC Merced, Merced, CA
Venue:
Proceedings of the 25th International Conference on Scientific and Statistical Database Management
Year:
2013

Abstract

Scientific data have a dual structure. Raw data are predominantly ordered multi-dimensional arrays or sequences, while metadata and derived data are best represented as unordered relations. Scientific data processing requires complex operations over both arrays and relations. These operations cannot be expressed using only standard linear algebra operators (for arrays) or relational algebra operators (for relations). Existing scientific data processing systems are designed for a single data model and handle complex processing at the application level. EXTASCID is a complete and extensible system for scientific data processing. It supports both array and relational data natively. Complex processing is handled by a metaoperator that can execute any user code. As a result, EXTASCID can process full scientific workflows inside the system, with minimal data movement and application code. We illustrate the overall process on a real dataset and workflow from astronomy: starting with a set of sky images, the goal is to identify and classify transient astrophysical objects.
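
The abstract does not describe the metaoperator's interface, so the following is only a minimal sketch of how user code might combine array and relational data in one pass. It assumes an accumulate/merge/terminate style of user-defined aggregate; the class name `BrightPixelAggregate`, the `run` driver, the chunk layout, and the threshold test are all hypothetical illustrations, not EXTASCID's API or the authors' detection method.

```python
from dataclasses import dataclass, field


# Hypothetical user-defined aggregate; the interface is an assumption
# made for illustration and is not taken from EXTASCID.
@dataclass
class BrightPixelAggregate:
    threshold: float
    known: frozenset  # (row, col) of cataloged objects: the relational side
    candidates: set = field(default_factory=set)

    def accumulate(self, origin, chunk):
        """Consume one array chunk; origin is the (row, col) of its top-left cell."""
        r0, c0 = origin
        for i, row in enumerate(chunk):
            for j, value in enumerate(row):
                pos = (r0 + i, c0 + j)
                # Keep bright pixels that do not match a cataloged object.
                if value > self.threshold and pos not in self.known:
                    self.candidates.add(pos)

    def merge(self, other):
        """Combine the partial state computed on another chunk."""
        self.candidates |= other.candidates

    def terminate(self):
        """Produce the final result as a sorted list of positions."""
        return sorted(self.candidates)


def run(chunks, threshold, known):
    """Process each chunk independently, then merge the partial states."""
    known = frozenset(known)
    partials = []
    for origin, chunk in chunks:
        agg = BrightPixelAggregate(threshold, known)
        agg.accumulate(origin, chunk)
        partials.append(agg)
    result = BrightPixelAggregate(threshold, known)
    for partial in partials:
        result.merge(partial)
    return result.terminate()


# Toy 2x4 image split into two 2x2 chunks, plus a one-row catalog.
chunks = [
    ((0, 0), [[1.0, 9.0], [2.0, 1.5]]),
    ((0, 2), [[8.5, 0.5], [1.0, 7.0]]),
]
known = {(0, 1)}
candidates = run(chunks, threshold=5.0, known=known)
print(candidates)  # [(0, 2), (1, 3)]
```

Because each chunk is processed independently and the partial states are merged afterwards, a computation written this way could run in parallel over array partitions while the catalog lookup stays inside the same operator, which is the kind of in-system processing the abstract describes.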