Grid-aware approach to data statistics, data understanding and data preprocessing

Authors:
Alexander Wohrer;Lenka Novakova;Peter Brezany;A. Min Tjoa
Affiliations:
Institute for Scientific Computing, Faculty of Computer Science, University of Vienna, Nordbergstrasse 15/C/3, 1090 Vienna, Austria.;Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Cybernetics, Technicka 2, 166 27 Prague 6, Czech Republic.;Institute for Scientific Computing, Faculty of Computer Science, University of Vienna, Nordbergstrasse 15/C/3, 1090 Vienna, Austria.;Institute of Software Technology, Vienna University of Technology, Favoritenstr. 9-11/188, 1040 Wien, Austria
Venue:
International Journal of High Performance Computing and Networking
Year:
2009

Abstract

In recent years the focus of grid computing has shifted towards more data-intensive applications, which increasingly need access to various public and private databases. The overall task of the D³G framework is to relocate the code for Data Preprocessing (DPP) closer to the data source. This paper presents the data-service-side architecture for gathering Data Statistics (DS) on the fly, using them in remote DPP methods applied to query results, and maintaining exact continuous DS for whole tables inside a database. The performance results show low running costs for the continuous DS and demonstrate the feasibility of the service-side DPP functionality.
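The abstract does not specify how the statistics are computed. As a hedged illustration only, the sketch below assumes that exact continuous DS for a numeric column can be kept with constant-time updates per inserted row, and that a remote DPP method such as min-max normalization then consumes those statistics instead of rescanning the data. The update rule is Welford's algorithm, a standard technique substituted here; it is not necessarily what D³G uses. `ColumnStats` and `min_max_normalize` are hypothetical names.

```python
from dataclasses import dataclass
import math


@dataclass
class ColumnStats:
    """Hypothetical exact running statistics for one numeric column (inserts only)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean (Welford)
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, x: float) -> None:
        # Welford's update: O(1) per value, numerically stable in floating point
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def variance(self) -> float:
        # Population variance; 0.0 when fewer than two values have been seen
        return self.m2 / self.count if self.count > 1 else 0.0


def min_max_normalize(values, stats: ColumnStats):
    """Example DPP step: rescale values to [0, 1] using previously gathered stats."""
    span = stats.maximum - stats.minimum
    if span == 0:
        # Constant column: map every value to 0.0 to avoid division by zero
        return [0.0 for _ in values]
    return [(v - stats.minimum) / span for v in values]


stats = ColumnStats()
for v in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.add(v)
print(min_max_normalize([2.0, 9.0, 5.5], stats))  # [0.0, 1.0, 0.5]
```

Keeping only a few aggregates per column is what makes continuous maintenance cheap. This sketch handles only inserted rows; exact statistics under deletes and updates would need additional bookkeeping (for example, a reverse Welford step for the mean and variance, and a fallback rescan when the current minimum or maximum is removed).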