P-found: Grid-enabling distributed repositories of protein folding and unfolding simulations for data mining

Authors:
Martin Swain (1); Cândida G. Silva (2); Nuno Loureiro-Ferreira (2); Vitaliy Ostropytskyy (1); João Brito (3); Olivier Riche (1); Frederick Stahl (1); Werner Dubitzky (1); Rui M. M. Brito (2)
Affiliations:
(1) University of Ulster, Cromore Road, Coleraine BT52 1SA, Northern Ireland, United Kingdom
(2) Chemistry Department, Faculty of Science and Technology, and Center for Neuroscience and Cell Biology, University of Coimbra, 3004-535 Coimbra, Portugal
(3) Critical Software, S.A., Parque Industrial do Taveiro, Lote 48, 3045-504 Coimbra, Portugal
Venue:
Future Generation Computer Systems
Year:
2010

Abstract

The P-found protein folding and unfolding simulation repository is designed to let scientists perform data mining and other analyses across large, distributed simulation data sets. P-found has two storage components: a primary repository of simulation data, and a data warehouse, populated from the primary repository, that holds important molecular properties which may be used for data mining studies. Here we demonstrate how grid technologies can support multiple, distributed P-found installations. We focus on two aspects: first, how grid data management technologies can be used to access the distributed data warehouses; and second, how the grid can be used to transfer analysis programs to the primary repositories. The second aspect is both important and challenging for P-found, because the data volumes involved are large and scientists want to retain control of their own data. The grid technologies we are developing for P-found will allow new, large data sets of protein folding simulations to be accessed and analysed in novel ways, with significant potential for enabling scientific discovery.
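The abstract names two access patterns for distributed installations: querying the data warehouses across sites, and sending analysis programs to the primary repositories so that bulky simulation data never has to move. The Python sketch below illustrates both patterns in a single process, under loudly stated assumptions. It is not the P-found implementation. The `Site` class, its methods, `federated_query`, and the warehouse columns (`n_frames`, `mean_value`) are all hypothetical. A single float per frame stands in for a real trajectory frame, and ordinary method calls stand in for grid middleware.

```python
"""In-process sketch of two grid access patterns (hypothetical names only).

Nothing here is the P-found or any grid middleware API: Site, its methods,
federated_query and the warehouse columns are illustrative assumptions.
"""
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class Site:
    """One hypothetical P-found installation."""
    name: str
    # Primary repository: raw per-frame data keyed by simulation id.
    # One float per frame stands in for real atomic coordinates.
    trajectories: Dict[str, List[float]]
    # Data warehouse: one row of summary properties per simulation.
    warehouse: List[dict] = field(default_factory=list)

    def populate_warehouse(self) -> None:
        """Fill the warehouse with properties computed from the primary repository."""
        self.warehouse = [
            {"site": self.name, "sim": sim,
             "n_frames": len(frames), "mean_value": mean(frames)}
            for sim, frames in self.trajectories.items()
        ]

    def query_warehouse(self, predicate: Callable[[dict], bool]) -> List[dict]:
        """Pattern 1: answer a query against this site's warehouse only."""
        return [row for row in self.warehouse if predicate(row)]

    def run_analysis(self, analysis: Callable[[List[float]], float]) -> Dict[str, float]:
        """Pattern 2: run shipped code next to the raw data.

        Only one small result per simulation is returned; the
        trajectories themselves never leave the site.
        """
        return {sim: analysis(frames) for sim, frames in self.trajectories.items()}


def federated_query(sites: List[Site], predicate: Callable[[dict], bool]) -> List[dict]:
    """Send the same query to every site and concatenate the partial results."""
    results: List[dict] = []
    for site in sites:
        results.extend(site.query_warehouse(predicate))
    return results


if __name__ == "__main__":
    sites = [
        Site("site-1", {"simA": [1.0, 2.0, 3.0], "simB": [4.0, 6.0]}),
        Site("site-2", {"simC": [0.5, 1.5]}),
    ]
    for s in sites:
        s.populate_warehouse()

    # Pattern 1: which simulations, at any site, have mean_value > 1.5?
    hits = federated_query(sites, lambda row: row["mean_value"] > 1.5)
    print([(r["site"], r["sim"]) for r in hits])

    # Pattern 2: ship a tiny analysis (the built-in max) to each site.
    for s in sites:
        print(s.name, s.run_analysis(max))
```

In both patterns only small results (matching warehouse rows, or one number per simulation) cross site boundaries, which reflects the abstract's two concerns: large data volumes, and scientists keeping control of their own data. A real deployment would replace the in-process calls with authenticated grid services and would need to sandbox or vet the shipped analysis code.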