Fault tolerance in an industrial seismic processing application for multicore clusters

  • Authors:
  • Alexandre Gonçalves; Matheus Bersot; André Bulcão; Cristina Boeres; Lúcia Drummond; Vinod Rebello

  • Affiliations:
  • Federal Institute of Education, Science and Technology of Rio de Janeiro, Rio de Janeiro, RJ, Brazil; Computer Science Department - Fluminense Federal University, Niterói, RJ, Brazil; Petrobras Research Center (CENPES), Rio de Janeiro, RJ, Brazil; Computer Science Department - Fluminense Federal University, Niterói, RJ, Brazil; Computer Science Department - Fluminense Federal University, Niterói, RJ, Brazil; Computer Science Department - Fluminense Federal University, Niterói, RJ, Brazil

  • Venue:
  • EuroMPI'11 Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface
  • Year:
  • 2011

Abstract

Seismic processing applications are used to identify geological structures where reservoirs of oil and gas may be found. With oil companies seeking better precision over larger geographical regions, these applications require larger clusters to keep execution times reasonable. The combination of longer run times and clusters with greater numbers of components increases the probability of faults during execution. To address this issue, this paper describes an application-level fault tolerance mechanism that considers node crashes and communication link failures. For this industrial application, experiments show that continued execution with the remaining resources is both feasible and efficient.