Massive data set issues in air pollution modelling

  • Authors:
  • Zahari Zlatev

  • Affiliations:
  • National Environmental Research Institute Frederiksborgvej 399, P. O. Box 358 DK-4000 Roskilde, Denmark

  • Venue:
  • Handbook of massive data sets
  • Year:
  • 2002

Abstract

Air pollution, especially the reduction of air pollution to acceptable levels, is a highly relevant environmental problem, and one that is becoming more and more important. This problem can be studied successfully only when comprehensive high-resolution mathematical models are developed and used on a routine basis. However, such models are very time-consuming, even when modern high-speed computers are available. Indeed, if an air pollution model is applied on a large space domain using fine grids, its discretization will always lead to huge computational problems. Assume, for example, that the space domain is discretized on a (480×480) grid and that the number of chemical species studied by the model is 35. Then ODE systems containing 8,064,000 equations have to be treated at every time-step (the number of time-steps typically being several thousand). If a three-dimensional version of the air pollution model is used, this quantity must be multiplied by the number of layers. Moreover, hundreds and even thousands of simulation runs have to be carried out in most studies related to policy making. Therefore, it is extremely difficult to treat such large computational problems, even when the fastest computers available at present are used. The computing time needed to run such a model causes, of course, great difficulties. However, there is another difficulty which is at least as important as the computing time. The models need a great amount of input data (meteorological, chemical and emission data). Furthermore, the models produce huge files of output data, which have to be stored for future use (for visualization and animation of the results). Finally, huge sets of measurement data (normally taken at many stations located in different countries) have to be used in the efforts to validate the model results.
The necessity of handling such huge data sets efficiently, containing input data, output data and measurement data, will be discussed in this paper.
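The size estimate quoted in the abstract can be reproduced with a short back-of-envelope calculation. The sketch below uses the figures given in the text (a 480×480 grid and 35 chemical species); the ten-layer three-dimensional case is purely an illustrative assumption, since the abstract does not state a layer count.

```python
def ode_system_size(nx, ny, species, layers=1):
    """Number of ODEs treated at every time-step: one equation
    per grid point, per vertical layer, per chemical species."""
    return nx * ny * layers * species

# Two-dimensional case from the abstract: (480 x 480) grid, 35 species.
two_d = ode_system_size(480, 480, 35)
print(two_d)  # 8064000 equations per time-step

# Hypothetical three-dimensional case (10 layers is an assumed value).
three_d = ode_system_size(480, 480, 35, layers=10)
print(three_d)  # 80640000 equations per time-step
```

With several thousand time-steps per run and hundreds or thousands of runs per policy study, multiplying these counts out makes clear why both the computing time and the volume of input, output and measurement data become limiting factors.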