Optimizing communication for massively parallel processing
Achieving high performance on extremely large parallel machines: performance prediction and load balancing
Accurate and efficient filtering for the Intel thread checker race detector
Proceedings of the 1st workshop on Architectural and system support for improving software dependability
Using Valgrind to detect undefined value errors with bit-precision
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
IBM Journal of Research and Development
Debugging large scale applications in a virtualized environment
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Many scientific applications are logically decomposed into modules, each performing a different type of computation. These modules are then linked together inside the same executable. Although the modules are logically independent, they are not physically independent: a faulty module can corrupt the state of another. By identifying the different modules inside an application, tagging memory according to the module that owns it, and performing extra runtime checks, we can automatically detect certain types of errors. We implemented this idea inside the Charm++ runtime system, where modules can be easily identified. We illustrate the validity of our approach and evaluate its overhead.
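The tagging idea can be illustrated with a small toy model. The sketch below is only an illustration under stated assumptions, not the Charm++ implementation: the `TaggedMemory` class, its per-byte tag table, the `SYSTEM` tag, and the integer module identifiers are all hypothetical names introduced here. It assumes that every write is attributed to a module and checked against the tag of the target address, which may differ from how the actual runtime performs its checks.

```python
class TaggedMemory:
    """Toy model of module-tagged memory (hypothetical, not the Charm++ API)."""

    SYSTEM = 0  # assumed tag for memory that no application module owns

    def __init__(self, size):
        self.data = bytearray(size)
        self.tags = [self.SYSTEM] * size   # one owner tag per byte
        self.next_free = 0                 # simple bump allocator
        self.violations = []               # (writer, address, owner) records

    def allocate(self, module_id, nbytes):
        """Reserve nbytes for module_id and tag them with its identifier."""
        start = self.next_free
        if start + nbytes > len(self.data):
            raise MemoryError("arena exhausted")
        for addr in range(start, start + nbytes):
            self.tags[addr] = module_id
        self.next_free += nbytes
        return start

    def write(self, module_id, addr, value):
        """Runtime check: reject writes to memory tagged for another module."""
        owner = self.tags[addr]
        if owner != module_id:
            self.violations.append((module_id, addr, owner))
            return False
        self.data[addr] = value
        return True


mem = TaggedMemory(64)
solver, io = 1, 2                         # hypothetical module identifiers
solver_buf = mem.allocate(solver, 16)
io_buf = mem.allocate(io, 16)

ok = mem.write(solver, solver_buf + 3, 42)    # inside the solver's own block
bad = mem.write(solver, solver_buf + 16, 7)   # overruns into the io block
```

Here the second write falls one byte past the solver's buffer, lands in memory tagged for the other module, and is recorded in `mem.violations` instead of silently corrupting that module's state.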