As long as humans write software, there will be software bugs. With many computational scientists now orchestrating thousands of threads across massively parallel GPU systems, debugging and correctness tools are key pieces of the programmer’s toolchest. In this seminar, we’ll begin with an overview of runtime error-checking best practices and how to recover from CUDA errors using CUDA-GDB. Next, we’ll take a look at the CUDA Compute Sanitizer suite, which contains tools to detect race conditions and memory access errors. We’ll finish with a demonstration of CUDA-GDB on Polaris.
Speaker Bio
Andrew Gontarek is a senior software engineer in the compute debugging tools group at NVIDIA. His primary focus is on the CUDA-GDB product. Prior to NVIDIA, Andrew spent ten years at Cray/HPE as part of the Cray Programming Environment team, working on HPC-focused solutions. At Cray/HPE, he spent time in both the debugging tools and compiler groups.