Register to join our July 21, 2026, webinar, “Improving the Scalability of HPC Applications by Separating Computation from Communication,” to learn how a two-level MPI parallelization approach using one-sided communication can reduce communication bottlenecks, simplify MPI programming, and enable extreme-scale application performance.
Rapid progress in computer hardware poses a significant challenge for application developers: hardware parallelism is increasing much faster than applications can be adapted to use it effectively. When parallelism does not naturally exist in an application, it must be created. As applications scale, they often encounter communication bottlenecks caused by data dependencies among parallel processes.
Separation of computation from communication is a distributed-memory parallelization approach designed to address this problem. This technique uses two-level MPI parallelization, with one level performing intermediate data computation and the other handling data consumption. The model maps naturally to one-sided MPI communication, significantly reducing programming complexity.
Computation begins with all ranks in the first level computing their portion of the intermediate data. Once that work is complete, each rank that consumes the data identifies which rank holds the required data and retrieves it through a one-sided MPI_Get. This lightweight communication layer resolves data dependencies between ranks and completes the MPI communication phase, allowing the final data-consumption phase to proceed without interruption for data retrieval.
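The flow described above (compute, then fetch, then consume) can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions: it uses no real MPI, plain Python lists stand in for the memory windows a rank would expose, and `simulated_get` stands in for a one-sided MPI_Get. The names `NUM_RANKS`, `ITEMS_PER_RANK`, `compute_block`, `owner_of`, `simulated_get`, and `run`, the block data layout, and the squared-index "intermediate data" are all hypothetical and are not taken from the webinar material.

```python
# Single-process stand-in for the two-phase flow; no real MPI is used.
NUM_RANKS = 4        # hypothetical number of ranks
ITEMS_PER_RANK = 3   # hypothetical size of each rank's intermediate block


def compute_block(rank):
    """Phase 1: a rank computes its own portion of the intermediate data."""
    start = rank * ITEMS_PER_RANK
    return [i * i for i in range(start, start + ITEMS_PER_RANK)]


def owner_of(index):
    """Map a global index to (owning rank, offset inside that rank's block)."""
    return divmod(index, ITEMS_PER_RANK)


def simulated_get(windows, index):
    """Stand-in for a one-sided MPI_Get: the consumer reads from the owner's
    exposed buffer and the owner takes no action."""
    rank, offset = owner_of(index)
    return windows[rank][offset]


def run(needed_by_rank):
    # Phase 1: every rank fills its exposed buffer (its "window").
    windows = [compute_block(r) for r in range(NUM_RANKS)]
    # Phase 2: each consumer identifies the owner and pulls what it needs.
    fetched = {
        rank: [simulated_get(windows, i) for i in indices]
        for rank, indices in needed_by_rank.items()
    }
    # Phase 3: consumption works on local copies only, with no further fetches.
    return {rank: sum(values) for rank, values in fetched.items()}


# Rank 0 needs global items 4 and 11; rank 3 needs items 0, 1, and 2.
result = run({0: [4, 11], 3: [0, 1, 2]})
print(result)
```

In a real MPI code, the lists would presumably be replaced by window objects and the lookups by MPI_Get calls inside a synchronization epoch; the point of the sketch is only that the data owner never has to post a matching send.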
By redundantly computing intermediate data across multiple subcommunicators and confining MPI traffic to each rank's local subcommunicator, this approach replaces the quadratically scaling all-to-all communication pattern with a linearly scaling block-diagonal one. Applied to the Fast Multipole Method, which is known to be communication-bound, this technique reduces communication cost to less than 1% of total time to solution at full machine scale on Aurora, helping prepare the application for zettascale systems.
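The scaling claim can be checked with simple counting. The sketch below assumes that every rank may need data from every other rank in the all-to-all case, and that in the block-diagonal case ranks only talk within equally sized subcommunicators whose size stays fixed as the machine grows. The fixed subcommunicator size, the function names, and the rank counts are illustrative assumptions, not figures from the webinar.

```python
def all_to_all_pairs(num_ranks):
    """Ordered (consumer, owner) pairs when any rank may fetch from any other."""
    return num_ranks * (num_ranks - 1)


def block_diagonal_pairs(num_ranks, group_size):
    """Ordered pairs when traffic stays inside subcommunicators of group_size."""
    if num_ranks % group_size != 0:
        raise ValueError("num_ranks must be a multiple of group_size")
    groups = num_ranks // group_size
    return groups * group_size * (group_size - 1)


GROUP_SIZE = 16  # hypothetical subcommunicator size, held fixed

for ranks in (1024, 2048):
    print(ranks, all_to_all_pairs(ranks), block_diagonal_pairs(ranks, GROUP_SIZE))
# prints:
# 1024 1047552 15360
# 2048 4192256 30720
```

Doubling the rank count roughly quadruples the all-to-all pair count but exactly doubles the block-diagonal count, because the latter equals `num_ranks * (group_size - 1)`.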
Victor Anisimov is a Computational Scientist at the Argonne Leadership Computing Facility. He earned a Ph.D. in Physical Chemistry from the Institute of Chemical Physics, Russian Academy of Sciences, in 1997 and then spent five years developing computational chemistry software at Fujitsu, where his team developed the linear-scaling semi-empirical quantum chemistry code LocalSCF. He conducted postdoctoral research at the University of Maryland, Baltimore (2003–2008), and the University of Texas at Houston (2008–2011), where he improved molecular dynamics methods and contributed to the CHARMM code.
From 2011 to 2019, at the National Center for Supercomputing Applications at the University of Illinois Urbana-Champaign, Dr. Anisimov supported petascale resource allocation teams on the Blue Waters supercomputer, optimized a variety of application codes, and improved the performance of the coupled-cluster singles and doubles code in NWChem by a factor of two. He is the co-author, with Dr. James J. P. Stewart, of the textbook Introduction to the Fast Multipole Method. Dr. Anisimov specializes in performance optimization of molecular modeling application codes.