Kenjiro Taura

University of Tokyo

Title: A Quest for Unified, Global View Parallel Programming Models for Our Future


Developing highly scalable programs on today’s HPC machines is becoming ever more challenging, due to decreasing bytes-per-flop ratios, deepening memory and network hierarchies, and heterogeneity. Programmers need to learn a distinct programming API for each layer of the hierarchy and overcome performance issues at every layer, one at a time, even though the underlying high-level principle for performance is in fact fairly common across layers: locality. Future programming models must allow the programmer to express locality and parallelism in high-level terms, and their implementations should map the exposed parallelism onto the different layers of the machine (nodes, cores, and vector units) efficiently through the concerted efforts of compilers and runtime systems. In this talk, I will argue that a global-view task-parallel programming model is a promising direction toward this goal, one that can reconcile generality, programmability, and performance at a high level. I will then talk about our ongoing research efforts in this direction. They include: MassiveThreads, a lightweight user-level thread package for multicore systems; MassiveThreads/DM, its extension to distributed memory machines; DAGViz, a performance analyzer specifically designed for task-parallel programs; and a task-vectorizing compiler that transforms task-parallel programs into vectorized and parallelized instructions. I will end by sharing our outlook on how emerging hardware features and fruitful co-design efforts may help achieve this challenging goal.


Kenjiro Taura is a professor in the Department of Information and Communication Engineering, University of Tokyo. His major research interest is how to reconcile programmability and performance in parallel and distributed computing. His group’s recent work includes task-parallel runtime systems for dynamic load balancing on both multicore processors and massively parallel computers, a compiler that transforms task parallelism into highly efficient vectorized code, a PGAS runtime system supporting dynamic data migration, and an analysis/visualization tool for understanding the performance of dynamically load-balanced programs. Other recent work includes parallel/distributed shells, workflow and large-scale data processing systems, and quickly deployable distributed file systems. He is a member of ACM, IEEE, and USENIX. He serves as a PC co-chair of this year’s HPDC and served as Programming Systems Area Co-Chair of SC’13.