The workshop is held in the Noh Theater in the Main Building of the Nara Kasugano International Forum.

Thursday, October 6 (Workshop)

Time Presentations / Speakers
08:30 – Registration at Entrance Hall in the Main Building
09:00-09:05 Opening
09:05-09:55 Keynote 1:
Development of system software in post K supercomputer, Mitsuhisa Sato (RIKEN AICS) [Slide]

(Session Chair: Naoya Maruyama, RIKEN AICS)

09:55-10:45 Session 1: Applications (Session Chair: Larry Meadows, Intel)

  • Estimation of round-off errors in OpenMP codes
    Pacôme Eberhart, Julien Brajard, Pierre Fortin, and Fabienne Jézéquel
  • OpenMP Parallelization and Optimization of Graph-based Machine Learning Algorithm
    Zhaoyi Meng, Alice Koniges, Yun (Helen) He, Samuel Williams, Thorsten Kurth, Brandon Cook, Jack Deslippe, and Andrea L. Bertozzi
10:45-11:00 Break
11:00-12:15 Session 2: Locality (Session Chair: Christian Terboven, RWTH Aachen University)

  • Evaluating OpenMP Affinity on the POWER8 Architecture
    Swaroop Pophale and Oscar Hernandez
  • Workstealing and Nested Parallelism in SMP Systems
    Larry Meadows, Simon Pennycook, Alex Duran, Terry Wilmarth, and Jim Cownie
  • Description, Implementation and Evaluation of an affinity clause for task directives
    Philippe Virouleau, Adrien Roussel, François Broquedis, Thierry Gautier, and Fabrice Rastello
12:15-13:45 Lunch (at Cafe Half Time in the Nara National Museum, first basement floor) [Route]
13:45-15:25 Session 3: Task parallelism 1 (Session Chair: Stephen Olivier, Sandia National Laboratories)

  • NUMA-aware Task Performance Analysis
    Dirk Schmidl and Matthias S. Mueller
  • OpenMP Extension for Explicit Task Allocation on NUMA Architecture
    Jinpil Lee, Keisuke Tsugane, Hitoshi Murai, and Mitsuhisa Sato
  • Approaches for Task Affinity in OpenMP
    Christian Terboven, Jonas Hahnfeld, Xavier Teruel, Sergi Mateo, Alejandro Duran, Michael Klemm, Stephen L. Olivier, and Bronis R. de Supinski
  • Towards Unifying OpenMP under the Task-Parallel Paradigm: Implementation and Performance of the taskloop Construct
    Artur Podobas and Sven Karlsson
15:25-15:40 Break
15:40-16:55 Session 4: Task parallelism 2 (Session Chair: Michael Klemm, Intel)

  • A Case for Extending Task Dependencies
    Thomas Scogland and Bronis de Supinski
  • OpenMP as a High-Level Specification Language for Parallelism
    Max Grossman, Jun Shirako, and Vivek Sarkar
  • Scaling FMM with Data-Driven OpenMP Tasks on Multicore Architectures
    Abdelhalim Amer, Satoshi Matsuoka, Miquel Pericas, Naoya Maruyama, Kenjiro Taura, Rio Yokota, and Pavan Balaji
18:00-20:00 Banquet

Friday, October 7 (Workshop)

Time Presentations / Speakers
08:30 – Registration at Entrance Hall in the Main Building
09:00-09:50 Keynote 2:
From the latency to the throughput age, Jesus Labarta (BSC) [Slide]

Abstract: The talk will present a vision of how multicore and new architectures are impacting parallel programming in the high-performance context, and what we think should be the outcome of the revolution we are living through.
As parallel programmers, we have been writing codes driven by our mental models of the machines we were targeting. The evolution towards increasing complexity, scale and variability in our systems is generating a growing divergence between our mental models of how systems behave and how they actually behave. This creates a feedback loop in which programmer productivity, code maintainability and performance portability are severely damaged.
In this context, we consider that programming-model development should concentrate on providing clean interfaces that let programmers focus on specifying algorithms, computations and the data they use. The runtime should take responsibility for mapping those demands to the available platform, optimizing locality and resource utilization in a very dynamic and responsive way. From this point of view, we consider that task-based models, in the direction in which the OpenMP standard is evolving, provide the fundamental mechanisms that support such decoupling of programs from architectural details.
A programming model lacking fundamental mechanisms will certainly result in low-quality programs, but a model that properly supports those mechanisms does not guarantee the ideal result. We claim that the actual revolution has to be an important change in the mindset of programmers. It is our belief that the deep fundamental change characterizing such a revolution is a transition from the still prevalent latency-dominated mentality to a throughput-oriented mentality, which we consider the key to successfully addressing the exascale challenge. This will require some time, demonstrations of best practices, and training, but a quiet revolution is possible.
The talk will elaborate on this vision and present examples of how it drives the OmpSs model and associated runtime developments at BSC.

Speaker Details: Jesus Labarta has been a full professor of Computer Architecture at the Technical University of Catalonia (UPC) since 1990. Since 1981 he has lectured on computer architecture, operating systems, computer networks and performance evaluation. His research interests have centered on parallel computing, covering areas from multiprocessor architecture, memory hierarchy, programming models, parallelizing compilers, operating systems and parallelization of numerical kernels to performance analysis and prediction tools.
Since 2005 he has been responsible for the Computer Science Research Department within the Barcelona Supercomputing Center (BSC). He has been involved in research cooperation with many leading companies on HPC-related topics. The major directions of his current work relate to performance analysis tools, programming models and resource management. His team distributes the open-source BSC tools (Paraver and Dimemas) and performs research on increasing the intelligence embedded in performance analysis tools. He is involved in the development of the OmpSs programming model and its implementations for SMP, GPU and cluster platforms. He has been involved in exascale activities such as IESP and EESI, where he was responsible for the Runtime and Programming Model sections of the respective roadmaps. He leads the programming models and resource management activities in the HPC subproject of the Human Brain Project.

(Session Chair: Bronis R. de Supinski, LLNL)

09:50-10:40 Session 5: Extensions (Session Chair: Alice Koniges, Berkeley Lab / NERSC)

  • Reducing the Functionality Gap between Auto-Vectorization and Explicit Vectorization: Compress/Expand and Histogram
    Hideki Saito, Serguei Preis, Nikolay Panchenko, and Xinmin Tian
  • A Proposal to OpenMP for Addressing the CPU Oversubscription Challenge
    Yonghong Yan, Jeff R. Hammond, Chunhua Liao, and Alexandre E. Eichenberger
10:40-10:50 Break
10:50-12:05 Session 6: Tools (Session Chair: Nawal Copty, Oracle)

  • Testing Infrastructure for OpenMP Debugging Interface Implementations
    Joachim Protze, Dong Ahn, Ignacio Laguna, Martin Schulz, and Matthias Mueller
  • The secrets of the accelerators unveiled: Tracing heterogeneous executions through OMPT
    Germán Llort, Antonio Filgueras, Daniel Jiménez-González, Harald Servat, Xavier Teruel, Estanislao Mercadal, Carlos Álvarez, Judit Giménez, Xavier Martorell, Eduard Ayguadé, and Jesús Labarta
  • Language-Centric Performance Analysis of OpenMP Programs with Aftermath
    Andi Drebes, Jean-Baptiste Bréjon, Antoniu Pop, Karine Heydemann, and Albert Cohen
12:05-13:30 Lunch (at Cafe Half Time in the Nara National Museum, first basement floor) [Route]
13:30-15:10 Session 7: Accelerator programming (Session Chair: Eric Stotzer, Texas Instruments)

  • Pragmatic Performance Portability with OpenMP 4.x
    Matt Martineau, James Price, Simon McIntosh-Smith, and Wayne Gaudin
  • Multiple Target Task Sharing Support for the OpenMP Accelerator Model
    Guray Ozen, Sergi Mateo, James Beyer, Eduard Ayguade, and Jesus Labarta
  • Early Experiences Porting Three Applications to OpenMP 4.5
    Ian Karlin, Tom Scogland, Arpith C. Jacob, Samuel F. Antao, Gheorghe-Teodor Bercea, Carlo Bertolli, Bronis R. de Supinski, Erik W. Draeger, Alexandre E. Eichenberger, Jim Glosli, Holger Jones, Adam Kunen, David Poliakoff, and David F. Richards
  • Design and Preliminary Evaluation of Omni OpenACC Compiler for Massive MIMD Processor PEZY-SC
    Akihiro Tabuchi, Yasuyuki Kimura, Sunao Torii, Hideo Matsufuru, Tadashi Ishikawa, Taisuke Boku, and Mitsuhisa Sato
15:10-15:25 Break
15:25-16:40 Session 8: Performance evaluations and optimization (Session Chair: Thomas Scogland, LLNL)

  • Evaluating OpenMP Implementations for Java Using PolyBench
    Xing Fan, Rui Feng, Oliver Sinnen, and Nasser Giacaman
  • Transactional Memory for Algebraic Multigrid Smoothers
    Barna Bihari, Ulrike Yang, Michael Wong, and Bronis R. de Supinski
  • Supporting Adaptive Privatization Techniques for Irregular Array Reductions in Task-parallel Programming Models
    Jan Ciesko, Sergi Mateo, Xavier Teruel, Xavier Martorell, Eduard Ayguade, and Jesus Labarta