Open PhD positions
I’m currently looking for potential PhD candidates in the areas below. I prefer to tailor projects to the knowledge and interests of candidates, i.e. the descriptions below are meant as inspiration or a starting point before we flesh out a proper PhD project.
Domain-specific languages in ExaHyPE
ExaHyPE is a solver engine for hyperbolic partial differential equations (PDEs) that describe complex wave phenomena. It supports multiple numerical methods, including Finite Volumes and ADER-DG, and multi-resolution techniques such as Adaptive Mesh Refinement on Space-Filling Curves with Dynamic Load Balancing. Currently, users write the actual physical model in C++ or Fortran, and ExaHyPE injects it into generic (templated) compute kernels. We also offer a SymPy interface which generates the physics from the PDE as plain C code. In this project, we plan to extend and replace this API with a proper domain-specific language (DSL).
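To illustrate the callback design, here is a minimal Python sketch; ExaHyPE itself uses C++ templates, and none of the names below are its API. A generic 1D finite-volume step with a Rusanov flux (chosen here only for brevity) owns the loops, while the hypothetical user functions `user_flux` and `user_max_eigenvalue` supply the physics. Linear advection with an assumed speed of 1 stands in for a real PDE.

```python
from typing import Callable, List

# Hypothetical user physics: 1D linear advection q_t + a q_x = 0 (assumed a = 1).
ADVECTION_SPEED = 1.0

def user_flux(q: float) -> float:
    return ADVECTION_SPEED * q

def user_max_eigenvalue(q: float) -> float:
    return abs(ADVECTION_SPEED)

def finite_volume_step(
    q: List[float],
    dt: float,
    dx: float,
    flux: Callable[[float], float],
    max_eigenvalue: Callable[[float], float],
) -> List[float]:
    """One explicit finite-volume step with a Rusanov flux on a periodic 1D grid.

    The loops are generic; only flux() and max_eigenvalue() are physics-specific.
    """
    n = len(q)
    face_flux = []
    for i in range(n):
        left, right = q[i], q[(i + 1) % n]  # face between cell i and cell i+1
        smax = max(max_eigenvalue(left), max_eigenvalue(right))
        face_flux.append(0.5 * (flux(left) + flux(right)) - 0.5 * smax * (right - left))
    # face_flux[i] is the right face of cell i; face_flux[i - 1] its left face
    # (for i = 0, index -1 wraps around, which implements periodicity).
    return [q[i] - dt / dx * (face_flux[i] - face_flux[i - 1]) for i in range(n)]

q = [0.0, 0.0, 1.0, 0.0, 0.0]
# With dt = dx and speed 1 the pulse moves exactly one cell to the right.
print(finite_volume_step(q, dt=1.0, dx=1.0, flux=user_flux, max_eigenvalue=user_max_eigenvalue))
# prints [0.0, 0.0, 0.0, 1.0, 0.0]
```

Swapping in a different PDE only means passing different callbacks; the loop structure and its performance characteristics stay fixed, which is exactly what a DSL would be free to change.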
At the moment, we have various C++ implementations of the numerics consisting of nested loops, some pre-defined temporary arrays, … The loops realise the numerical scheme and call back the user code (which specifies what actually has to be computed) at the critical points. Our DSL will also start from the compute kernels, but phrases them generically as a compute/task graph. Into this graph, it inserts the user’s physics, so we end up with one huge compute graph describing the whole calculation. From here on, we run a series of task transformations to optimise the calculations (we can, for example, eliminate redundant calculations, ensure that all memory accesses are properly aligned, or ensure that we can exploit multiple cores) and then generate plain source code. There are a couple of research questions to be tackled:
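The compute-graph workflow can be sketched as follows, assuming a toy graph format: `Graph`, the operation names and the shallow-water callbacks are all hypothetical and not the planned DSL. Two user callbacks, a momentum flux and a maximum wave speed, each recompute the velocity u = m/h on their own. Once both are spliced into one graph, a redundancy-elimination pass merges identical nodes, and a final step emits plain C statements.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, tuple]  # (op, args); args are node ids, or a literal for "in"/"const"

class Graph:
    """Minimal compute graph; nodes are appended in topological order."""

    def __init__(self) -> None:
        self.nodes: List[Node] = []

    def add(self, op: str, *args) -> int:
        self.nodes.append((op, args))
        return len(self.nodes) - 1

# Hypothetical user physics (1D shallow water, h = depth, m = momentum).
# Each callback recomputes the velocity u = m / h independently.
def user_momentum_flux(g: Graph, h: int, m: int, grav: int) -> int:
    u = g.add("div", m, h)
    half_g_h2 = g.add("mul", g.add("mul", g.add("const", 0.5), grav), g.add("mul", h, h))
    return g.add("add", g.add("mul", m, u), half_g_h2)  # m*u + 0.5*g*h*h

def user_max_speed(g: Graph, h: int, m: int, grav: int) -> int:
    u = g.add("div", m, h)
    return g.add("add", g.add("abs", u), g.add("sqrt", g.add("mul", grav, h)))  # |u| + sqrt(g*h)

def eliminate_redundancy(nodes: List[Node]) -> Tuple[List[Node], Dict[int, int]]:
    """Transformation pass: merge nodes with identical op and (remapped) operands."""
    seen: Dict[Node, int] = {}
    remap: Dict[int, int] = {}  # old node id -> new node id
    out: List[Node] = []
    for old_id, (op, args) in enumerate(nodes):
        if op not in ("in", "const"):
            args = tuple(remap[a] for a in args)
        key = (op, args)
        if key not in seen:
            seen[key] = len(out)
            out.append(key)
        remap[old_id] = seen[key]
    return out, remap

C_OPS = {"add": "+", "mul": "*", "div": "/"}

def emit_c(nodes: List[Node]) -> List[str]:
    """Code generation: one C statement per node."""
    lines = []
    for i, (op, args) in enumerate(nodes):
        if op == "in":
            rhs = args[0]
        elif op == "const":
            rhs = repr(args[0])
        elif op in C_OPS:
            rhs = f"t{args[0]} {C_OPS[op]} t{args[1]}"
        else:  # "abs" or "sqrt"
            rhs = f"{'fabs' if op == 'abs' else 'sqrt'}(t{args[0]})"
        lines.append(f"const double t{i} = {rhs};")
    return lines

# Generic kernel inputs, then the user physics spliced into the same graph.
g = Graph()
h, m, grav = g.add("in", "h"), g.add("in", "m"), g.add("const", 9.81)
flux = user_momentum_flux(g, h, m, grav)
speed = user_max_speed(g, h, m, grav)
optimised, remap = eliminate_redundancy(g.nodes)
print(len(g.nodes), "->", len(optimised))  # prints: 15 -> 14 (one division merged)
print("\n".join(emit_c(optimised)))
```

Because the redundancy only becomes visible once both callbacks live in the same graph, a callback-based kernel cannot remove it. Whether merging such nodes pays off on a GPU is a separate question, since the shared result has to stay alive in a register.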
- It is not clear how GPUs will have to be programmed in the future. Techniques like SYCL, ISO C++ and OpenMP compete. Therefore, the code generator has to produce code for various GPU backends, as well as for various numerical schemes such as Finite Differences or ADER-DG.
- Can we derive a systematic comparison of these three approaches from a performance point of view and can a code generator take the characteristics of the backend into account to produce fast code?
- DSL compilers are typically guided by heuristics (e.g. for the ordering of data structures and loops). Can we replace these heuristics with on-the-fly runtime optimisations?
- Can the optimisation steps be re-integrated into the programming model (notably SYCL) and be made available to a wider community?
- ExaHyPE requires users to phrase their physics in either C++ or Fortran. In the long term, our compiler should accept such code snippets but also support symbolic formulations with SymPy or similar libraries to model a specific problem or hyperbolic PDE.
- Do “standard” compiler optimisations such as common subexpression elimination, as we find them in established DSLs, yield faster code, or are they disadvantageous on GPUs because they require intermediate results to be stored, which increases the register pressure on the accelerator?
- Fast DSLs for hyperbolic problems today rephrase all calculations in tensor notation. This works only for linear waves such as those arising in seismology. Can we extract linear subexpressions from astrophysical models and use fast linear algebra for these subexpressions, while accepting that the overall PDEs of interest are non-linear?
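As a sketch of what an on-the-fly replacement for a compile-time heuristic could look like, the hypothetical `RuntimeTunedKernel` below times equivalent kernel variants on the first call for each problem shape and then reuses the fastest one. The two loop orders merely stand in for data-layout and loop-ordering choices; in a real code generator the variants would be generated C++/SYCL kernels, and Python timings say nothing about GPU performance.

```python
import time
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]

def row_major_sum(a: Matrix) -> float:
    """Variant 1: traverse row by row."""
    total = 0.0
    for row in a:
        for x in row:
            total += x
    return total

def column_major_sum(a: Matrix) -> float:
    """Variant 2: same result, but traverse column by column."""
    total = 0.0
    for j in range(len(a[0])):
        for i in range(len(a)):
            total += a[i][j]
    return total

class RuntimeTunedKernel:
    """Chooses among equivalent variants by measuring them on real data,
    once per problem shape, instead of applying a fixed heuristic."""

    def __init__(self, variants: Dict[str, Callable[[Matrix], float]], repeats: int = 3) -> None:
        self.variants = variants
        self.repeats = repeats
        self.choice: Dict[Tuple[int, int], str] = {}  # problem shape -> variant name

    def _tune(self, a: Matrix) -> str:
        best_time: Dict[str, float] = {}
        for name, kernel in self.variants.items():
            best = float("inf")
            for _ in range(self.repeats):  # keep the best of several runs to reduce noise
                start = time.perf_counter()
                kernel(a)
                best = min(best, time.perf_counter() - start)
            best_time[name] = best
        return min(best_time, key=best_time.get)

    def __call__(self, a: Matrix) -> float:
        shape = (len(a), len(a[0]))
        if shape not in self.choice:
            self.choice[shape] = self._tune(a)
        return self.variants[self.choice[shape]](a)

kernel = RuntimeTunedKernel({"row_major": row_major_sum, "column_major": column_major_sum})
data = [[float(i + j) for j in range(200)] for i in range(100)]
print(kernel(data), kernel.choice)
```

Tuning once per shape keeps the measurement overhead to a few calls; a production version would also have to handle timing noise and re-tune when the workload changes, e.g. after mesh refinement.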
ExaHyPE is currently used for simulations based on Einstein’s theory of general relativity, e.g. of binary black holes and the analysis of their associated gravitational waves. We will use this challenge to benchmark the performance impact of this research. Where appropriate, the candidate should also seek collaboration with ExCALIBUR’s DSL project.
Current simulation codes invest an enormous amount of work into load balancing. They try to spread work evenly over the machine to ensure that all resources are reasonably utilised. On modern exascale machines and for modern algorithms, this approach quickly starts to struggle: if the mesh of a simulation changes at runtime, or if an algorithm consists of different compute phases, there is no single “optimal” resource distribution. Rather than investing in better load balancing, we will investigate the paradigm of flexible computing, where the machine configuration is altered at runtime to suit the work arising.
There are two directions of travel:
- The next generation of supercomputers might feature smart NICs. These are network cards (and switches) with their own compute power (processors). NVIDIA’s BlueField technology is an example of such hardware. If the network becomes intelligent with its own processors (and, in NVIDIA’s case, soon GPUs), we can throw compute tasks into the network and leave it to the network to work out where to run them. For example, it can migrate tasks to underutilised cores. If multiple simulations run on a machine at the same time, the network cards and switches can also make their resources available to the simulations that need them most. The assignment of processors and GPUs then follows the system load.
- The next generation of supercomputer nodes will feature an unprecedented number of cores. At the moment, many codes run a fixed number of ranks on each node, and each rank has a fixed number of cores at its disposal. We propose to let every rank initially see every core. The ranks on a node then negotiate with each other which rank uses which core. Ranks with low load can retreat from cores, while ranks with high compute load can “invade” cores previously used by other ranks.
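The invade/retreat idea for cores can be sketched as a greedy loop. This is a centralised simplification under stated assumptions: `rebalance_cores` is a hypothetical helper, each rank's load is a single number, every rank keeps at least one core, and a real implementation would let the ranks negotiate through messages instead of consulting a global view.

```python
from typing import Dict, List

def rebalance_cores(owner: List[int], load: Dict[int, float], max_moves: int = 100) -> List[int]:
    """owner[c] is the rank currently using core c; load[r] is rank r's pending work.

    Repeatedly, the rank with the highest load per core invades one core of the
    rank with the lowest load per core (which retreats from it), as long as this
    lowers the worst load per core and the donor keeps at least one core.
    """
    owner = list(owner)
    if any(owner.count(r) == 0 for r in load):
        raise ValueError("every rank must start with at least one core")
    for _ in range(max_moves):
        cores = {r: owner.count(r) for r in load}
        per_core = {r: load[r] / cores[r] for r in load}
        invader = max(per_core, key=per_core.get)
        donor = min((r for r in load if cores[r] > 1), key=per_core.get, default=None)
        if donor is None or donor == invader:
            break
        # Only move a core if the worst per-core load strictly improves.
        after_invader = load[invader] / (cores[invader] + 1)
        after_donor = load[donor] / (cores[donor] - 1)
        if max(after_invader, after_donor) >= per_core[invader]:
            break
        owner[owner.index(donor)] = invader  # donor retreats, invader takes the core
    return owner

owner = [0, 0, 0, 0, 1, 1, 1, 1]  # two ranks, four cores each
print(rebalance_cores(owner, load={0: 6.0, 1: 2.0}))  # prints [0, 0, 0, 0, 0, 0, 1, 1]
```

Starting from an even split, the core assignment ends up proportional to the 3:1 load ratio; rerunning the loop whenever the loads change is what would let the assignment follow the compute phases of the simulation.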
Both strands will eventually lead into a common set of algorithms, paradigms and code building blocks.