Durham HPC Days 2026

Computational Engineering

Facilitator: Gokberk Kabacaoglu

Speakers: Ashutosh Shankarrao Londhe, Vahid Jafari, Henrique Bergallo Rocha, Robert Bird

Description

This session will feature four talks:

Advancing DNS Turbulent Reacting Flow Simulations with Performance Portability Using OPS • Ashutosh Shankarrao Londhe


SENGA+ is a high-order finite difference compressible direct numerical simulation (DNS) code for simulating turbulent reacting flows. It incorporates detailed chemical reactions and transport with high-order numerical schemes to achieve high-fidelity simulations and excellent parallel scaling. This talk presents our experience in deploying SENGA+ on next-generation high-performance computing systems, using the OPS domain-specific language (DSL). We discuss the key challenges encountered during the porting process, underpinned by specialized optimizations to gain near-optimal performance on multi-core/many-core systems. OPS enables automatic generation of multiple parallelizations, highly customized for the target architectures, reducing developer effort and increasing code longevity. Performance evaluation shows up to 22× speedup on a GPU node with 4 AMD MI250X compared to the original CPU-only code. We also present validation simulations using the re-engineered application with near-optimal scaling on up to 2,000 GPUs. The new code enables simulating production-level combustion problems at high fidelity within tractable time-frames that were previously prohibitively expensive.
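The performance-portability idea behind a DSL such as OPS — the numerical kernel is written once, point-wise, while the framework owns the loop and generates architecture-specific parallelisations — can be illustrated with a minimal Python sketch. The kernel/loop separation below is purely illustrative and is not the actual OPS API:

```python
# Illustrative kernel/loop separation in the spirit of a stencil DSL.

def kernel_average(u, i):
    """Point-wise 3-point average stencil (user-written, backend-agnostic)."""
    return 0.25 * u[i - 1] + 0.5 * u[i] + 0.25 * u[i + 1]

def par_loop(kernel, u, lo, hi):
    """Framework-owned iteration over interior points: a serial stand-in
    for generated parallel backends (OpenMP, CUDA, HIP, ...)."""
    return [u[i] if i < lo or i >= hi else kernel(u, i)
            for i in range(len(u))]

u = [0.0, 0.0, 4.0, 0.0, 0.0]
u_new = par_loop(kernel_average, u, 1, len(u) - 1)
print(u_new)  # interior points smoothed: [0.0, 1.0, 2.0, 1.0, 0.0]
```

Because the kernel never references the loop structure, the same user code can be re-targeted to different architectures without modification — the property that gives the re-engineered application its longevity.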

Scalable Coupled CFD–DSMC Simulations for Hypersonic Flows: Accuracy–Performance Trade-offs on Modern HPC Systems • Vahid Jafari

Hypersonic flows in transitional and rarefied regimes present significant modelling challenges, as continuum assumptions progressively break down and neither classical CFD nor particle-based approaches alone are sufficient. This work presents a coupled CFD–DSMC framework for hypersonic flow simulations, integrating an in-house finite-volume CFD solver with the SPARTA DSMC code through the Macro-Micro-Coupling (MaMiCo) tool. The approach employs spatial domain decomposition to restrict DSMC calculations to non-equilibrium regions, thereby avoiding the prohibitive cost of full-domain particle simulations. Rarefied gas behaviour in transitional regimes is investigated, with emphasis on consistency and conservation properties across the CFD–DSMC coupling interface. Particular attention is given to the relationship between physical accuracy and computational cost on distributed-memory HPC systems. In DSMC simulations, increasing the number of simulation particles leads to a steep rise in computational cost, limiting achievable particle densities in large-scale problems. To reduce statistical fluctuation errors without excessively increasing particle numbers within a single simulation, multiple independent DSMC instances are executed concurrently. Ensemble averaging of these independent realisations improves statistical convergence while maintaining feasible per-instance computational requirements. Error estimation capabilities provided by MaMiCo are used to quantify statistical and coupling uncertainties, enabling a systematic assessment of the trade-off between accuracy and computational effort. Performance characteristics — including runtime, parallel scalability, memory footprint, and energy consumption — are analysed for different coupling configurations and DSMC ensemble sizes.
The results provide practical guidelines for selecting particle numbers, DSMC subdomain sizes, and ensemble strategies, enabling efficient and scalable high-fidelity simulations of hypersonic transitional flows on modern HPC architectures.
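The ensemble strategy described above rests on a standard statistical fact: averaging M independent realisations reduces the standard error of sampled moments by roughly 1/sqrt(M). A hypothetical sketch with synthetic noisy samples (the `dsmc_instance` stand-in and its noise model are assumptions for illustration, not the actual solver):

```python
import random
import statistics

random.seed(0)

def dsmc_instance(n_particles, true_value=1.0, noise=0.5):
    """Stand-in for one independent DSMC realisation: a noisy estimate
    whose scatter shrinks with the per-instance particle count."""
    sigma = noise / n_particles ** 0.5
    return true_value + random.gauss(0.0, sigma)

def ensemble_error(n_instances, trials=2000):
    """Empirical standard deviation of the ensemble-averaged estimate."""
    means = [statistics.fmean(dsmc_instance(100) for _ in range(n_instances))
             for _ in range(trials)]
    return statistics.stdev(means)

e1, e16 = ensemble_error(1), ensemble_error(16)
print(f"1 instance: {e1:.4f}, 16 instances: {e16:.4f}, ratio: {e1 / e16:.1f}")
# the ratio is close to sqrt(16) = 4: error drops as 1/sqrt(M)
```

This is why running many cheap, independent instances concurrently can be preferable to one expensive simulation with a very large particle count: the instances are trivially parallel and the statistical error still falls at the same square-root rate.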

Finite Element Method Scaling and Performance on MI300X GPUs Through MFEM-Enabled MOOSE • Henrique Bergallo Rocha

With the new generation of GPU-accelerated HPC systems opening new avenues in large-scale physical modelling, nuclear fusion applications through the Finite Element Method (FEM) require code that is performant at scale, parallelisable, and that can faithfully reproduce highly coupled, complex multiphysics on meshes with up to O(10^10) degrees of freedom. MFEM-Enabled MOOSE, i.e. a build of the MOOSE framework that utilises the MFEM FEM library as opposed to libMesh, has been shown to fulfil these requirements on CPUs and on GPUs with a CUDA backend. However, so far the scalability and general performance of MFEM-Enabled MOOSE on recent AMD hardware has not been explored to a great extent, which may turn out to be a blind spot when it comes to the deployment of large HPC systems in the UK with AMD GPUs. In this talk, we present our results from benchmarking, profiling, and measuring the scalability of MFEM-Enabled MOOSE on the MI300X nodes in the CSD3 system. We perform weak and strong scaling analyses across a variety of sample FEM problems utilising different solvers and preconditioners. The sample problems studied represent different kinds of physics and span the entire de Rham complex, and they are tested on up to O(10^2) GPU cards on a single problem. We then compare these results to those obtained on equivalent NVIDIA hardware and discuss their relative strengths and weaknesses.
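The weak- and strong-scaling analyses mentioned above reduce to two standard metrics computed from measured runtimes; a small sketch with hypothetical timing numbers (not results from the talk):

```python
def strong_scaling_efficiency(t1, tn, n):
    """Strong scaling: fixed problem size on n devices. Ideal runtime is t1/n,
    so efficiency = t1 / (n * tn); 1.0 means perfect speedup."""
    return t1 / (n * tn)

def weak_scaling_efficiency(t1, tn):
    """Weak scaling: problem size grows with n. Ideal runtime is constant,
    so efficiency = t1 / tn; 1.0 means no scaling overhead."""
    return t1 / tn

# Hypothetical runtimes (seconds) for the same mesh on 1 and 8 GPUs:
speedup = 100.0 / 14.0
eff = strong_scaling_efficiency(100.0, 14.0, 8)
print(f"speedup {speedup:.1f}x, parallel efficiency {eff:.0%}")
```

Strong scaling shows how far a fixed problem can be accelerated before communication dominates, while weak scaling shows whether the per-GPU cost stays flat as both mesh and device count grow — both views are needed to judge a solver/preconditioner combination at the O(10^10)-DoF scale targeted here.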

Title • Robert Bird

Description