Split-explicit acoustic substepping for low-Mach time integration (with two-tier shock/interface capture) by sbryngelson · Pull Request #1619 · MFlowCode/MFC

sbryngelson · 2026-06-26T00:19:31Z

Summary

Adds a split-explicit acoustic-substepping time integrator (acoustic_substepping) that removes the acoustic-CFL bottleneck at low Mach number without any global solve, and — via a two-tier robust flux — captures material interfaces and shocks using MFC's existing WENO+HLLC machinery (no shock sensor, no artificial viscosity).

Explicit compressible solvers are limited by dt ~ dx/(|u|+c). When c >> |u| (low Mach), that step is roughly 1/M smaller than the advective scale, so almost every step is spent resolving fast acoustics that carry little of the actual dynamics. Pressure-projection / implicit-acoustic methods remove the restriction but require a global elliptic pressure solve — global reductions and sparse solves that scale poorly on GPU clusters. This mode keeps the solver fully explicit and nearest-neighbor: the slow (advective) terms advance at the advective CFL, and the fast (acoustic) terms are subcycled with a cheap forward–backward scheme.

This is the Klemp–Wicker–Skamarock split-explicit approach used in atmospheric dynamical cores, adapted to MFC's Godunov finite-volume framework (cf. Nazari & Nair, JAMES 2017).

How it works

Per SSP-RK3 stage, the governing equations are split into a slow part advanced once at the large step dt ~ dx/|u|, and a fast part subcycled on ns ≈ (|u|+c)/|u| micro-steps of size dtau = dt/ns:

Slow (one expensive evaluation per stage, frozen during the subcycle): momentum advection, volume-fraction advection, viscous/source terms.
Fast (cheap, subcycled): mass transport, energy transport, and the pressure gradient in the momentum equation.

The expensive WENO+Riemann work therefore runs at the advective rate, while each acoustic micro-step is a low-order stencil — that asymmetry is where the speedup comes from.

Slow flux. Rather than a new flux scheme, the slow flux reuses HLLC with two changes (gated on acoustic_substepping): the pressure term is removed from the momentum/energy flux (it moves to the substep), and the wave-speed dissipation is capped to the contact speed s_S, so the convective flux is |u|-stable at the advective CFL instead of carrying the |u|+c dissipation that would be unstable there. MFC's existing low_Mach correction handles low-Mach accuracy.

Acoustic substep (m_acoustic_substep.fpp). A 2nd-order centered forward–backward update of mass/energy transport and the pressure gradient, with the slow forcing added as a frozen dtau-scaled source each micro-step. Stabilized by grad–div divergence damping whose discrete operator is rank-one (-v vᵀ symbol), so it damps only compressive/acoustic modes and leaves divergence-free vortical flow untouched. The forward sweep computes flux divergences from a frozen snapshot of the conserved state and applies them afterward, which keeps species mass conservative (an in-place stencil update is not).

Two-tier flux: capturing interfaces and shocks

The centered acoustic substep alone is non-upwinded and rings at discontinuities. To capture material interfaces and shocks without a shock sensor or artificial viscosity, the acoustic flux is evaluated per face by one of two tiers:

Smooth tier (unflagged faces): the cheap centered update above, with grad–div damping as its stabilizer.
Robust tier (flagged faces): the live full-HLLC flux (convective + acoustic, mass + energy + momentum), re-evaluated each micro-step on the WENO-reconstructed state. Because the micro-step size dtau ≈ dx/(|u|+c) is exactly the step at which standard explicit HLLC is stable, a flagged face reduces to the standard Godunov scheme locally — so it matches the full-HLLC reference by construction.

A face is flagged by intrinsic signals only: a scale-invariant WENO smoothness-indicator test (a per-cell pressure jump exceeding ~1% of the local pressure), or a volume-fraction jump (material interfaces), or an open/outflow boundary. All conserved quantities are applied as single-valued telescoping per-face deltas, so mass, energy, and momentum stay exactly conservative; unflagged faces are bit-identical to the centered tier. The acoustic and convective parts of HLLC are exposed as small device-callable entries (s_acoustic_face_flux, s_convective_face_flux) so the substep reuses the solver without duplicating it.

A subtlety worth noting: making the convective momentum live at flagged faces requires subtracting the exact per-face slow momentum flux the slow path computed at the RK stage state (stored from s_compute_rhs), not a reconstruction — because the WS-RK3 stages evaluate the slow forcing at the stage state but restart the subcycle from Uⁿ. Storing the slow flux makes the live/frozen cancellation exact on every stage, which is what makes shocks stable.

Time step. s_compute_dt computes dt from the advective CFL and n_substeps from a domain-max reduction of (|u|+c)/|u|; the only global collective is that existing timestep reduction.

Multifluid. The substep uses the stiffened-gas mixture EOS and mixture velocity, and advects volume fractions on the slow step. For num_fluids == 1 the path reduces exactly to the single-fluid expressions.

GPU. All device regions use the backend-agnostic GPU_* Fypp macros, so the same source runs on CPU, OpenACC, and OpenMP target offload.

What's implemented

acoustic_substepping, n_acoustic_substeps, acoustic_div_damp parameters (+ checker/validator constraints: model_eqns=2, time_stepper=3, CFL dt, recon_type=weno, wave_speeds=direct).
m_acoustic_substep.fpp: forward–backward subcycle kernel (EOS, grad–div damping, snapshot transport, mixture support) plus the two-tier robust flux — discontinuity criterion, conditional full-state WENO reconstruction at flagged faces (skipped entirely when nothing is flagged, so smooth flow keeps its speed), and the live full-HLLC mass/energy/momentum deltas.
m_riemann_solver_hllc.fpp: the slow-flux variant (5-equation block) and the acoustic/convective HLLC face-flux entries.
m_rhs.fpp: stores the per-face slow momentum flux for the live/frozen momentum cancellation.
m_time_steppers.fpp: s_split_explicit_rk (Wicker–Skamarock RK3) orchestration and the two-CFL s_compute_dt.
Parameterized low-Mach acoustic-pulse hardcoded ICs (hcid 183/264/304); example cases and regression golden tests (smooth pulse + a flagged-face shock case).

Validation

Smooth low-Mach speedup (total time to the same physical time), measured via the internal per-step cost, at M = 0.01 (~4.5–7× at M = 0.1):

	1-D	2-D	3-D
CPU	~8×	~10×	~11×
GPU (V100)	~5×	~15×	~21×

Best in 2-D/3-D on GPU; weakest in 1-D on GPU (kernel launch overhead). These are lower than the original smooth-only prototype because the two-tier shock-capture machinery now runs on the smooth path; the largest part of that overhead (the per-microstep robust-tier delta apply) is skipped when no face is flagged.

Mass conserved to ~1e-15; per-species mass at sharp material interfaces to ~1e-14.
2-D Gresho vortex (M = 0.001) preserved to machine precision vs. ~18% speed decay in the standard scheme.
Speedup flat across 1–8 MPI ranks (per-substep narrow halo exchanges do not erode it).
num_fluids == 1 smooth flow is bit-identical to the pre-feature result on the regression golden.

Shock / interface capture (robust tier), split vs. the standard full-HLLC solver at the same resolution:

1-D / 2-D / 3-D periodic shock tubes, a 2-D transverse-shear shock, and a 2-D multifluid shock–droplet: all stable and non-oscillatory, matching full-HLLC to ~3–7% L₂ on the conserved fields (the residual is the first-order-in-time micro-step dissipation), per-species mass conserved to ~1e-16. Regression goldens cover smooth, 1-D shock, and 3-D shock.
2-D transverse and 3-D cases confirm all momentum components are transported correctly (homogeneous-direction velocity stays uniform to ~1e-15 — no rotated-flux mapping bug).
Sharp (1-cell) material interfaces are stable and conservative.
Cost in shock regimes is roughly break-even (~0.9–1.1×): a flagged face turns the robust tier on, and at the moderate Mach of these cases the ~2× step-count win is offset by the ~2× per-step robust-tier cost. The method is a smooth-low-Mach accelerator with localized shock/interface capability — for shock-dominated flow the standard solver (acoustic_substepping = F) is preferable.

Scope and limitations

model_eqns = 2 only.
Time accuracy. The macro Wicker–Skamarock RK3 coupling is 2nd-order at fixed acoustic CFL; the forward–backward acoustic micro-step is symplectic-Euler (1st-order in dtau), so with auto substeps (dtau ∝ dt) production convergence is first-order and a captured shock front is a few cells more dissipative than full-HLLC. Symmetrizing the micro-step (Störmer–Verlet) is future work.
Multifluid shocks are more dissipative than full-HLLC; the face EOS now reconstructs the mixture volume fractions (rather than using cell values), but the dominant dissipation at the tested multifluid shocks is the first-order micro-step, not the EOS.
Boundaries. The centered smooth tier is unstable at extrapolation/outflow boundaries with a background flow; such open-boundary faces are routed to the robust tier (keyed on the per-face boundary-condition type, so patch / non-uniform open edges are covered). Keep discontinuities away from outflow boundaries, or use stable BCs near them.
GPU validated on a single V100 (ACC and OMP-offload); true multi-node / multi-GPU communication cost is not yet measured (intra-node MPI is flat).
Not yet done: micro-step second-order-in-time (Störmer–Verlet — the lever for tighter shock accuracy); tiled/local flagging to recover shock-regime speed; OpenMP-offload on Cray/AMD-flang; source-term compatibility (bubbles / elasticity / IBM / chemistry).

Draft — opening for visibility and review of the approach.

…t-explicit mode

…ipation) for split mode

… damping)

…ort, CFL example cases

…substeps a floor

…less, CFL-stable)

…ng golden test

…eps overflow, fix cfl_const_dt restart

…ty reuse)

… (GPU acc-routine-seq portability)

…s (hcid 183/264/304)

…e EOS, velocity, slow alpha advection)

…zen snapshot The forward sweep advanced partial densities and total energy in place while the centered flux stencil read those same neighbours, so the flux divergence mixed pre- and post-update values, did not telescope, and lost per-species mass at sharp interfaces (~0.9% drift). Snapshot the transported conserved vars (q_snap) at the start of each microstep and compute every flux divergence from that frozen state, then apply. Restores machine-precision species-mass conservation for any num_fluids. Single-fluid energy transport shifts from Gauss-Seidel to Jacobi; golden 7F8ED027 regenerated.

… acoustic_substepping)

github-actions · 2026-06-26T00:35:39Z

Claude Code Review

Head SHA: cd928ed

Files changed:

23
src/simulation/m_acoustic_substep.fpp
src/simulation/m_time_steppers.fpp
src/simulation/m_sim_helpers.fpp
src/simulation/m_riemann_solver_hllc.fpp
src/simulation/m_checker.fpp
src/simulation/m_start_up.fpp
src/simulation/m_data_output.fpp
src/simulation/m_global_parameters.fpp
toolchain/mfc/test/cases.py
toolchain/mfc/case_validator.py

Findings:

toolchain/mfc/test/cases.py — undefined variable xc in analytic IC expression

The new acoustic-substepping regression test sets:

"patch_icpp(1)%pres": "1.0 + 1e-3 * exp(-((x - xc) / 0.05)**2)",

xc is not a recognized MFC IC expression variable. MFC's analytic-IC system validates expressions at case load and rejects unknown identifiers as immediate, named errors (per CLAUDE.md: "syntax errors and unknown variables are immediate, named errors"). The analogous example cases in this PR avoid this correctly — 1D_acoustic_lowmach/case.py uses a Python f-string to substitute the centroid numerically, and 2D_twofluid_lowmach/case.py routes through hcid 264. The test case should use one of those approaches:

"patch_icpp(1)%pres": f"1.0 + 1e-3 * exp(-((x - {0.5}) / 0.05)**2)",

or switch to hcid=183 (the new hardcoded IC added in this PR for exactly this 1D Gaussian-pulse pattern). As written, the test fails at case validation before any solver code runs.

…-fraction variation)

…WENO-Z criterion)

…substep

…ture)

…heme

…flux (shock stability)

…s for shock capture

…substep

… reconstruction, add shock golden

…ck EOS accuracy)

…overage)

…low speedup)

sbryngelson added 18 commits June 25, 2026 20:14

feat(sim): add acoustic_substepping params and checker exclusions

b577ecd

feat(sim): advective-CFL timestep and acoustic substep count for spli…

ff22682

…t-explicit mode

feat(sim): HLLC slow-flux variant (p-star removed, contact-speed diss…

166bdd4

…ipation) for split mode

feat(sim): acoustic substep module (forward-backward, EOS, divergence…

53bba3a

… damping)

feat(sim): split-explicit RK orchestration for acoustic substepping

e5b07f6

fix(sim): guard acoustic_substepping (CFL+RK3 required), gate ICFL ab…

2d735ef

…ort, CFL example cases

test(sim): 2D Gresho low-Mach validation case for acoustic substepping

cfc0257

fix(sim): correct multi-D acoustic substep count and make n_acoustic_…

63b9601

…substeps a floor

fix(sim): grad-div divergence damping for acoustic substep (dimension…

d464c49

…less, CFL-stable)

feat(test): 2D isentropic vortex lowmach example + acoustic_substeppi…

e85b58d

…ng golden test

fix(sim): guard num_fluids==1 for acoustic_substepping, clamp n_subst…

cdfef09

…eps overflow, fix cfl_const_dt restart

perf(sim): slim acoustic substep kernel (direct EOS + per-cell veloci…

bd098fc

…ty reuse)

fix(sim): pass acoustic_substepping by value to s_compute_dt_from_cfl…

1ea7ae1

… (GPU acc-routine-seq portability)

feat(pre_process): parameterized low-Mach acoustic-pulse hardcoded IC…

2fd0952

…s (hcid 183/264/304)

feat(sim): generalize acoustic substep kernel to num_fluids>1 (mixtur…

b8edeee

…e EOS, velocity, slow alpha advection)

fix(sim): resolve nvhpc OMP-offload ICE in s_compute_dt (firstprivate…

f004f19

… acoustic_substepping)

docs: document split-explicit acoustic substepping (low-Mach) integrator

cd928ed

sbryngelson added 11 commits June 25, 2026 21:35

feat(sim): expose HLLC acoustic-flux entry for the substep robust tier

26aadc1

feat(sim): WENO reconstruction of acoustic variables in the substep

16de323

feat(sim): two-tier discontinuity criterion (WENO indicators + volume…

5201896

…-fraction variation)

feat(sim): two-tier acoustic flux (cheap central / WENO+HLLC robust, …

1500cdd

…WENO-Z criterion)

style(sim): wrap long acoustic_substepping recon_type PROHIBIT line

a51348d

test(sim): interface + shock-interaction validation for the two-tier …

c2228eb

…substep

feat(sim): expose HLLC convective-flux entry for the substep robust tier

2c425c0

feat(sim): conditional full-state reconstruction at flagged faces

c2093c3

feat(sim): full-HLLC convective transport at flagged faces (shock cap…

bd629c7

…ture)

feat(sim): verify second-order time accuracy of the acoustic split sc…

3534f73

…heme

test(sim): shock-interface validation + docs for shock-capable substep

fcfc026

sbryngelson added 5 commits June 26, 2026 09:37

feat(sim): live convective momentum at flagged faces via stored slow …

700ecaf

…flux (shock stability)

test(sim): 2D transverse + multifluid shock-interface validation; doc…

7c6c5b9

…s for shock capture

feat(sim): robust upwind flux at outflow boundaries for the acoustic …

4ed1f04

…substep

fix(sim): final-review wave — correct frozen-flux comments, drop dead…

911fe26

… reconstruction, add shock golden

style(sim): apply ./mfc.sh format to acoustic substep + rhs

99a9784

sbryngelson changed the title ~~Split-explicit acoustic substepping for low-Mach time integration~~ Split-explicit acoustic substepping for low-Mach time integration (with two-tier shock/interface capture) Jun 26, 2026

sbryngelson added 5 commits June 26, 2026 12:06

refactor(sim): tidy acoustic substep kernel (dead code, comments, dups)

a0b96a5

feat(sim): reconstruct mixture alpha at flagged faces (multifluid sho…

33b01d8

…ck EOS accuracy)

feat(sim): per-face bc_type for outflow-boundary flagging (patch-BC c…

098ed7e

…overage)

test(sim): 3D shock-capture validation + 3D flagged-face golden

488f0b5

perf(sim): skip robust-tier delta apply on unflagged stages (smooth-f…

d87f919

…low speedup)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Split-explicit acoustic substepping for low-Mach time integration (with two-tier shock/interface capture)#1619

Split-explicit acoustic substepping for low-Mach time integration (with two-tier shock/interface capture)#1619
sbryngelson wants to merge 39 commits into
MFlowCode:masterfrom
sbryngelson:lomach-upstream

sbryngelson commented Jun 26, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

sbryngelson commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

How it works

Two-tier flux: capturing interfaces and shocks

What's implemented

Validation

Scope and limitations

Uh oh!

github-actions Bot commented Jun 26, 2026

Claude Code Review

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

sbryngelson commented Jun 26, 2026 •

edited

Loading