Skip to content

Split-explicit acoustic substepping for low-Mach time integration (with two-tier shock/interface capture)#1619

Draft
sbryngelson wants to merge 39 commits into
MFlowCode:masterfrom
sbryngelson:lomach-upstream
Draft

Split-explicit acoustic substepping for low-Mach time integration (with two-tier shock/interface capture)#1619
sbryngelson wants to merge 39 commits into
MFlowCode:masterfrom
sbryngelson:lomach-upstream

Conversation

@sbryngelson

@sbryngelson sbryngelson commented Jun 26, 2026

Copy link
Copy Markdown
Member

Summary

Adds a split-explicit acoustic-substepping time integrator (acoustic_substepping) that removes the acoustic-CFL bottleneck at low Mach number without any global solve, and — via a two-tier robust flux — captures material interfaces and shocks using MFC's existing WENO+HLLC machinery (no shock sensor, no artificial viscosity).

Explicit compressible solvers are limited by dt ~ dx/(|u|+c). When c >> |u| (low Mach), that step is roughly 1/M smaller than the advective scale, so almost every step is spent resolving fast acoustics that carry little of the actual dynamics. Pressure-projection / implicit-acoustic methods remove the restriction but require a global elliptic pressure solve — global reductions and sparse solves that scale poorly on GPU clusters. This mode keeps the solver fully explicit and nearest-neighbor: the slow (advective) terms advance at the advective CFL, and the fast (acoustic) terms are subcycled with a cheap forward–backward scheme.

This is the Klemp–Wicker–Skamarock split-explicit approach used in atmospheric dynamical cores, adapted to MFC's Godunov finite-volume framework (cf. Nazari & Nair, JAMES 2017).

How it works

Per SSP-RK3 stage, the governing equations are split into a slow part advanced once at the large step dt ~ dx/|u|, and a fast part subcycled on ns ≈ (|u|+c)/|u| micro-steps of size dtau = dt/ns:

  • Slow (one expensive evaluation per stage, frozen during the subcycle): momentum advection, volume-fraction advection, viscous/source terms.
  • Fast (cheap, subcycled): mass transport, energy transport, and the pressure gradient in the momentum equation.

The expensive WENO+Riemann work therefore runs at the advective rate, while each acoustic micro-step is a low-order stencil — that asymmetry is where the speedup comes from.

Slow flux. Rather than a new flux scheme, the slow flux reuses HLLC with two changes (gated on acoustic_substepping): the pressure term is removed from the momentum/energy flux (it moves to the substep), and the wave-speed dissipation is capped to the contact speed s_S, so the convective flux is |u|-stable at the advective CFL instead of carrying the |u|+c dissipation that would be unstable there. MFC's existing low_Mach correction handles low-Mach accuracy.

Acoustic substep (m_acoustic_substep.fpp). A 2nd-order centered forward–backward update of mass/energy transport and the pressure gradient, with the slow forcing added as a frozen dtau-scaled source each micro-step. Stabilized by grad–div divergence damping whose discrete operator is rank-one (-v vᵀ symbol), so it damps only compressive/acoustic modes and leaves divergence-free vortical flow untouched. The forward sweep computes flux divergences from a frozen snapshot of the conserved state and applies them afterward, which keeps species mass conservative (an in-place stencil update is not).

Two-tier flux: capturing interfaces and shocks

The centered acoustic substep alone is non-upwinded and rings at discontinuities. To capture material interfaces and shocks without a shock sensor or artificial viscosity, the acoustic flux is evaluated per face by one of two tiers:

  • Smooth tier (unflagged faces): the cheap centered update above, with grad–div damping as its stabilizer.
  • Robust tier (flagged faces): the live full-HLLC flux (convective + acoustic, mass + energy + momentum), re-evaluated each micro-step on the WENO-reconstructed state. Because the micro-step size dtau ≈ dx/(|u|+c) is exactly the step at which standard explicit HLLC is stable, a flagged face reduces to the standard Godunov scheme locally — so it matches the full-HLLC reference by construction.

A face is flagged by intrinsic signals only: a scale-invariant WENO smoothness-indicator test (a per-cell pressure jump exceeding ~1% of the local pressure), or a volume-fraction jump (material interfaces), or an open/outflow boundary. All conserved quantities are applied as single-valued telescoping per-face deltas, so mass, energy, and momentum stay exactly conservative; unflagged faces are bit-identical to the centered tier. The acoustic and convective parts of HLLC are exposed as small device-callable entries (s_acoustic_face_flux, s_convective_face_flux) so the substep reuses the solver without duplicating it.

A subtlety worth noting: making the convective momentum live at flagged faces requires subtracting the exact per-face slow momentum flux the slow path computed at the RK stage state (stored from s_compute_rhs), not a reconstruction — because the WS-RK3 stages evaluate the slow forcing at the stage state but restart the subcycle from Uⁿ. Storing the slow flux makes the live/frozen cancellation exact on every stage, which is what makes shocks stable.

Time step. s_compute_dt computes dt from the advective CFL and n_substeps from a domain-max reduction of (|u|+c)/|u|; the only global collective is that existing timestep reduction.

Multifluid. The substep uses the stiffened-gas mixture EOS and mixture velocity, and advects volume fractions on the slow step. For num_fluids == 1 the path reduces exactly to the single-fluid expressions.

GPU. All device regions use the backend-agnostic GPU_* Fypp macros, so the same source runs on CPU, OpenACC, and OpenMP target offload.

What's implemented

  • acoustic_substepping, n_acoustic_substeps, acoustic_div_damp parameters (+ checker/validator constraints: model_eqns=2, time_stepper=3, CFL dt, recon_type=weno, wave_speeds=direct).
  • m_acoustic_substep.fpp: forward–backward subcycle kernel (EOS, grad–div damping, snapshot transport, mixture support) plus the two-tier robust flux — discontinuity criterion, conditional full-state WENO reconstruction at flagged faces (skipped entirely when nothing is flagged, so smooth flow keeps its speed), and the live full-HLLC mass/energy/momentum deltas.
  • m_riemann_solver_hllc.fpp: the slow-flux variant (5-equation block) and the acoustic/convective HLLC face-flux entries.
  • m_rhs.fpp: stores the per-face slow momentum flux for the live/frozen momentum cancellation.
  • m_time_steppers.fpp: s_split_explicit_rk (Wicker–Skamarock RK3) orchestration and the two-CFL s_compute_dt.
  • Parameterized low-Mach acoustic-pulse hardcoded ICs (hcid 183/264/304); example cases and regression golden tests (smooth pulse + a flagged-face shock case).

Validation

Smooth low-Mach speedup (total time to the same physical time), measured via the internal per-step cost, at M = 0.01 (~4.5–7× at M = 0.1):

1-D 2-D 3-D
CPU ~8× ~10× ~11×
GPU (V100) ~5× ~15× ~21×

Best in 2-D/3-D on GPU; weakest in 1-D on GPU (kernel launch overhead). These are lower than the original smooth-only prototype because the two-tier shock-capture machinery now runs on the smooth path; the largest part of that overhead (the per-microstep robust-tier delta apply) is skipped when no face is flagged.

  • Mass conserved to ~1e-15; per-species mass at sharp material interfaces to ~1e-14.
  • 2-D Gresho vortex (M = 0.001) preserved to machine precision vs. ~18% speed decay in the standard scheme.
  • Speedup flat across 1–8 MPI ranks (per-substep narrow halo exchanges do not erode it).
  • num_fluids == 1 smooth flow is bit-identical to the pre-feature result on the regression golden.

Shock / interface capture (robust tier), split vs. the standard full-HLLC solver at the same resolution:

  • 1-D / 2-D / 3-D periodic shock tubes, a 2-D transverse-shear shock, and a 2-D multifluid shock–droplet: all stable and non-oscillatory, matching full-HLLC to ~3–7% L₂ on the conserved fields (the residual is the first-order-in-time micro-step dissipation), per-species mass conserved to ~1e-16. Regression goldens cover smooth, 1-D shock, and 3-D shock.
  • 2-D transverse and 3-D cases confirm all momentum components are transported correctly (homogeneous-direction velocity stays uniform to ~1e-15 — no rotated-flux mapping bug).
  • Sharp (1-cell) material interfaces are stable and conservative.
  • Cost in shock regimes is roughly break-even (~0.9–1.1×): a flagged face turns the robust tier on, and at the moderate Mach of these cases the ~2× step-count win is offset by the ~2× per-step robust-tier cost. The method is a smooth-low-Mach accelerator with localized shock/interface capability — for shock-dominated flow the standard solver (acoustic_substepping = F) is preferable.

Scope and limitations

  • model_eqns = 2 only.
  • Time accuracy. The macro Wicker–Skamarock RK3 coupling is 2nd-order at fixed acoustic CFL; the forward–backward acoustic micro-step is symplectic-Euler (1st-order in dtau), so with auto substeps (dtau ∝ dt) production convergence is first-order and a captured shock front is a few cells more dissipative than full-HLLC. Symmetrizing the micro-step (Störmer–Verlet) is future work.
  • Multifluid shocks are more dissipative than full-HLLC; the face EOS now reconstructs the mixture volume fractions (rather than using cell values), but the dominant dissipation at the tested multifluid shocks is the first-order micro-step, not the EOS.
  • Boundaries. The centered smooth tier is unstable at extrapolation/outflow boundaries with a background flow; such open-boundary faces are routed to the robust tier (keyed on the per-face boundary-condition type, so patch / non-uniform open edges are covered). Keep discontinuities away from outflow boundaries, or use stable BCs near them.
  • GPU validated on a single V100 (ACC and OMP-offload); true multi-node / multi-GPU communication cost is not yet measured (intra-node MPI is flat).
  • Not yet done: micro-step second-order-in-time (Störmer–Verlet — the lever for tighter shock accuracy); tiled/local flagging to recover shock-regime speed; OpenMP-offload on Cray/AMD-flang; source-term compatibility (bubbles / elasticity / IBM / chemistry).

Draft — opening for visibility and review of the approach.

…zen snapshot

The forward sweep advanced partial densities and total energy in place while the
centered flux stencil read those same neighbours, so the flux divergence mixed
pre- and post-update values, did not telescope, and lost per-species mass at sharp
interfaces (~0.9% drift). Snapshot the transported conserved vars (q_snap) at the
start of each microstep and compute every flux divergence from that frozen state,
then apply. Restores machine-precision species-mass conservation for any num_fluids.
Single-fluid energy transport shifts from Gauss-Seidel to Jacobi; golden 7F8ED027
regenerated.
@github-actions

Copy link
Copy Markdown

Claude Code Review

Head SHA: cd928ed

Files changed:

  • 23
  • src/simulation/m_acoustic_substep.fpp
  • src/simulation/m_time_steppers.fpp
  • src/simulation/m_sim_helpers.fpp
  • src/simulation/m_riemann_solver_hllc.fpp
  • src/simulation/m_checker.fpp
  • src/simulation/m_start_up.fpp
  • src/simulation/m_data_output.fpp
  • src/simulation/m_global_parameters.fpp
  • toolchain/mfc/test/cases.py
  • toolchain/mfc/case_validator.py

Findings:

toolchain/mfc/test/cases.py — undefined variable xc in analytic IC expression

The new acoustic-substepping regression test sets:

"patch_icpp(1)%pres": "1.0 + 1e-3 * exp(-((x - xc) / 0.05)**2)",

xc is not a recognized MFC IC expression variable. MFC's analytic-IC system validates expressions at case load and rejects unknown identifiers as immediate, named errors (per CLAUDE.md: "syntax errors and unknown variables are immediate, named errors"). The analogous example cases in this PR avoid this correctly — 1D_acoustic_lowmach/case.py uses a Python f-string to substitute the centroid numerically, and 2D_twofluid_lowmach/case.py routes through hcid 264. The test case should use one of those approaches:

"patch_icpp(1)%pres": f"1.0 + 1e-3 * exp(-((x - {0.5}) / 0.05)**2)",

or switch to hcid=183 (the new hardcoded IC added in this PR for exactly this 1D Gaussian-pulse pattern). As written, the test fails at case validation before any solver code runs.

@sbryngelson sbryngelson changed the title Split-explicit acoustic substepping for low-Mach time integration Split-explicit acoustic substepping for low-Mach time integration (with two-tier shock/interface capture) Jun 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant