Split-explicit acoustic substepping for low-Mach time integration (with two-tier shock/interface capture)#1619
Split-explicit acoustic substepping for low-Mach time integration (with two-tier shock/interface capture)#1619sbryngelson wants to merge 39 commits into
Conversation
…ipation) for split mode
…ort, CFL example cases
…less, CFL-stable)
…eps overflow, fix cfl_const_dt restart
… (GPU acc-routine-seq portability)
…s (hcid 183/264/304)
…e EOS, velocity, slow alpha advection)
…zen snapshot The forward sweep advanced partial densities and total energy in place while the centered flux stencil read those same neighbours, so the flux divergence mixed pre- and post-update values, did not telescope, and lost per-species mass at sharp interfaces (~0.9% drift). Snapshot the transported conserved vars (q_snap) at the start of each microstep and compute every flux divergence from that frozen state, then apply. Restores machine-precision species-mass conservation for any num_fluids. Single-fluid energy transport shifts from Gauss-Seidel to Jacobi; golden 7F8ED027 regenerated.
… acoustic_substepping)
Claude Code ReviewHead SHA: cd928ed Files changed:
Findings:
The new acoustic-substepping regression test sets: "patch_icpp(1)%pres": "1.0 + 1e-3 * exp(-((x - xc) / 0.05)**2)",
"patch_icpp(1)%pres": f"1.0 + 1e-3 * exp(-((x - {0.5}) / 0.05)**2)",or switch to |
…-fraction variation)
…WENO-Z criterion)
…flux (shock stability)
…s for shock capture
… reconstruction, add shock golden
Summary
Adds a split-explicit acoustic-substepping time integrator (
acoustic_substepping) that removes the acoustic-CFL bottleneck at low Mach number without any global solve, and — via a two-tier robust flux — captures material interfaces and shocks using MFC's existing WENO+HLLC machinery (no shock sensor, no artificial viscosity).Explicit compressible solvers are limited by
dt ~ dx/(|u|+c). Whenc >> |u|(low Mach), that step is roughly1/Msmaller than the advective scale, so almost every step is spent resolving fast acoustics that carry little of the actual dynamics. Pressure-projection / implicit-acoustic methods remove the restriction but require a global elliptic pressure solve — global reductions and sparse solves that scale poorly on GPU clusters. This mode keeps the solver fully explicit and nearest-neighbor: the slow (advective) terms advance at the advective CFL, and the fast (acoustic) terms are subcycled with a cheap forward–backward scheme.This is the Klemp–Wicker–Skamarock split-explicit approach used in atmospheric dynamical cores, adapted to MFC's Godunov finite-volume framework (cf. Nazari & Nair, JAMES 2017).
How it works
Per SSP-RK3 stage, the governing equations are split into a slow part advanced once at the large step
dt ~ dx/|u|, and a fast part subcycled onns ≈ (|u|+c)/|u|micro-steps of sizedtau = dt/ns:The expensive WENO+Riemann work therefore runs at the advective rate, while each acoustic micro-step is a low-order stencil — that asymmetry is where the speedup comes from.
Slow flux. Rather than a new flux scheme, the slow flux reuses HLLC with two changes (gated on
acoustic_substepping): the pressure term is removed from the momentum/energy flux (it moves to the substep), and the wave-speed dissipation is capped to the contact speeds_S, so the convective flux is|u|-stable at the advective CFL instead of carrying the|u|+cdissipation that would be unstable there. MFC's existinglow_Machcorrection handles low-Mach accuracy.Acoustic substep (
m_acoustic_substep.fpp). A 2nd-order centered forward–backward update of mass/energy transport and the pressure gradient, with the slow forcing added as a frozendtau-scaled source each micro-step. Stabilized by grad–div divergence damping whose discrete operator is rank-one (-v vᵀsymbol), so it damps only compressive/acoustic modes and leaves divergence-free vortical flow untouched. The forward sweep computes flux divergences from a frozen snapshot of the conserved state and applies them afterward, which keeps species mass conservative (an in-place stencil update is not).Two-tier flux: capturing interfaces and shocks
The centered acoustic substep alone is non-upwinded and rings at discontinuities. To capture material interfaces and shocks without a shock sensor or artificial viscosity, the acoustic flux is evaluated per face by one of two tiers:
dtau ≈ dx/(|u|+c)is exactly the step at which standard explicit HLLC is stable, a flagged face reduces to the standard Godunov scheme locally — so it matches the full-HLLC reference by construction.A face is flagged by intrinsic signals only: a scale-invariant WENO smoothness-indicator test (a per-cell pressure jump exceeding ~1% of the local pressure), or a volume-fraction jump (material interfaces), or an open/outflow boundary. All conserved quantities are applied as single-valued telescoping per-face deltas, so mass, energy, and momentum stay exactly conservative; unflagged faces are bit-identical to the centered tier. The acoustic and convective parts of HLLC are exposed as small device-callable entries (
s_acoustic_face_flux,s_convective_face_flux) so the substep reuses the solver without duplicating it.A subtlety worth noting: making the convective momentum live at flagged faces requires subtracting the exact per-face slow momentum flux the slow path computed at the RK stage state (stored from
s_compute_rhs), not a reconstruction — because the WS-RK3 stages evaluate the slow forcing at the stage state but restart the subcycle fromUⁿ. Storing the slow flux makes the live/frozen cancellation exact on every stage, which is what makes shocks stable.Time step.
s_compute_dtcomputesdtfrom the advective CFL andn_substepsfrom a domain-max reduction of(|u|+c)/|u|; the only global collective is that existing timestep reduction.Multifluid. The substep uses the stiffened-gas mixture EOS and mixture velocity, and advects volume fractions on the slow step. For
num_fluids == 1the path reduces exactly to the single-fluid expressions.GPU. All device regions use the backend-agnostic
GPU_*Fypp macros, so the same source runs on CPU, OpenACC, and OpenMP target offload.What's implemented
acoustic_substepping,n_acoustic_substeps,acoustic_div_dampparameters (+ checker/validator constraints:model_eqns=2,time_stepper=3, CFL dt,recon_type=weno,wave_speeds=direct).m_acoustic_substep.fpp: forward–backward subcycle kernel (EOS, grad–div damping, snapshot transport, mixture support) plus the two-tier robust flux — discontinuity criterion, conditional full-state WENO reconstruction at flagged faces (skipped entirely when nothing is flagged, so smooth flow keeps its speed), and the live full-HLLC mass/energy/momentum deltas.m_riemann_solver_hllc.fpp: the slow-flux variant (5-equation block) and the acoustic/convective HLLC face-flux entries.m_rhs.fpp: stores the per-face slow momentum flux for the live/frozen momentum cancellation.m_time_steppers.fpp:s_split_explicit_rk(Wicker–Skamarock RK3) orchestration and the two-CFLs_compute_dt.hcid183/264/304); example cases and regression golden tests (smooth pulse + a flagged-face shock case).Validation
Smooth low-Mach speedup (total time to the same physical time), measured via the internal per-step cost, at M = 0.01 (~4.5–7× at M = 0.1):
Best in 2-D/3-D on GPU; weakest in 1-D on GPU (kernel launch overhead). These are lower than the original smooth-only prototype because the two-tier shock-capture machinery now runs on the smooth path; the largest part of that overhead (the per-microstep robust-tier delta apply) is skipped when no face is flagged.
num_fluids == 1smooth flow is bit-identical to the pre-feature result on the regression golden.Shock / interface capture (robust tier), split vs. the standard full-HLLC solver at the same resolution:
acoustic_substepping = F) is preferable.Scope and limitations
model_eqns = 2only.dtau), so with auto substeps (dtau ∝ dt) production convergence is first-order and a captured shock front is a few cells more dissipative than full-HLLC. Symmetrizing the micro-step (Störmer–Verlet) is future work.Draft — opening for visibility and review of the approach.