2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -167,7 +167,7 @@ jobs:
- name: Test under emulation
run: >
ctest --test-dir build --output-on-failure
-E 'AsrcQuality|AsrcLock|TwoThreadStress|TransparentPrototypeMeetsSpec|MultiChannel\.'
-E 'AsrcQuality|AsrcLock|TwoThreadStress|TransparentPrototypeMeetsSpec|MultiChannel\.|Feasibility|Reset\.'

# Cross-compile for Arm Cortex-M55 (bare metal, newlib + semihosting) and
# run the emulation-sized test subset on QEMU's MPS3 AN547 board model.
14 changes: 12 additions & 2 deletions README.md
@@ -143,8 +143,18 @@ latency = targetLatencyFrames + (L·T − 1)/(2L) [input frames]
`designedLatencySeconds()` reports the figure; the FIFO term breathes by a
fraction of the block size as the servo tracks drift. The filter is linear
phase. For lower latency use `FilterSpec::fast()` (~16-frame group delay)
and a smaller `targetLatencyFrames`; the FIFO setpoint must stay above the
peak occupancy excursion of your push/pull block jitter.
and a smaller `targetLatencyFrames`.

**The setpoint must exceed the pull block size** — a pull synthesizes from
frames already buffered, so a setpoint at or below the callback size is
infeasible and would drain into a permanent dropout cycle. The converter
enforces this automatically: when it observes pull blocks larger than the
configured setpoint it raises the effective setpoint (block + ~half-block
margin, bounded by FIFO capacity) and reports the value in
`Status::effectiveTargetLatencyFrames`; latency follows the raised
setpoint. Callbacks above ~340 frames also need `fifoFrames` sized
explicitly. The setpoint must additionally stay above the peak occupancy
excursion of your push/pull jitter, as before.
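
A minimal sketch of the raise rule described above, mirroring the arithmetic this PR adds to the constructor and `pull()`. The names `maxTargetFrames`, `raisedTarget` and `kPopChunkFramesAssumed` are illustrative, and the value 16 for the pop chunk is an assumption: the real `kPopChunkFrames` is not shown in this diff.

```cpp
#include <algorithm>
#include <cstddef>

// Assumption: stand-in for the converter's kPopChunkFrames, whose real
// value is not visible in this diff.
constexpr std::size_t kPopChunkFramesAssumed = 16;

// Largest setpoint the FIFO supports while keeping the high watermark,
// max(3 * target, 2 * target + taps), strictly below the capacity.
// Like the original, this expects capFrames >= 1.
constexpr std::size_t maxTargetFrames(std::size_t configured, std::size_t capFrames,
                                      std::size_t taps) {
    const std::size_t byTriple = (capFrames - 1) / 3;
    const std::size_t byFill =
        capFrames > taps + 1 ? (capFrames - taps - 1) / 2 : configured;
    return std::max(configured, std::min(byTriple, byFill));
}

// Setpoint after observing a pull of `pullFrames`: block plus a margin of
// about half a block, never below the configured value, never above the
// capacity bound.
constexpr std::size_t raisedTarget(std::size_t pullFrames, std::size_t configured,
                                   std::size_t maxTarget) {
    const std::size_t needed =
        pullFrames + std::max<std::size_t>(pullFrames / 2, kPopChunkFramesAssumed);
    return std::clamp(needed, configured, maxTarget);
}
```

With the default 1024-frame FIFO floor and an assumed 48-tap filter, `maxTargetFrames(48, 1024, 48)` is 341. Under those assumptions a 32-frame pull leaves the 48-frame setpoint unchanged, a 128-frame pull raises it to 192, and a 512-frame pull clamps to 341, which is below the block size. That last case is why larger callbacks need `fifoFrames` sized explicitly.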

## Measured performance

124 changes: 112 additions & 12 deletions include/srt/asrc.hpp
@@ -3,9 +3,12 @@
#ifndef SRT_ASRC_HPP
#define SRT_ASRC_HPP

#include <algorithm>
#include <atomic>
#include <bit>
#include <cmath>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>

@@ -16,9 +19,9 @@

namespace srt {

/// Converter configuration. The defaults realize the whitepaper's worked
/// budget: ~1 ms core latency (FIFO setpoint 48 frames + ~24 frames filter
/// group delay) at 48 kHz, transparent for clocks within +/-1000 ppm.
/// Converter configuration. The defaults give ~1.5 ms designed latency at
/// 48 kHz (FIFO setpoint 48 frames + ~24 frames filter group delay; see
/// the README latency section), transparent for clocks within +/-1000 ppm.
struct Config {
double sampleRateHz = 48000.0; ///< nominal rate of BOTH clock domains
std::size_t channels = 2;
@@ -72,6 +75,11 @@ struct Status {
std::uint64_t overruns = 0; ///< push() calls that could not accept every
///< offered frame (FIFO full; excess dropped)
std::uint64_t resyncs = 0; ///< hard occupancy resyncs (high watermark)
/// The setpoint actually in force. Starts at Config::targetLatencyFrames
/// and is raised automatically when pull() blocks larger than the
/// setpoint are observed (see pull()); differs from the configured value
/// exactly when that adaptation has occurred.
std::uint64_t effectiveTargetLatencyFrames = 0;
};

/// Near-unity asynchronous sample rate converter between two clock domains.
@@ -93,11 +101,22 @@ class BasicAsyncSampleRateConverter {
resampler_(bank_, cfg_.channels, kPopChunkFrames),
ring_(ringCapacityElems(cfg_, bank_.taps())),
servo_(cfg_.servo, cfg_.sampleRateHz, static_cast<double>(cfg_.targetLatencyFrames)),
targetFrames_(cfg_.targetLatencyFrames),
fillThresholdFrames_(cfg_.targetLatencyFrames + bank_.taps()),
highWaterFrames_(std::max(3 * cfg_.targetLatencyFrames,
fillThresholdFrames_ + cfg_.targetLatencyFrames)) {
if (ring_.capacity() / cfg_.channels <= highWaterFrames_)
throw std::invalid_argument("AsyncSampleRateConverter: fifoFrames too small");
// Largest setpoint the FIFO capacity supports while keeping the
// high-watermark relation; bounds the adaptive raise in pull().
const std::size_t capFrames = ring_.capacity() / cfg_.channels;
const std::size_t taps = bank_.taps();
maxTargetFrames_ = std::max(cfg_.targetLatencyFrames,
std::min((capFrames - 1) / 3, capFrames > taps + 1
? (capFrames - taps - 1) / 2
: cfg_.targetLatencyFrames));
effectiveTarget_.store(static_cast<std::uint32_t>(targetFrames_),
std::memory_order_relaxed);
}

BasicAsyncSampleRateConverter(const BasicAsyncSampleRateConverter&) = delete;
@@ -117,14 +136,43 @@ class BasicAsyncSampleRateConverter {
/// Consumer thread: produce exactly `frames` interleaved output frames at
/// the output clock. Silence-pads while filling and on underrun, and
/// fades the first kFadeFrames frames in after every (re)fill so dropout
/// recovery does not click. Returns the number of frames synthesized
/// from real input.
/// recovery does not click. (The dropout onset itself and a hard-resync
/// splice are unfaded cuts: there is nothing valid to fade to at the
/// moment they occur.) Returns the number of frames synthesized from
/// real input.
std::size_t pull(S* interleaved, std::size_t frames) noexcept {
const std::size_t ch = cfg_.channels;
const auto popFn = [this](S* dst, std::size_t maxFrames) noexcept {
return ring_.read(dst, maxFrames * cfg_.channels) / cfg_.channels;
};

// Feasibility: a pull must synthesize from frames already buffered,
// so the occupancy setpoint must exceed the pull block size or the
// loop drains into a permanent underrun limit cycle (dropouts every
// few hundred ms, never locking). Raise the effective setpoint to
// the largest observed block plus slew/sawtooth margin, bounded by
// FIFO capacity; the servo slews to the new setpoint glitch-free
// (integrator kept, occupancy only grows). Cost: latency follows
// the raised setpoint — see Status::effectiveTargetLatencyFrames.
if (frames > observedMaxPull_) {
observedMaxPull_ = frames;
// Margin sized to the block-beat sawtooth (~half the block) so
// the entry occupancy never grazes the pull size; configs that
// already satisfy it (e.g. the 32-frame default transfer against
// the 48-frame default setpoint) are left exactly as configured.
const std::size_t needed = frames + std::max<std::size_t>(frames / 2, kPopChunkFrames);
const std::size_t newTarget =
std::clamp(needed, cfg_.targetLatencyFrames, maxTargetFrames_);
if (newTarget > targetFrames_) {
targetFrames_ = newTarget;
fillThresholdFrames_ = newTarget + bank_.taps();
highWaterFrames_ = std::max(3 * newTarget, fillThresholdFrames_ + newTarget);
servo_.setTarget(static_cast<double>(newTarget));
effectiveTarget_.store(static_cast<std::uint32_t>(newTarget),
std::memory_order_relaxed);
}
}

double occ = backlogFrames();

if (filling_) {
@@ -143,8 +191,15 @@ class BasicAsyncSampleRateConverter {
}

if (occ > static_cast<double>(highWaterFrames_)) { // hard resync
const double target = static_cast<double>(cfg_.targetLatencyFrames);
const auto dropFrames = static_cast<std::size_t>(occ - target);
const double target = static_cast<double>(targetFrames_);
// The discard can only come from the ring; frames staged in the
// resampler scratch are part of occ but not discardable. Clamp,
// or a setpoint below the staged count drains the ring entirely
// and cascades straight back into Filling.
const std::size_t ringFrames = ring_.readAvailable() / ch;
const double excess = occ - target;
const std::size_t dropFrames =
std::min(ringFrames, excess > 0.0 ? static_cast<std::size_t>(excess) : 0);
ring_.discard(dropFrames * ch);
resyncs_.fetch_add(1, std::memory_order_relaxed);
occ = backlogFrames();
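
The clamped discard in the hard-resync branch above can be sketched in isolation as follows. This is an illustration under assumptions rather than the converter's code: `resyncDropFrames` is a hypothetical helper name, and `ringFrames` stands in for `ring_.readAvailable() / ch`.

```cpp
#include <algorithm>
#include <cstddef>

// Frames to discard on a hard resync. The backlog `occ` includes frames
// staged in the resampler, which cannot be discarded, so the drop is
// limited to what the ring actually holds. A non-positive excess drops
// nothing instead of converting a negative double to size_t.
inline std::size_t resyncDropFrames(double occ, double target, std::size_t ringFrames) {
    const double excess = occ - target;
    const std::size_t wanted = excess > 0.0 ? static_cast<std::size_t>(excess) : 0;
    return std::min(ringFrames, wanted);
}
```

For a backlog of 200 frames against a 48-frame setpoint, the helper drops 152 frames when the ring holds at least that many, and only 100 when the ring holds 100.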
@@ -178,6 +233,7 @@ class BasicAsyncSampleRateConverter {
s.underruns = underruns_.load(std::memory_order_relaxed);
s.overruns = overruns_.load(std::memory_order_relaxed);
s.resyncs = resyncs_.load(std::memory_order_relaxed);
s.effectiveTargetLatencyFrames = effectiveTarget_.load(std::memory_order_relaxed);
return s;
}

@@ -191,10 +247,12 @@ class BasicAsyncSampleRateConverter {
publishStatus();
}

/// Nominal design latency: FIFO setpoint + filter group delay. The actual
/// figure breathes by a fraction of a frame as the servo tracks drift.
/// Nominal design latency: FIFO setpoint + filter group delay. Uses the
/// effective (possibly adaptively raised) setpoint; the actual figure
/// breathes by a fraction of a frame as the servo tracks drift.
double designedLatencySeconds() const noexcept {
return (static_cast<double>(cfg_.targetLatencyFrames) + bank_.groupDelaySamples()) /
return (static_cast<double>(effectiveTarget_.load(std::memory_order_relaxed)) +
bank_.groupDelaySamples()) /
cfg_.sampleRateHz;
}

@@ -205,8 +263,12 @@ class BasicAsyncSampleRateConverter {

static std::size_t ringCapacityElems(const Config& cfg, std::size_t taps) {
const std::size_t fillThreshold = cfg.targetLatencyFrames + taps;
// The 1024-frame floor (21 ms at 48 kHz) leaves the adaptive
// setpoint raise enough capacity for pull blocks up to ~340 frames
// without explicit fifoFrames sizing; larger callbacks need
// fifoFrames set by the caller (the raise clamps to capacity).
const std::size_t frames =
cfg.fifoFrames != 0 ? cfg.fifoFrames : std::max<std::size_t>(256, 4 * fillThreshold);
cfg.fifoFrames != 0 ? cfg.fifoFrames : std::max<std::size_t>(1024, 4 * fillThreshold);
return std::bit_ceil(frames * cfg.channels);
}

@@ -254,9 +316,40 @@ class BasicAsyncSampleRateConverter {
fill_.store(static_cast<float>(servo_.smoothedOccupancy()), std::memory_order_relaxed);
}

/// Rejects configurations that would otherwise construct successfully
/// and misbehave silently: NaN/Inf anywhere (a NaN sample rate designs
/// an all-NaN coefficient table), band edges whose sum exceeds the rate
/// (anti-image cutoff above input Nyquist passes images wholesale), a
/// deviation clamp large enough to overflow the Q0.64 eps conversion
/// (UB), and size products that overflow 32-bit size_t targets.
static Config validated(Config cfg) {
if (cfg.channels == 0 || cfg.sampleRateHz <= 0.0 || cfg.targetLatencyFrames == 0)
const auto finite = [](double v) { return std::isfinite(v); };
if (cfg.channels == 0 || cfg.targetLatencyFrames == 0 || !finite(cfg.sampleRateHz) ||
cfg.sampleRateHz <= 0.0)
throw std::invalid_argument("AsyncSampleRateConverter: bad Config");
const FilterSpec& f = cfg.filter;
if (!finite(f.passbandHz) || !finite(f.stopbandHz) || !finite(f.stopbandAttenDb) ||
f.passbandHz + f.stopbandHz > cfg.sampleRateHz)
throw std::invalid_argument("AsyncSampleRateConverter: bad FilterSpec "
"(need passbandHz + stopbandHz <= sampleRateHz)");
const ServoConfig& sv = cfg.servo;
if (!finite(sv.acquireBandwidthHz) || !finite(sv.trackBandwidthHz) ||
!finite(sv.quietBandwidthHz) || !finite(sv.damping) || !finite(sv.acquireSmootherHz) ||
!finite(sv.trackSmootherHz) || !finite(sv.quietSmootherHz) ||
!finite(sv.lockThresholdFrames) || !finite(sv.lockHoldSeconds) ||
!finite(sv.quietHoldSeconds) || !finite(sv.unlockThresholdFrames) ||
!finite(sv.maxDeviationPpm) || sv.maxDeviationPpm <= 0.0 ||
sv.maxDeviationPpm > 100000.0) // |eps| stays far from the Q0.64 int64 limit
throw std::invalid_argument("AsyncSampleRateConverter: bad ServoConfig");
// Size products evaluated later must not wrap on 32-bit size_t.
const auto mulOk = [](std::size_t a, std::size_t b) {
return b == 0 || a <= std::numeric_limits<std::size_t>::max() / b;
};
const std::size_t phases = std::bit_ceil(f.numPhases);
if (!mulOk(phases + 1, f.tapsPerPhase) ||
!mulOk(cfg.targetLatencyFrames + f.tapsPerPhase, 8 * cfg.channels) ||
!mulOk(cfg.fifoFrames, 2 * cfg.channels))
throw std::invalid_argument("AsyncSampleRateConverter: Config sizes overflow");
return cfg;
}
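
The overflow guard used by `validated()` tests a product without computing it. A standalone sketch of the same check follows; `mulFits` is an illustrative name for the `mulOk` lambda above.

```cpp
#include <cstddef>
#include <limits>

// True when a * b is representable in std::size_t. Dividing the maximum
// by b avoids evaluating the product, which would wrap silently because
// unsigned arithmetic is modular.
constexpr bool mulFits(std::size_t a, std::size_t b) {
    return b == 0 || a <= std::numeric_limits<std::size_t>::max() / b;
}
```

The check only covers the final multiplication. An operand that is itself a product, such as `8 * cfg.channels`, has to be small enough not to wrap before it reaches the guard.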

@@ -267,8 +360,12 @@ class BasicAsyncSampleRateConverter {
FractionalResampler<S> resampler_;
SpscRing<S> ring_;
PiServo servo_;
// Consumer-thread setpoint state (see the adaptive raise in pull()).
std::size_t targetFrames_;
std::size_t fillThresholdFrames_;
std::size_t highWaterFrames_;
std::size_t maxTargetFrames_ = 0;
std::size_t observedMaxPull_ = 0;
bool filling_ = true; // consumer-thread state; mirrored into state_
std::size_t fadeFramesLeft_ = 0; // consumer-thread state

@@ -279,6 +376,9 @@ class BasicAsyncSampleRateConverter {
std::atomic<int> state_{static_cast<int>(State::Filling)};
std::atomic<float> ppm_{0.0f};
std::atomic<float> fill_{0.0f};
// Effective setpoint mirror for status()/designedLatencySeconds() from
// any thread; written only by the consumer (32-bit: lock-free everywhere).
std::atomic<std::uint32_t> effectiveTarget_{0};
std::atomic<std::uint32_t> underruns_{0};
std::atomic<std::uint32_t> overruns_{0};
std::atomic<std::uint32_t> resyncs_{0};
6 changes: 5 additions & 1 deletion include/srt/detail/kaiser.hpp
@@ -52,8 +52,12 @@ inline double kaiserBeta(double attenDb) noexcept {
/// (e.g. 8 kHz transition at 48 kHz -> 8000/48000)
/// \return estimated taps per polyphase phase: N = (A - 8) / (2.285 * 2*pi * df)
inline std::size_t estimateTaps(double attenDb, double transWidthNorm) noexcept {
// Clamp pathological inputs (attenDb < 8, non-positive width): the raw
// formula goes negative/infinite there and casting that to size_t is UB.
if (!(transWidthNorm > 0.0))
return 4;
const double n = (attenDb - 8.0) / (2.285 * 2.0 * std::numbers::pi * transWidthNorm);
return static_cast<std::size_t>(std::ceil(n));
return n > 4.0 ? static_cast<std::size_t>(std::ceil(n)) : 4;
}
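
A self-contained sketch of the clamped estimate, for trying the edge cases by hand. `estimateTapsSketch` and `kPi` are illustrative names; the header itself uses `std::numbers::pi`.

```cpp
#include <cmath>
#include <cstddef>

constexpr double kPi = 3.14159265358979323846;

// Kaiser tap estimate N = (A - 8) / (2.285 * 2*pi * df), floored at 4.
// The negated comparison also rejects a NaN width. A NaN attenuation
// makes n NaN, which fails n > 4.0 and falls back to 4. An infinite
// attenuation is NOT handled here and would still reach the cast.
inline std::size_t estimateTapsSketch(double attenDb, double transWidthNorm) {
    if (!(transWidthNorm > 0.0))
        return 4;
    const double n = (attenDb - 8.0) / (2.285 * 2.0 * kPi * transWidthNorm);
    return n > 4.0 ? static_cast<std::size_t>(std::ceil(n)) : 4;
}
```

For 100 dB attenuation and an 8 kHz transition at 48 kHz (`df = 8000.0 / 48000.0`) the raw formula gives about 38.4, so the estimate is 39 taps. An attenuation below 8 dB or a zero width returns the floor of 4.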

/// sin(pi x)/(pi x) with the removable singularity handled.
6 changes: 6 additions & 0 deletions include/srt/pi_servo.hpp
@@ -119,6 +119,12 @@ class PiServo {
/// step.
void seed(double occPlusMu) noexcept { lpFast_ = q1_ = q2_ = q3_ = occPlusMu; }

/// Move the occupancy setpoint. The integrator (ppm estimate) is kept and
/// the smoothers are left tracking the real observable, so the loop slews
/// to the new setpoint at its clamped rate with no transient discontinuity
/// — used by the converter's adaptive pull-block setpoint raise.
void setTarget(double targetFrames) noexcept { target_ = targetFrames; }

/// One control update; call once per pull() before synthesis.
/// \param occFrames raw backlog in frames (FIFO + staged frames)
/// \param mu current fractional read position; occ + mu changes
10 changes: 8 additions & 2 deletions include/srt/polyphase_filter.hpp
@@ -331,8 +331,9 @@ inline void dotRowsFrameMajor(const typename SampleTraits<S>::Coeff* SRT_RESTRIC

/// Streaming fractional-delay engine for one converter instance.
///
/// Owns the per-channel history delay lines (planar, contiguous windows with
/// periodic compaction) and the phase accumulator mu. Input frames are pulled
/// Owns the history delay lines (planar per-channel below the
/// channel-parallel threshold, frame-major above it — see the hist_
/// field) and the phase accumulator mu. Input frames are pulled
/// through a caller-supplied PopFn in small bulk chunks and deinterleaved into
/// the histories as the integer read position advances.
///
@@ -404,6 +405,11 @@ class FractionalResampler {
/// the number produced; fewer than maxFrames means the source ran dry
/// (underrun). RT-safe: no allocation, locks or exceptions.
///
/// Preconditions (the converter upholds both; direct users must too):
/// a successful prime() before the first process() — the window math
/// underflows otherwise — and reset()+reprime after any dry return, as
/// a dry advance==2 slip leaves history and phase one frame apart.
///
/// PopFn: std::size_t popFrames(S* dst, std::size_t maxFrames) — bulk-pops
/// interleaved frames, returning the count actually delivered.
template <typename PopFn>
9 changes: 5 additions & 4 deletions include/srt/sample_traits.hpp
@@ -128,10 +128,11 @@ struct SampleTraits<std::int16_t> {
}

static Coeff blend(Coeff a, Coeff b, BlendFactor fr) noexcept {
// Q14 + (Q15 * Q14) >> 15, in int64: the int32 product would fit
// today's coefficients (fr <= 32767 by construction), but only with
// ~5% margin against a worst-case adjacent-phase delta — not worth
// the silent invariant. One smull on 32-bit cores.
// Q14 + (Q15 * Q14) >> 15, in int64: the worst-case int32 product
// 32767 * 65535 = 2,147,385,345 sits 0.005% under INT32_MAX —
// real adjacent-phase deltas are tiny (|diff| <= 41 measured on the
// transparent table), but a margin that thin is not an invariant
// worth relying on silently. One smull on 32-bit cores.
const std::int64_t diff = static_cast<std::int64_t>(b) - a;
return static_cast<Coeff>(a + ((fr * diff) >> 15));
}
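
The margin arithmetic in the comment above can be checked at compile time. The sketch below assumes `Coeff` is `std::int16_t` and `BlendFactor` is a 32-bit Q15 fraction in [0, 32767]. Both aliases are assumptions, since the real ones are declared outside this hunk.

```cpp
#include <cstdint>

using Coeff = std::int16_t;        // assumption: Q14 coefficient type
using BlendFactor = std::int32_t;  // assumption: Q15 fraction, 0..32767

// Worst case: the largest blend factor times the largest difference
// between two int16 values.
static_assert(32767LL * 65535LL == 2147385345LL, "worst-case product");
static_assert(INT32_MAX - 32767LL * 65535LL == 98302LL, "headroom under INT32_MAX");

// Linear blend a + fr * (b - a) in Q15. Widening diff to int64 promotes
// the whole product, so it cannot overflow however thin the int32 margin.
// The shift relies on arithmetic right shift for negative products.
inline Coeff blendSketch(Coeff a, Coeff b, BlendFactor fr) {
    const std::int64_t diff = static_cast<std::int64_t>(b) - a;
    return static_cast<Coeff>(a + ((fr * diff) >> 15));
}
```

A headroom of 98,302 counts out of 2,147,483,647 is roughly 0.005%, which matches the figure in the comment.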
2 changes: 2 additions & 0 deletions include/srt/spsc_ring.hpp
@@ -32,6 +32,8 @@ namespace srt {
template <typename T>
class SpscRing {
static_assert(std::is_trivially_copyable_v<T>);
// The lock-free claim of the whole audio path rests on these indices.
static_assert(std::atomic<std::size_t>::is_always_lock_free);

public:
/// Allocates the buffer; capacity is rounded up to a power of two.
1 change: 1 addition & 0 deletions tests/CMakeLists.txt
@@ -37,6 +37,7 @@ add_executable(srt_tests
test_asrc_quality.cpp
test_asrc_quality_16k.cpp
test_fade.cpp
test_hardening.cpp
test_latency.cpp
test_multichannel.cpp)
target_link_libraries(srt_tests PRIVATE
2 changes: 1 addition & 1 deletion tests/bare_metal_main.cpp
@@ -18,7 +18,7 @@ int main() {
::testing::GTEST_FLAG(filter) = "-AsrcQuality*:AsrcLock.*:Servo.*:Kaiser.*MeetsSpec:"
"FixedPoint.AsrcQuality*:"
"FixedPoint.FullScaleSineDoesNotWrapQ15:"
"MultiChannel.*";
"MultiChannel.*:Feasibility.*:Reset.*";
::testing::InitGoogleTest();
const int rc = RUN_ALL_TESTS();
// CTest's pass criterion: printed only if we get all the way here, so a