Skip to content

fix(snapshot): bound S3 parallelGet and make snapshot serving range-aware#356

Draft
worstell wants to merge 1 commit into
mainfrom
fix/snapshot-range-aware-bound-parallelget
Draft

fix(snapshot): bound S3 parallelGet and make snapshot serving range-aware#356
worstell wants to merge 1 commit into
mainfrom
fix/snapshot-range-aware-bound-parallelget

Conversation

@worstell

@worstell worstell commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Two defense-in-depth fixes for snapshot serving heap pressure, plus the metadata wiring ranged clients need.

The S3 full-object download path (parallelGet) let workers read every chunk into memory ahead of a slow consumer, so a single large snapshot served to a slow client could buffer the whole object in the Go heap. A live Datadog heap profile during the incident showed (*S3).parallelGet.func1 holding ~86% of live Go heap. This bounds in-flight chunks with a window token (acquired before fetch, released once the writer consumes the chunk), so peak memory stays at numWorkers × chunkSize.

The git snapshot endpoint also ignored client Range headers, so it always served the full object via that unbounded path. It now forwards Range/If-Range to the cache on the cache-hit path, returns 206/Content-Range (416 when not satisfiable), and advertises Accept-Ranges: bytes. This lets clients fetch with bounded parallel range requests (client.ParallelGet) instead of one full GET.

The snapshot freshen metadata (X-Cachew-Snapshot-Commit / X-Cachew-Bundle-Url) is now emitted on 206 responses too, factored into a shared setSnapshotMetadataHeaders helper used by both the full and ranged paths. Without it a ranged client would download the tarball but lose the commit/bundle metadata it needs to apply a delta bundle.

Tests: TestS3ParallelGetMultiChunk (bounded-window reassembly), TestSnapshotGetHonorsRange (206/416/Content-Range and matching commit metadata on partial responses, via a non-*os.File memory cache reader so range support comes from forwarding Range to Open rather than http.ServeContent).

…ware

The S3 full-object download path (parallelGet) let workers read every chunk
into memory ahead of a slow consumer, so a single large snapshot served to a
slow client could buffer the whole object in the Go heap. Bound it with a
window of in-flight chunks (peak now numWorkers x chunkSize).

The git snapshot endpoint also ignored client Range headers, so it always
served the full object via the unbounded path. Forward Range/If-Range to the
cache on the cache-hit path, return 206/Content-Range (416 when not
satisfiable), and advertise Accept-Ranges, so clients can fetch with bounded
parallel range requests (client.ParallelGet) instead of one full GET.

Amp-Thread-ID: https://ampcode.com/threads/T-019ef6fe-55c6-71e8-96dc-ca3ef4301d36
Co-authored-by: Amp <amp@ampcode.com>
@worstell worstell force-pushed the fix/snapshot-range-aware-bound-parallelget branch from 3bb7057 to ad2f09d Compare June 25, 2026 03:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant