fix: cap UploadPartCopy pump chunk at MAX_BUFFER_SIZE (concurrent-backup OOM) by ServerSideHannes · Pull Request #101 · ServerSideHannes/s3proxy-python

ServerSideHannes · 2026-07-01T07:21:06Z

What

Cap the UploadPartCopy pump chunk at MAX_BUFFER_SIZE (8MB). This is the actual concurrent-backup OOM root cause — found by running memray on a live prod pod.

Root cause (memray on prod, then reproduced locally)

The top resident allocations on a live pod under backup load were:

64.0 MB  crypto.py:354   (re-encrypt)
64.0 MB  copy.py:398     (chunk buffer)   ← handlers/multipart/copy.py
25.7 MB  copy.py:393

i.e. the scylla dedup UploadPartCopy path, not uploads or GETs (which earlier fixes covered). _pump_copy_chunks sized its buffer from calculate_optimal_part_size, which returns up to 64MB for large sources. So each copy of a large SSTable:

- buffers a 64MB chunk, then data = bytes(buf[:chunk_size]) copies it (another 64MB), then re-encrypts it (another 64MB) → ~150–190MB resident per copy,
- while the request limiter reserved only copy_pipeline_peak (~32MB, since it assumes 8MB streaming).

So a handful of concurrent dedup copies of ≥80MB parts blew past the pod memory limit even though the governor read well under budget, which is exactly the prod signature (RSS ~512 MiB, governed_active ~48, few requests in flight).
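The arithmetic behind that gap can be sketched as follows. This is a rough model, not the project's code: `estimate_copy_resident` is a hypothetical helper, the three-live-copies figure is the upper bound implied by the description (read buffer, `bytes(...)` snapshot, re-encrypted output), and 8MB/64MB are treated as MiB for simplicity.

```python
MIB = 1024 * 1024


def estimate_copy_resident(chunk_size: int, live_copies: int = 3) -> int:
    """Upper-bound resident bytes for one in-flight copy chunk.

    Assumes three chunk-sized allocations are alive at once: the read
    buffer, the bytes(...) snapshot, and the re-encrypted output.
    """
    return chunk_size * live_copies


# bytes(buf[:n]) materializes an independent copy of the buffer contents,
# which is why the snapshot costs another chunk-sized allocation.
buf = bytearray(b"a" * 16)
data = bytes(buf[:8])
buf[0] = ord("b")  # mutating the buffer does not affect the snapshot
snapshot_is_independent = data[0] == ord("a")

uncapped = estimate_copy_resident(64 * MIB)  # chunk size before the cap
capped = estimate_copy_resident(8 * MIB)     # chunk size after the cap
reserved = 32 * MIB                          # approximate copy_pipeline_peak

print(f"uncapped={uncapped // MIB} MiB, capped={capped // MIB} MiB, "
      f"reserved={reserved // MIB} MiB")
```

Under these assumptions an uncapped copy can hold several times its reservation, while a capped copy stays inside it.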

Fix

chunk_size = min(calculate_optimal_part_size(...), MAX_BUFFER_SIZE) — copies now stream in 8MB chunks like every other path, matched to their reservation.
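A minimal sketch of the capped sizing, for illustration only. The body of `calculate_optimal_part_size` below is a made-up stand-in (the real function's signature and logic are not shown in this PR), and `pump_chunk_size` and `iter_chunk_ranges` are hypothetical helper names. `MAX_BUFFER_SIZE` is assumed to be 8 MiB.

```python
MIB = 1024 * 1024
MAX_BUFFER_SIZE = 8 * MIB  # assumed value of the project's constant


def calculate_optimal_part_size(source_size: int) -> int:
    """Stand-in only: grows with the source and tops out at 64 MiB."""
    return min(max(source_size // 2, 5 * MIB), 64 * MIB)


def pump_chunk_size(source_size: int) -> int:
    """Chunk size for the copy pump, capped at MAX_BUFFER_SIZE."""
    return min(calculate_optimal_part_size(source_size), MAX_BUFFER_SIZE)


def iter_chunk_ranges(total: int, chunk_size: int):
    """Yield half-open (start, end) byte ranges covering [0, total)."""
    offset = 0
    while offset < total:
        end = min(offset + chunk_size, total)
        yield offset, end
        offset = end


# A 120 MiB source is streamed in 8 MiB pieces instead of one large chunk.
size = 120 * MIB
ranges = list(iter_chunk_ranges(size, pump_chunk_size(size)))
print(len(ranges), "chunks, largest", max(e - s for s, e in ranges) // MIB, "MiB")
```

The cap only lowers the chunk size; sources whose optimal part size is already at or below the cap are unaffected.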

Proof (local, prod config 512Mi / 64MB budget)

Concurrent UploadPartCopy of 90–120MB sources + GETs:

| | peak RSS |
|---|---|
| before | 511.9 MiB (the wall) |
| after | 321 MiB |

Streaming-copy round-trip tests pass; new test_copy_chunk_bounded.py pins the invariant.
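The contents of test_copy_chunk_bounded.py are not shown here. As a rough idea of what pinning such an invariant could look like, the sketch below uses a simplified, hypothetical `pump_copy_chunks` that reads from an in-memory stream; the real pump's signature and I/O are assumptions, as is the 8 MiB value of `MAX_BUFFER_SIZE`.

```python
import io

MIB = 1024 * 1024
MAX_BUFFER_SIZE = 8 * MIB  # assumed value of the project's constant


def pump_copy_chunks(src, requested_chunk_size: int):
    """Hypothetical simplified pump: yield chunks no larger than the cap."""
    chunk_size = min(requested_chunk_size, MAX_BUFFER_SIZE)
    while True:
        data = src.read(chunk_size)
        if not data:
            return
        yield data


def test_copy_chunk_bounded():
    # 20 MiB patterned payload, with a 64 MiB chunk size requested.
    payload = bytes(range(256)) * (20 * MIB // 256)
    chunks = list(pump_copy_chunks(io.BytesIO(payload), 64 * MIB))
    # Invariant: no chunk exceeds the buffer cap.
    assert max(len(c) for c in chunks) <= MAX_BUFFER_SIZE
    # Round trip: capping must not change the bytes that are copied.
    assert b"".join(chunks) == payload
    return chunks


chunks = test_copy_chunk_bounded()
```

The function is written pytest-style so it could be collected by a test runner, but it also runs as a plain script.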

Why this was missed until now

Earlier repros used CopyObject (whole-object path) and single-load-type floods, which the governor bounds to ~300MB locally. The prod driver is UploadPartCopy of large SSTables under dedup — only visible once memray attributed native/resident memory on a live pod. Stacks on #97 (copy govern), #98 (GET), #99/#100 (concurrency cap + debug mode).

Root cause of the concurrent-backup OOM, found via memray on a live pod: the top resident allocations were copy.py:398 (64MB) + crypto.py:354 (64MB) -- the scylla dedup UploadPartCopy path. _pump_copy_chunks sized its buffer from calculate_optimal_part_size, which returns up to 64MB for large sources. So each copy of a large SSTable buffered a 64MB chunk, copied it (bytes(buf[:chunk_size])) and re-encrypted it (~150-190MB resident) while the request limiter only reserved copy_pipeline_peak (~32MB, it assumes 8MB streaming). A handful of concurrent dedup copies of >=80MB parts therefore blew past the pod memory limit even though the governor read well under budget -- exactly the prod signature (RSS 512, governed ~48).

Cap the pump chunk at MAX_BUFFER_SIZE so copies stream in 8MB chunks like every other path and stay matched to their reservation.

Reproduced locally at prod config (512Mi/64MB): concurrent UploadPartCopy of 90-120MB sources pinned RSS at 511.9MiB (the wall); with the cap the same load peaks 321MiB. Streaming-copy round-trip tests still pass.

ServerSideHannes mentioned this pull request Jul 1, 2026

perf: reuse one aiobotocore client per credential set #102

Open

ServerSideHannes wants to merge 1 commit into main from fix/copy-chunk-memory
