Releases · ServerSideHannes/s3proxy-python

2026.7.1 Latest

ServerSideHannes released this 01 Jul 17:19

2026.7.1

bc0bd36

Fixes

COPY of large multipart-encrypted objects failed with InvalidTag (#104). _iter_multipart_plaintext decrypted each whole client part as a single AES-GCM seal, but a client part expands into multiple internal parts, each of which is a sequence of independently sealed frames. Any source whose internal parts held more than one frame (internal parts >8MB, e.g. ScyllaDB backups) failed to copy. The reader now walks internal parts → frames and decrypts one frame at a time, matching the GET path. This also bounds peak memory for copy source reads to O(frame).
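The walk from internal parts to frames can be sketched roughly as follows. This is a minimal illustration, not the project's reader: the length-prefixed frame layout, split_frames, iter_plaintext, and the open_frame callback (standing in for the per-frame AES-GCM open) are all assumptions made for the example.

```python
import struct
from typing import Callable, Iterable, Iterator

# Hypothetical on-disk layout, for illustration only: every frame is stored
# as a 4-byte big-endian length followed by that many sealed bytes.
def split_frames(internal_part: bytes) -> Iterator[bytes]:
    """Yield each sealed frame contained in one internal part."""
    offset = 0
    while offset < len(internal_part):
        (length,) = struct.unpack_from(">I", internal_part, offset)
        offset += 4
        yield internal_part[offset:offset + length]
        offset += length


def iter_plaintext(
    internal_parts: Iterable[bytes],
    open_frame: Callable[[bytes], bytes],
) -> Iterator[bytes]:
    """Walk internal parts -> frames and open one frame at a time.

    open_frame is a stand-in for the per-frame authenticated decrypt.
    Only one frame's plaintext is produced per step, so the reader never
    needs a whole client part in memory at once.
    """
    for part in internal_parts:
        for sealed in split_frames(part):
            yield open_frame(sealed)
```

Opening a whole part in one call would hand several concatenated seals to a single decrypt, which is the shape of failure the fix describes; opening per frame avoids that.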

2026.6.16

ServerSideHannes released this 01 Jul 05:54

2026.6.16

953bcac

feat: memory debug mode (RSS vs tracked heap + top allocations) (#100)

Diagnostic to pin the s3proxy OOM root cause. Gated by S3PROXY_MEMORY_DEBUG (alias S3PROXY_TRACEMALLOC), zero overhead when unset. Every interval logs real RSS vs Python-tracked heap vs untracked gap vs governor active bytes, then the top live allocations by call site.

One dump shows which layer the OOM comes from:

large untracked gap -> C-level transport buffers (uvicorn/httptools), fix at HTTP/LB layer
small gap -> Python, top list names the exact line

Usage: set extraConfig { S3PROXY_MEMORY_DEBUG: "1" } and raise pod memory to ~1-2Gi so the pod survives long enough to dump; read MEMORY_DEBUG / MEMORY_DEBUG_TOP under real backup load; then revert.

No behavior change unless enabled.
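The RSS versus tracked-heap comparison can be approximated with the standard library alone. The sketch below is an assumption-laden illustration, not the shipped diagnostic: read_rss_bytes and memory_report are hypothetical names, RSS is read from /proc (Linux only), and the governor's active bytes are left out because that component is not shown here.

```python
import os
import tracemalloc


def read_rss_bytes() -> int:
    """Current resident set size from /proc/self/statm; 0 if unavailable."""
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])  # second field = resident pages
        return resident_pages * os.sysconf("SC_PAGE_SIZE")
    except (OSError, ValueError, IndexError):
        return 0


def memory_report(top_n: int = 5) -> dict:
    """Compare real RSS with the Python-tracked heap and list top call sites."""
    if not tracemalloc.is_tracing():
        # Tracing only sees allocations made after this point.
        tracemalloc.start()
    tracked, _peak = tracemalloc.get_traced_memory()
    rss = read_rss_bytes()
    snapshot = tracemalloc.take_snapshot()
    top = [str(stat) for stat in snapshot.statistics("lineno")[:top_n]]
    return {
        "rss": rss,
        "tracked": tracked,
        # Memory the process holds that Python's allocator never saw.
        "untracked_gap": max(rss - tracked, 0),
        "top": top,
    }


# Gate on the environment variable named in the release note.
if os.environ.get("S3PROXY_MEMORY_DEBUG"):
    print(memory_report())
```

A large untracked_gap with a short top list points away from Python code; a small gap means the top entries are the place to look.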

2026.6.15

ServerSideHannes released this 30 Jun 18:14

2026.6.15

98235b5

fix(chart): cap per-pod backend concurrency at the frontproxy (maxconn) (#99)

Stops the upload-side concurrent-backup OOM (dominant cause on 2026.6.14). uvicorn buffers each in-flight request body off the socket before the app's memory limiter runs, so a backup flood piles up bodies in the HTTP server's C-level buffers (governor reads ~64MB while RSS hits 512Mi+ -> OOMKilled). That memory is ungovernable from the app layer.

Fix: haproxy now caps in-flight requests PER pod (maxconn, default 40) and queues the excess (timeout queue) instead of overrunning a pod. Chart values: frontproxy.maxConnPerPod, frontproxy.timeouts.queue.

Verified locally at prod config (512Mi/64MB, 2026.6.14 app): direct 128x16MB PUT flood OOM-killed the pod (exit 137); via haproxy maxconn 40 -> 256/256 ok, peak 335MiB, no OOM. haproxy queues rather than rejects, so clients see success.
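The queue-instead-of-reject behaviour can be illustrated with an asyncio analogy. This is not haproxy configuration and not code from the chart; run_capped and its parameters are hypothetical, with max_in_flight playing the role of maxconn and queue_timeout the role of timeout queue.

```python
import asyncio


async def run_capped(jobs, max_in_flight: int, queue_timeout: float):
    """Run coroutine factories with at most max_in_flight active at once.

    Excess jobs wait for a slot instead of failing immediately. A job that
    waits longer than queue_timeout raises a timeout error.
    Returns (results in input order, peak number of concurrently active jobs).
    """
    slots = asyncio.Semaphore(max_in_flight)
    active = 0
    peak = 0

    async def one(job):
        nonlocal active, peak
        # Waiting for a slot is the "queue"; nothing is buffered for this
        # job until a slot is free.
        await asyncio.wait_for(slots.acquire(), timeout=queue_timeout)
        active += 1
        peak = max(peak, active)
        try:
            return await job()
        finally:
            active -= 1
            slots.release()

    results = await asyncio.gather(*(one(job) for job in jobs))
    return results, peak
```

The point of the analogy is that the cap sits in front of the worker, so the number of request bodies in memory is bounded by the cap rather than by how many clients connect.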

Completes the OOM fix set: 2026.6.13 (#97 copy), 2026.6.14 (#98 streaming-GET), 2026.6.15 (#99 upload concurrency cap).

2026.6.14

ServerSideHannes released this 30 Jun 17:33

2026.6.14

33e4d6c

fix: hold GET memory reservation for the whole streaming-response lifetime (#98)

The dominant concurrent-backup OOM. Streaming GET responses released their memory reservation before the body was sent, so concurrent downloads ran ungoverned, each holding an 8MB decrypted frame in the send buffer (N×8MB → OOMKill, exit 137), while the limiter reported usage of only about its budget.

Fix: hold the reservation for the whole stream lifetime (admission control); drop the now-redundant per-frame acquires.
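Holding a reservation for the full stream lifetime might look like the sketch below. MemoryLimiter, reserve, and stream_body are hypothetical names invented for this example; the project's limiter is not shown in these notes and may work differently.

```python
import asyncio
from contextlib import asynccontextmanager


class MemoryLimiter:
    """Hypothetical byte-budget limiter used only for this illustration."""

    def __init__(self, budget: int) -> None:
        self.budget = budget
        self.active = 0
        self._cond = asyncio.Condition()

    @asynccontextmanager
    async def reserve(self, nbytes: int):
        # Admission control: wait until the reservation fits in the budget.
        # (A request larger than the whole budget would wait forever here.)
        async with self._cond:
            await self._cond.wait_for(lambda: self.active + nbytes <= self.budget)
            self.active += nbytes
        try:
            yield
        finally:
            async with self._cond:
                self.active -= nbytes
                self._cond.notify_all()


async def stream_body(limiter: MemoryLimiter, frames, frame_size: int):
    """Yield frames while holding one reservation for the whole stream."""
    # The reservation wraps the entire loop, so it is released only after
    # the last frame has been handed over or the generator is closed.
    async with limiter.reserve(frame_size):
        for frame in frames:
            yield frame
```

Because the reservation is taken once before the first frame, per-frame acquires inside the loop would add nothing, which matches the note about dropping them.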

Verified at prod config (512Mi/64MB): the 90-concurrent multipart GET flood that OOM-killed the pod (0/180) now completes 180/180 at ~325MiB; realistic upload+GET mix 106/106 at ~305MiB. Profiler: tracked memory 812MB→112MB, live frames 90→11.

Stacks on 2026.6.13 (#97, copy crash + copy-OOM).

2026.6.13

ServerSideHannes released this 30 Jun 16:24

2026.6.13

fab3774

fix: govern copy memory + fix passthrough-copy ClientResponse.read crash (#97)

Gate server-side copies (CopyObject / UploadPartCopy) through the memory limiter so a Scylla dedup flood can't OOM the pod (was: ungoverned concurrent decrypt+re-encrypt → exit 137).
Fix _iter_copy_source: body.content.read(n) instead of body.read(n) (aiohttp ClientResponse.read() takes no size arg → every passthrough copy 500'd with TypeError).
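The difference between the two read calls can be shown with a small chunked reader. iter_body_chunks is a hypothetical helper, not the project's _iter_copy_source; it only assumes that the response object exposes a content stream whose read(n) accepts a size, as aiohttp's ClientResponse does.

```python
import asyncio
from typing import AsyncIterator


async def iter_body_chunks(response, chunk_size: int = 1 << 20) -> AsyncIterator[bytes]:
    """Yield the response body in chunks of at most chunk_size bytes.

    response.read() takes no size argument and returns the whole body, so
    sized reads have to go through response.content.read(n) instead.
    """
    while True:
        chunk = await response.content.read(chunk_size)
        if not chunk:  # empty bytes signals end of stream
            break
        yield chunk
```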

Verified locally: 64-concurrent copy flood at a 256MiB cap OOM-killed the pod before (0/64 ok), now peaks ~195MiB with 64/64 ok.

2026.6.12

ServerSideHannes released this 30 Jun 12:13

2026.6.12

381ac2c

Make the s3proxy container's startup/liveness/readiness probes configurable via .Values (defaults unchanged). Lets a deployment raise the liveness timeout so a busy single-event-loop worker is not restarted under upload load (the kill -> retry -> crashloop cascade). App code identical to 2026.6.11.

2026.6.11

ServerSideHannes released this 30 Jun 11:02

2026.6.11

b2343b0

Fix: list responses emit LastModified as RFC3339 Z (millisecond) instead of +00:00. rclone 1.51.0 (scylla-manager-agent) rejected +00:00 with 'cannot parse "+00:00" as "Z"', failing every Scylla backup list. Completes the V1-list fix chain (#91 to #94).
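One way to produce the Z-suffixed millisecond form in Python is sketched below. format_last_modified is a hypothetical name, and treating naive datetimes as UTC is an assumption of this example rather than something the release note states.

```python
from datetime import datetime, timezone


def format_last_modified(dt: datetime) -> str:
    """Render a timestamp as RFC 3339 UTC with milliseconds and a Z suffix."""
    if dt.tzinfo is None:
        # Assumption for this sketch: naive datetimes are already UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    utc = dt.astimezone(timezone.utc)
    # isoformat() on an aware UTC datetime ends in "+00:00", which is the
    # form the strict parser rejected; build the Z form explicitly.
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{utc.microsecond // 1000:03d}Z"
```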

2026.6.10

ServerSideHannes released this 30 Jun 10:14

2026.6.10

0cddafc

Fixes

Route V1 ListObjects to the list handler instead of raw-forwarding (#93). Completes the V1 fix from 2026.6.9. _dispatch_bucket was raw-forwarding any bucket GET without list-type/delete/uploads/location straight to the backend, so a V1 ListObjects (?prefix&delimiter&max-keys&encoding-type, no list-type=2) was sent verbatim to Hetzner → HTTP 400, never reaching the V1→V2 translation added in #92. A bucket GET whose query is only listing params now falls through to the list handler; genuine sub-resource GETs (acl, versioning, …) still forward.
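The routing rule can be expressed as a check on the query parameter names. This is an illustrative sketch, not _dispatch_bucket itself: is_v1_list_request and V1_LIST_PARAMS are hypothetical, and the parameter set is taken from the V1 ListObjects request parameters rather than from the project's code.

```python
# Query parameters that belong to a V1 ListObjects request.
V1_LIST_PARAMS = frozenset(
    {"prefix", "delimiter", "marker", "max-keys", "encoding-type"}
)


def is_v1_list_request(query_keys) -> bool:
    """Return True when a bucket GET carries only V1 listing parameters.

    Anything else (acl, versioning, uploads, ...) is a sub-resource GET and
    should keep being forwarded. list-type marks a V2 listing, which is
    routed separately. A bucket GET with no query at all also counts as a
    V1 listing under this rule.
    """
    keys = set(query_keys)
    if "list-type" in keys:
        return False
    return keys <= V1_LIST_PARAMS
```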

This is the fix that actually unblocks Scylla backups and Postgres retention against Hetzner.

Image: ghcr.io/serversidehannes/s3proxy-python:2026.6.10

Chain: 2026.6.8 (#91 V2 token, #88 parallel HEAD) → 2026.6.9 (#92 V1→V2 in handler) → 2026.6.10 (#93 route V1 to handler).

2026.6.9

ServerSideHannes released this 30 Jun 09:52

2026.6.9

cb04a8b

Fixes

Serve V1 ListObjects via the backend's V2 API (#92). Hetzner Object Storage only implements ListObjectsV2 and rejects legacy V1 ListObjects with HTTP 400. The proxy forwarded V1 client requests as V1, breaking every V1 client: scylla-manager's bundled rclone 1.51.0 (all Scylla backups failed at the list step) and barman-cloud-backup-delete (CNPG retention failed with BadRequest, so Postgres backups completed but old ones were never pruned). handle_list_objects_v1 now calls the backend's list_objects_v2, mapping the client's V1 marker → V2 StartAfter (stateless, lossless for recursive listings) and synthesizing NextMarker from the largest raw backend key when truncated.
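The marker translation can be sketched with plain dictionaries. The function names are hypothetical, the boto3-style parameter and response casing is an assumption, and only object keys in Contents are considered here because the note does not say how CommonPrefixes are treated.

```python
from typing import Optional


def v1_to_v2_params(v1: dict) -> dict:
    """Translate V1 ListObjects parameters into ListObjectsV2 parameters."""
    passthrough = ("Bucket", "Prefix", "Delimiter", "MaxKeys", "EncodingType")
    v2 = {name: v1[name] for name in passthrough if name in v1}
    if v1.get("Marker"):
        # V1 resumes after Marker; StartAfter has the same meaning in V2,
        # so no server-side cursor state is needed.
        v2["StartAfter"] = v1["Marker"]
    return v2


def synthesize_next_marker(v2_response: dict) -> Optional[str]:
    """Build a V1 NextMarker from a truncated V2 page, else None."""
    if not v2_response.get("IsTruncated"):
        return None
    keys = [obj["Key"] for obj in v2_response.get("Contents", [])]
    # The largest key on the page is where the next V1 request resumes.
    return max(keys) if keys else None
```

A client that sends the synthesized NextMarker back as Marker then gets the next page through StartAfter.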

Image: ghcr.io/serversidehannes/s3proxy-python:2026.6.9

Builds on 2026.6.8 (V2 continuation-token fix #91, parallel list HEAD #88).

2026.6.8

ServerSideHannes released this 30 Jun 07:38

2026.6.8

86077c6

Fixes

Don't URL-encode V2 continuation tokens under encoding-type=url (#91). ListObjectsV2 continuation tokens are opaque cursors, not keys — the S3 spec only URL-encodes Key/Prefix/Delimiter/StartAfter, and clients never URL-decode the token. The serializer was running NextContinuationToken/ContinuationToken through _encode_key(), so under encoding-type=url a key-shaped backend token (…/data_0007.tar → …%2F…) could not round-trip: the backend never advanced, the same page repeated, and clients aborted with "the same next token was received twice." This wedged CNPG barman-cloud base backups and retention on multi-page catalogs. Now emitted XML-escaped only; V1 NextMarker (a real key) is still URL-encoded.
Parallelize per-object HEAD on list-objects (#88). Resolving the SSE plaintext size/etag did one sequential head_object per key; a recursive list of up to max-keys objects stacked into a multi-second stall that tripped client timeouts and hung ClickHouse/Postgres backups at the S3 list step. HEADs now run concurrently, bounded by LIST_HEAD_CONCURRENCY (50), preserving output order and the per-object fallback to the listed size/etag.
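Bounded, order-preserving HEAD resolution could be sketched as below. resolve_sizes and the head_object callback are hypothetical stand-ins; only the LIST_HEAD_CONCURRENCY name and its value of 50 come from the note above.

```python
import asyncio

LIST_HEAD_CONCURRENCY = 50


async def resolve_sizes(listed, head_object, concurrency: int = LIST_HEAD_CONCURRENCY):
    """Resolve each listed entry via head_object, at most `concurrency` at once.

    listed is a sequence of dicts with a "Key"; head_object is an async
    callable taking a key and returning the resolved entry. Results come
    back in the same order as listed. If a HEAD fails, the original listed
    entry is kept as the fallback.
    """
    sem = asyncio.Semaphore(concurrency)

    async def resolve(entry):
        async with sem:
            try:
                return await head_object(entry["Key"])
            except Exception:
                return entry  # fall back to the listed size/etag

    # gather returns results in argument order, regardless of finish order.
    return await asyncio.gather(*(resolve(entry) for entry in listed))
```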

Image: ghcr.io/serversidehannes/s3proxy-python:2026.6.8
