perf: reuse one aiobotocore client per credential set by ServerSideHannes · Pull Request #102 · ServerSideHannes/s3proxy-python

ServerSideHannes · 2026-07-01T07:21:22Z

What

Reuse one long-lived aiobotocore client per credential set instead of creating one per request.

Why (memray)

memray on a live pod showed the per-request client pattern generates ~9.5M allocations in _create_connector plus repeated load_default_certs / SSLContext.__new__ — every request built a fresh aiohttp connector and SSLContext (loading the whole CA store). aiobotocore clients are pool-safe and meant to be long-lived.

Change

Cache clients keyed by (endpoint, access_key, secret_key, region); create once (double-checked lock), reuse across requests.
close_cached_clients() on app shutdown.

Impact

Large CPU/allocation win (no connector/SSLContext/CA-load per request). Memory win is modest (~5MB) — the SSL contexts were high-churn but low-resident, so this is not the OOM fix (that's #101). Shipping as a correctness/efficiency improvement: per-request clients are an anti-pattern.

Tests: test_client_reuse.py pins same-creds reuse, distinct-creds isolation, and shutdown teardown. Full unit suite green.

Diagnostic to find what actually holds the resident memory under backup load (the OOM), instead of inferring. Enabled only when S3PROXY_TRACEMALLOC is set (zero overhead otherwise): starts tracemalloc at startup and logs the top live Python allocations (size + call site) every S3PROXY_TRACEMALLOC_INTERVAL secs and on SIGUSR1. Chart gains an extraConfig passthrough so one replica can set the flag via values; revert after capture.

Turn the tracemalloc diagnostic into a proper memory debug mode. The OOM's defining trait is the gap between real RSS (what the kernel kills on) and the Python-tracked heap: prod hit ~957MB RSS with only ~87MB tracked, i.e. the memory lived in C-level buffers (uvicorn/httptools sockets, allocator), which no top-allocations list can explain. So every interval the mode now logs RSS, tracked, untracked (rss-tracked) and the governor's active bytes side by side, then the top live Python allocations. One dump tells us which world we're in: - large untracked gap -> C-level (transport buffers), not a Python call site - small gap -> Python, and the top list names the exact line Gated by S3PROXY_MEMORY_DEBUG (alias S3PROXY_TRACEMALLOC), zero overhead when unset; dumps every S3PROXY_MEMORY_DEBUG_INTERVAL secs and on SIGUSR1.

Creating a client per request built a fresh aiohttp connector + SSLContext that loads the whole CA store each time -- memray showed ~9.5M allocations in _create_connector and repeated load_default_certs. Clients are pool-safe and meant to be long-lived, so cache one per (endpoint, key, region) and close them on shutdown. Big CPU/allocation win; modest (~5MB) memory.

ServerSideHannes added 3 commits July 1, 2026 07:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: reuse one aiobotocore client per credential set#102

perf: reuse one aiobotocore client per credential set#102
ServerSideHannes wants to merge 3 commits into
mainfrom
fix/reuse-s3-client

ServerSideHannes commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ServerSideHannes commented Jul 1, 2026

What

Why (memray)

Change

Impact

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant