Skip to content

perf: reuse one aiobotocore client per credential set#102

Open
ServerSideHannes wants to merge 3 commits into
mainfrom
fix/reuse-s3-client
Open

perf: reuse one aiobotocore client per credential set#102
ServerSideHannes wants to merge 3 commits into
mainfrom
fix/reuse-s3-client

Conversation

@ServerSideHannes

Copy link
Copy Markdown
Owner

What

Reuse one long-lived aiobotocore client per credential set instead of creating one per request.

Why (memray)

memray on a live pod showed the per-request client pattern generates ~9.5M allocations in _create_connector plus repeated load_default_certs / SSLContext.__new__ — every request built a fresh aiohttp connector and SSLContext (loading the whole CA store). aiobotocore clients are pool-safe and meant to be long-lived.

Change

  • Cache clients keyed by (endpoint, access_key, secret_key, region); create once (double-checked lock), reuse across requests.
  • close_cached_clients() on app shutdown.

Impact

Large CPU/allocation win (no connector/SSLContext/CA-load per request). Memory win is modest (~5MB) — the SSL contexts were high-churn but low-resident, so this is not the OOM fix (that's #101). Shipping as a correctness/efficiency improvement: per-request clients are an anti-pattern.

Tests: test_client_reuse.py pins same-creds reuse, distinct-creds isolation, and shutdown teardown. Full unit suite green.

Diagnostic to find what actually holds the resident memory under backup load
(the OOM), instead of inferring. Enabled only when S3PROXY_TRACEMALLOC is set
(zero overhead otherwise): starts tracemalloc at startup and logs the top live
Python allocations (size + call site) every S3PROXY_TRACEMALLOC_INTERVAL secs
and on SIGUSR1. Chart gains an extraConfig passthrough so one replica can set
the flag via values; revert after capture.
Turn the tracemalloc diagnostic into a proper memory debug mode. The OOM's
defining trait is the gap between real RSS (what the kernel kills on) and the
Python-tracked heap: prod hit ~957MB RSS with only ~87MB tracked, i.e. the
memory lived in C-level buffers (uvicorn/httptools sockets, allocator), which no
top-allocations list can explain. So every interval the mode now logs RSS,
tracked, untracked (rss-tracked) and the governor's active bytes side by side,
then the top live Python allocations. One dump tells us which world we're in:
- large untracked gap  -> C-level (transport buffers), not a Python call site
- small gap            -> Python, and the top list names the exact line

Gated by S3PROXY_MEMORY_DEBUG (alias S3PROXY_TRACEMALLOC), zero overhead when
unset; dumps every S3PROXY_MEMORY_DEBUG_INTERVAL secs and on SIGUSR1.
Creating a client per request built a fresh aiohttp connector + SSLContext that
loads the whole CA store each time -- memray showed ~9.5M allocations in
_create_connector and repeated load_default_certs. Clients are pool-safe and
meant to be long-lived, so cache one per (endpoint, key, region) and close them
on shutdown. Big CPU/allocation win; modest (~5MB) memory.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant