feat(git): instrument snapshot serve backend and server-side TTFB #349
Draft
worstell wants to merge 1 commit into
Add a backend dimension (disk/s3/...) and server-side time-to-first-byte to snapshot serves so snapshot-lookup latency can be attributed to a cache tier and split into cache-open vs first-chunk time.

- Tiered.Open annotates the serving tier via an internal X-Cachew-Served-By header; serve handlers read it for the backend label.
- New metrics: cachew.git.snapshot_serve_ttfb_seconds and cachew.git.snapshot_cache_open_duration_seconds.
- serveReaderFast measures TTFB (sendfile for files, first-Read for stream readers) and snapshot serves/spans carry backend + ttfb attributes.

Amp-Thread-ID: https://ampcode.com/threads/T-019ef6a9-a407-7389-bc43-001405e3ae9e
Co-authored-by: Amp <amp@ampcode.com>
Snapshot-lookup latency (the time a blox client waits before a snapshot download begins) is bimodal: small repos see TTFB on the order of milliseconds, while giant repos (e.g. an 8 GB snapshot) see hundreds of seconds. The existing `snapshot_serve_duration_seconds` covers the whole request, so it cannot tell us whether the time is spent in cache lookup, waiting on the first chunk from the L2 (S3) backend, or in the download itself, nor which tier served the bytes.

This adds instrumentation to disambiguate, without changing serving behavior:

- `Tiered.Open` annotates the serving tier via an internal `X-Cachew-Served-By` header (`disk`, `s3`, ...); serve handlers read it for a low-cardinality `backend` metric/span label. The header is not forwarded to clients.
- `cachew.git.snapshot_serve_ttfb_seconds`: server-side time-to-first-byte (handler entry → first response byte), by source/backend/repo.
- `cachew.git.snapshot_cache_open_duration_seconds`: cache Open (lookup/metadata/reader creation) before streaming, by backend/status/repo.
- `serveReaderFast` now measures TTFB: immediate for sendfile'd files, first-`Read` for stream readers (e.g. an S3 range reader whose first Read blocks on the initial chunk).
- Snapshot serve spans carry `backend`, `ttfb_seconds`, and `mirror_head_seconds` attributes; existing `recordSnapshotServe` gains a `backend` label.

Once deployed, this lets us confirm whether giant-repo lookup latency is dominated by S3 L2 vs disk L1, and by cache-open vs first-chunk time, so we can pick the right follow-up (e.g. keeping hot giant snapshots on L1).
Validation:
`go build ./...`, `go test ./internal/cache/... ./internal/strategy/git/...`, lint clean except pre-existing gosec warnings on untouched lines.