Show full-precision scores; add `submissions show --no-code` by robobryce · Pull Request #63 · gpu-mode/popcorn-cli

robobryce · 2026-06-23T21:58:10Z

Two small quality-of-life fixes to the submissions views.

1. Full-precision scores in `list` and `show`

Scores were formatted with {:.4}, rounding the geomean leaderboard score to 4 decimals. That's lossy enough to make near-tied submissions indistinguishable — two submissions scoring 0.0017318... and 0.0017449... both render as 0.0017, so you can't tell which is faster from the CLI.

Switch to f64::to_string() (via a small format_score helper), which prints the shortest decimal that round-trips to the same f64 — full precision, no trailing-zero noise.
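To make the difference concrete, here is a minimal sketch of the two renderings side by side. The body of `format_score` is an assumption (the real helper's signature is not shown in this thread), and `format_score_rounded` is a hypothetical name for the old `{:.4}` behavior.

```rust
/// Old rendering: fixed four decimal places (hypothetical name for illustration).
fn format_score_rounded(score: f64) -> String {
    format!("{:.4}", score)
}

/// Assumed shape of the `format_score` helper; the real signature may differ.
fn format_score(score: f64) -> String {
    // `Display` for f64 prints the shortest decimal that parses back to the same value.
    score.to_string()
}

fn main() {
    // The two scores quoted in the description above.
    let a = 0.001731805142084383_f64;
    let b = 0.0017448536290123567_f64;

    // Under `{:.4}` both collapse to the same string.
    println!("{} {}", format_score_rounded(a), format_score_rounded(b));

    // At full precision they stay distinguishable.
    println!("{} {}", format_score(a), format_score(b));
}
```

One thing to keep in mind with this approach: column widths in `list` can no longer assume a fixed score width, since the output length now depends on the value.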

submissions list — before / after:

```
# before
ID       Leaderboard   File            Time                 GPU(s)   Status   Score
830213   qr_v2         submission.py   2026-06-23T12:00...   B200     done       0.0017
# after
830213   qr_v2         submission.py   2026-06-23T12:00...   B200     done       0.001731805142084383
```

submissions show 830213 — before / after:

```
# before
  - leaderboard on B200: passed (score: 0.0017) [secret]
  - leaderboard on B200: passed (score: 0.0017)
# after
  - leaderboard on B200: passed (score: 0.0017448536290123567) [secret]
  - leaderboard on B200: passed (score: 0.001731805142084383)
```

2. `submissions show --no-code`

submissions show always prints the full submission code, usually the largest part of the output. Add a --no-code flag to omit it when you only want the metadata and per-run scores. Default behavior is unchanged.

```
popcorn-cli submissions show <id> --no-code
```
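As a rough illustration of the opt-out, the sketch below gates the code block on a `no_code` boolean. The `Submission` struct, its fields, and `render_submission` are all hypothetical stand-ins; the real command's types and its clap wiring are not shown in this thread.

```rust
/// Hypothetical, simplified view of a submission; field names are assumptions.
struct Submission {
    id: u64,
    leaderboard: String,
    runs: Vec<String>,
    code: String,
}

/// Render metadata and per-run lines, and append the code unless `no_code` is set.
fn render_submission(sub: &Submission, no_code: bool) -> String {
    let mut out = String::new();
    out.push_str(&format!("ID: {}\n", sub.id));
    out.push_str(&format!("Leaderboard: {}\n", sub.leaderboard));
    for run in &sub.runs {
        out.push_str(&format!("  - {}\n", run));
    }
    if !no_code {
        out.push_str("Code:\n");
        out.push_str(&sub.code);
        out.push('\n');
    }
    out
}

/// Illustrative data only, loosely based on the example output above.
fn example() -> Submission {
    Submission {
        id: 830213,
        leaderboard: "qr_v2".to_string(),
        runs: vec!["leaderboard on B200: passed (score: 0.001731805142084383)".to_string()],
        code: "import torch".to_string(),
    }
}

fn main() {
    let sub = example();
    // Default: metadata, runs, and code.
    print!("{}", render_submission(&sub, false));
    // With the flag: metadata and runs only.
    print!("{}", render_submission(&sub, true));
}
```

Keeping the flag as a plain boolean passed into the renderer means the default path is untouched, which matches the "default behavior is unchanged" claim.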

Verification

Built the binary and ran both against the live API: list and show print full-precision scores; show --no-code prints metadata + runs and omits the code block (default show still prints it).

cargo fmt --all -- --check — clean
cargo clippy --all-targets --all-features -- -D warnings — clean
cargo test — 31 passed (added format_score + truncate unit tests for patch coverage)

🤖 Generated with Claude Code

Two small quality-of-life fixes to the submissions views:

- Print scores at full f64 precision in both `submissions list` and `submissions show`. They were formatted with `{:.4}`, which rounds the geomean leaderboard score to 4 decimals (e.g. 0.0017 for two distinct submissions that actually scored 0.0017318 vs 0.0017449), enough to make near-tied submissions indistinguishable. `f64::to_string()` emits the shortest decimal that round-trips, so no rounding and no trailing zero noise.
- Add a `--no-code` flag to `submissions show` to omit the (often large) code block, for when you only want the metadata and per-run scores. Default behavior is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

codecov · 2026-06-23T22:02:11Z

Codecov Report

❌ Patch coverage is 25.00000% with 9 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/cmd/submissions.rs	30.00%	7 Missing ⚠️
src/cmd/mod.rs	0.00%	2 Missing ⚠️

📢 Thoughts on this report? Let us know!

brycelelbach · 2026-06-23T22:07:18Z

codecov/patch is failing; you should fix that.

brycelelbach · 2026-06-23T22:07:44Z

Your PR description does not have a before and after example of `submissions list`, only `submissions show`.

brycelelbach · 2026-06-23T22:09:47Z

Mark/other humans: I think --no-code or something similar is important for agents to avoid context blow out. I am not sure what the default should be. The best default for agents would be to not show code by default and require a --code flag to opt-in. However, that would be a breaking change, so I opted for --no-code instead. Thoughts?

Pull the score-rendering logic (shared by `list` and `show`) into a `format_score` helper and cover it with unit tests, so the precision change has patch coverage (codecov/patch was 0%). Also adds a small `truncate` test. No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

robobryce · 2026-06-23T22:23:54Z

Fixed in 32f4ea5. I pulled the score-rendering logic (shared by list and show) into a format_score helper and added unit tests for it (plus a truncate test). codecov/patch's target was 4.09%, and the new tests cover the bulk of the changed lines, so patch coverage should clear it now. The --no-code branch and the clap wiring in mod.rs are still exercised only by the live show path (the cmd/ layer has no unit tests today), but the meaningful new logic is now covered.

robobryce · 2026-06-23T22:23:55Z

Added a submissions list before/after to the description (e.g. 0.0017 → 0.001731805142084383).

robobryce · 2026-06-23T22:23:56Z

On the default: I agree code-by-default is the wrong default for agent use (it's the single biggest source of context blowout in show). My take, FWIW:

- This PR keeps --no-code (opt-out) to stay non-breaking, which I think is the right move for this PR.
- If we're willing to take one breaking change, I'd actually prefer default to no code, add --code to opt in: it's the better default for both humans (who usually want the metadata/scores) and agents. Could pair it with a deprecation cycle: add --code now, keep printing code by default but print a "code shown by default is deprecated, pass --code" notice to stderr, then flip the default in a later release.
- A lighter-touch alternative: keep code-by-default for humans but auto-suppress when stdout isn't a TTY (agents/pipes), which avoids a hard breaking change.

Happy to go whichever way you and Mark prefer. I can adjust this PR or split the default-flip into a follow-up.
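For reference, the TTY-based alternative could look roughly like the sketch below. This is not part of the PR: `should_show_code` is a hypothetical helper, `--code` is only the flag proposed above, and the precedence between the two flags is an assumption. The TTY check uses the standard library's `std::io::IsTerminal` trait.

```rust
use std::io::IsTerminal;

/// Hypothetical policy: explicit flags win, otherwise show code only on a terminal.
fn should_show_code(code_flag: bool, no_code_flag: bool, stdout_is_tty: bool) -> bool {
    if no_code_flag {
        // Explicit opt-out always suppresses the code block.
        return false;
    }
    if code_flag {
        // Explicit opt-in always prints it, even when piped.
        return true;
    }
    // No flag given: humans at a terminal see code, pipes and agents do not.
    stdout_is_tty
}

fn main() {
    let is_tty = std::io::stdout().is_terminal();
    let show = should_show_code(false, false, is_tty);
    println!("show code: {show}");
}
```

Keeping the decision in a pure function separate from the `is_terminal()` call makes it testable without a real terminal, which matters given that the cmd/ layer has no unit tests today.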

msaroufim merged commit 4e56a19 into gpu-mode:main Jun 23, 2026
6 checks passed

msaroufim merged 2 commits into gpu-mode:main from robobryce:feat/score-precision-and-no-code


3 participants
