
Show full-precision scores; add submissions show --no-code #63

Merged
msaroufim merged 2 commits into gpu-mode:main from robobryce:feat/score-precision-and-no-code on Jun 23, 2026

robobryce (Contributor) commented Jun 23, 2026

Two small quality-of-life fixes to the submissions views.

1. Full-precision scores in list and show

Scores were formatted with {:.4}, rounding the geomean leaderboard score to 4 decimals. That's lossy enough to make near-tied submissions indistinguishable — two submissions scoring 0.0017318... and 0.0017449... both render as 0.0017, so you can't tell which is faster from the CLI.

Switch to f64::to_string() (via a small format_score helper), which prints the shortest decimal that round-trips to the same f64 — full precision, no trailing-zero noise.
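A minimal sketch of what a helper like format_score could look like, assuming it takes a bare f64 (the PR does not show its actual signature). format_score_old is a hypothetical name for the previous {:.4} behavior, included only for comparison:

```rust
/// Hypothetical sketch of a `format_score` helper: `f64`'s `Display` impl
/// emits the shortest decimal string that parses back to the same `f64`,
/// so there is no fixed precision and no trailing-zero padding.
fn format_score(score: f64) -> String {
    score.to_string()
}

/// Hypothetical name for the old behavior: always exactly 4 digits after
/// the decimal point, rounding away everything beyond that.
fn format_score_old(score: f64) -> String {
    format!("{:.4}", score)
}
```

With these definitions, format_score_old renders both 0.0017318 and 0.0017449 as "0.0017", while format_score keeps them distinct ("0.0017318" vs "0.0017449"), and its output parses back to the identical f64. One trade-off of Display: it never switches to exponent notation, so a very small score such as 1e-10 prints as 0.0000000001.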

submissions list — before / after:

# before
ID       Leaderboard   File            Time                 GPU(s)   Status   Score
830213   qr_v2         submission.py   2026-06-23T12:00...   B200     done       0.0017
# after
830213   qr_v2         submission.py   2026-06-23T12:00...   B200     done       0.001731805142084383

submissions show 830213 — before / after:

# before
  - leaderboard on B200: passed (score: 0.0017) [secret]
  - leaderboard on B200: passed (score: 0.0017)
# after
  - leaderboard on B200: passed (score: 0.0017448536290123567) [secret]
  - leaderboard on B200: passed (score: 0.001731805142084383)

2. submissions show --no-code

submissions show always prints the full submission code, usually the largest part of the output. Add a --no-code flag to omit it when you only want the metadata and per-run scores. Default behavior is unchanged.

popcorn-cli submissions show <id> --no-code
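The flag is wired through clap in src/cmd/mod.rs. As a hedged illustration: with clap's derive API, a no_code: bool field annotated #[arg(long)] becomes a --no-code flag, and the command then only has to skip the code section when rendering. This sketch models just that rendering step; Run, Submission, render_show, and the header line are hypothetical stand-ins, not popcorn-cli's real types or output format:

```rust
/// Hypothetical, simplified stand-ins for the data `submissions show` prints.
struct Run {
    mode: String, // e.g. "leaderboard"
    gpu: String,  // e.g. "B200"
    passed: bool,
    score: Option<f64>,
    secret: bool,
}

struct Submission {
    id: u64,
    leaderboard: String,
    runs: Vec<Run>,
    code: String,
}

/// Render the `show` output; `include_code` would be `!no_code`.
fn render_show(sub: &Submission, include_code: bool) -> String {
    // Metadata header (hypothetical format).
    let mut out = format!("Submission {} ({})\n", sub.id, sub.leaderboard);
    // Per-run lines, mirroring the "  - leaderboard on B200: passed (score: ...)" shape.
    for run in &sub.runs {
        let status = if run.passed { "passed" } else { "failed" };
        out.push_str(&format!("  - {} on {}: {}", run.mode, run.gpu, status));
        if let Some(score) = run.score {
            // Full precision: `{}` uses the shortest round-trip decimal.
            out.push_str(&format!(" (score: {})", score));
        }
        if run.secret {
            out.push_str(" [secret]");
        }
        out.push('\n');
    }
    // The code block, usually the bulk of the output, is the only optional part.
    if include_code {
        out.push_str("\nCode:\n");
        out.push_str(&sub.code);
        out.push('\n');
    }
    out
}
```

Because the code block is appended last, the --no-code output is exactly a prefix of the default output: the metadata and per-run scores are unchanged, and only the code section is dropped.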

Verification

Built the binary and ran both commands against the live API: list and show print full-precision scores, and show --no-code prints the metadata and runs without the code block (default show still prints it).

  • cargo fmt --all -- --check — clean
  • cargo clippy --all-targets --all-features -- -D warnings — clean
  • cargo test — 31 passed (added format_score + truncate unit tests for patch coverage)

🤖 Generated with Claude Code

Two small quality-of-life fixes to the submissions views:

- Print scores at full f64 precision in both `submissions list` and
  `submissions show`. They were formatted with `{:.4}`, which rounds the
  geomean leaderboard score to 4 decimals (e.g. 0.0017 for two distinct
  submissions that actually scored 0.0017318 vs 0.0017449) — enough to
  make near-tied submissions indistinguishable. `f64::to_string()` emits
  the shortest decimal that round-trips, so no rounding and no trailing
  zero noise.

- Add a `--no-code` flag to `submissions show` to omit the (often large)
  code block, for when you only want the metadata and per-run scores.
  Default behavior is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
codecov bot commented Jun 23, 2026

Codecov Report

❌ Patch coverage is 25.00000% with 9 lines in your changes missing coverage. Please review.

Files with missing lines   Patch %   Lines
src/cmd/submissions.rs     30.00%    7 Missing ⚠️
src/cmd/mod.rs             0.00%     2 Missing ⚠️


brycelelbach (Contributor)

codecov/patch is failing; you should fix that.

brycelelbach (Contributor)

Your PR description does not have a before/after example for submissions list, only for submissions show.

brycelelbach (Contributor)

Mark/other humans: I think --no-code or something similar is important for agents to avoid context blowout. I am not sure what the default should be. The best default for agents would be to not show code and require a --code flag to opt in. However, that would be a breaking change, so I opted for --no-code instead. Thoughts?

Pull the score-rendering logic (shared by `list` and `show`) into a
`format_score` helper and cover it with unit tests, so the precision
change has patch coverage (codecov/patch was 0%). Also adds a small
`truncate` test. No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
robobryce (Contributor, Author)

Fixed in 32f4ea5. I pulled the score-rendering logic (shared by list and show) into a format_score helper and added unit tests for it (plus a truncate test). codecov/patch's target was 4.09%, and the new tests cover the bulk of the changed lines, so patch coverage should clear it now. The --no-code branch and the clap wiring in mod.rs are still exercised only by the live show path (the cmd/ layer has no unit tests today), but the meaningful new logic is now covered.

robobryce (Contributor, Author)

Added a submissions list before/after to the description (e.g. 0.0017 → 0.001731805142084383).

robobryce (Contributor, Author)

On the default: I agree code-by-default is the wrong default for agent use (it's the single biggest source of context blowout in show). My take, FWIW:

  • This PR keeps --no-code (opt-out) to stay non-breaking, which I think is the right move for this PR.
  • If we're willing to take one breaking change, I'd actually prefer defaulting to no code and adding --code to opt in: it's the better default for both humans (who usually want the metadata/scores) and agents. We could pair it with a deprecation cycle: add --code now, keep printing code by default but print a "code shown by default is deprecated, pass --code" notice to stderr, then flip the default in a later release.
  • A lighter-touch alternative: keep code-by-default for humans but auto-suppress when stdout isn't a TTY (agents/pipes), which avoids a hard breaking change. Happy to go whichever way you and Mark prefer — I can adjust this PR or split the default-flip into a follow-up.
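Both proposals come down to a small flag-resolution rule. A minimal sketch, assuming hypothetical --code and --no-code boolean flags (--code and these function names do not exist in the merged PR). Option A keeps showing code by default but writes the deprecation notice to a writer that would be stderr in the CLI; option B uses std::io::IsTerminal (stable since Rust 1.70) to hide code by default when stdout is not a terminal. In both, --no-code wins if both flags are passed; a real implementation would more likely reject that combination (clap supports this via conflicts_with).

```rust
use std::io::{self, IsTerminal, Write};

/// Option A (hypothetical): deprecation cycle. With neither flag, keep
/// today's behavior (show code) but warn; `warn` would be stderr in the CLI.
/// A later release would flip this to `Ok(code)`.
fn show_code_with_deprecation(
    code: bool,
    no_code: bool,
    warn: &mut impl Write,
) -> io::Result<bool> {
    if no_code {
        return Ok(false);
    }
    if !code {
        writeln!(warn, "warning: code shown by default is deprecated, pass --code")?;
    }
    Ok(true)
}

/// Option B (hypothetical): with neither flag, show code only to a human
/// at a terminal; pipes and agents get metadata and scores only.
fn show_code_tty_default(code: bool, no_code: bool, stdout_is_tty: bool) -> bool {
    if no_code {
        false
    } else if code {
        true
    } else {
        stdout_is_tty
    }
}

/// How the CLI would obtain `stdout_is_tty` for option B.
fn stdout_is_tty() -> bool {
    io::stdout().is_terminal()
}
```

Passing the writer and the TTY bit in as parameters keeps both rules unit-testable without a real terminal.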

msaroufim merged commit 4e56a19 into gpu-mode:main on Jun 23, 2026
6 checks passed

3 participants