Add Brev Nsight Compute profiling flow #64

Merged
msaroufim merged 2 commits into main from brev-profiler-central-proxy
Jun 25, 2026

@msaroufim msaroufim commented Jun 25, 2026

Member

Summary

Adds popcorn submit ... --profile-brev support for the hosted QR v2 Nsight Compute profiler.

The default downloaded artifacts are now the agent-readable Nsight Compute details files:

  • ncu-details.txt
  • ncu-details.csv

The full profile.ncu-rep GUI report is still downloaded when the service returns it, but it is treated as optional and printed after the details links. The final line remains a macOS command that opens the report in Nsight Compute:

open -a "NVIDIA Nsight Compute" 'profile.0-.../profile.ncu-rep'
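As a rough sketch of how that final line could be assembled, the function below formats the command from a report path. The name `open_command` is hypothetical and not taken from popcorn-cli, and the sketch assumes the path contains no single quotes, since it does no shell escaping.

```rust
/// Hypothetical helper (not from popcorn-cli): format the macOS command that
/// opens a downloaded report in Nsight Compute.
/// Assumption: `report_path` contains no single quotes; no escaping is done.
fn open_command(report_path: &str) -> String {
    format!("open -a \"NVIDIA Nsight Compute\" '{}'", report_path)
}
```

Calling `open_command` with the path of the extracted `profile.ncu-rep` yields a string in the same shape as the line above.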

How It Works

  • popcorn-cli uploads the existing submission.py unchanged to the hosted profiler service.
  • The service resolves the requested leaderboard against the active reference-kernels checkout on the Brev profiler node.
  • Shapes come from the benchmarks: list in the matching task.yml. Passing --benchmark-index N profiles only the benchmark shape at index N; omitting the flag profiles all benchmark shapes.
  • For QR v2, --benchmark-index 0 currently resolves to batch: 20, n: 32, cond: 1, seed: 43214.
  • The profiler service runs the selected benchmark in mode=profile, exports Nsight Compute details text/CSV, bundles those plus the optional .ncu-rep, and returns the zip to the CLI.
  • popcorn-cli extracts details files first and prints clickable terminal links for them before the optional Nsight Compute GUI report link.
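The selection and ordering rules in the list above can be sketched in a few lines of Rust. This is an illustration under stated assumptions, not the code in this PR: `select_benchmarks` and `order_artifacts` are hypothetical names, the shape selection may in practice happen on the service side, and the ordering of files other than the three named artifacts is a guess.

```rust
/// Hypothetical helper: choose which benchmark shapes to profile.
/// `None` mirrors omitting `--benchmark-index` (profile every shape);
/// `Some(n)` mirrors `--benchmark-index n` (profile one shape).
fn select_benchmarks<T: Clone>(shapes: &[T], index: Option<usize>) -> Result<Vec<T>, String> {
    match index {
        None => Ok(shapes.to_vec()),
        Some(i) => shapes
            .get(i)
            .cloned()
            .map(|shape| vec![shape])
            .ok_or_else(|| {
                format!("benchmark index {} out of range (have {})", i, shapes.len())
            }),
    }
}

/// Hypothetical helper: order extracted artifact paths so the details files
/// are printed first and the optional `.ncu-rep` report is printed last.
fn order_artifacts(mut names: Vec<String>) -> Vec<String> {
    fn rank(name: &str) -> u8 {
        if name.ends_with("ncu-details.txt") {
            0
        } else if name.ends_with("ncu-details.csv") {
            1
        } else if name.ends_with(".ncu-rep") {
            3
        } else {
            2 // assumption: any other file sorts between details and the report
        }
    }
    // `sort_by_key` is stable, so files with equal rank keep their input order.
    names.sort_by_key(|name| rank(name));
    names
}
```

If the service omits the `.ncu-rep` file, `order_artifacts` simply returns the details files, which matches the description of the report as optional.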

Related Infra

This PR only changes popcorn-cli. The standalone profiler service lives in gpu-mode/ncu-service and is currently exposed through the hosted proxy URL documented here.

No kernelbot or kernelboard changes are part of this PR.

Validation

  • cargo fmt --check
  • cargo test
  • End-to-end QR v2 profile through the hosted service:
POPCORN_BREV_PROFILER_URL=https://http--brev-profiler-proxy--dxfjds728w5v.code.run \
  cargo run -- submit /Users/mark/Dev/reference-kernels/problems/linalg/qr_v2/submission.py \
  --leaderboard qr_v2 \
  --profile-brev \
  --benchmark-index 0 \
  --no-tui

The run produced:

  • profile.0-batch-20-n-32-cond-1-seed-43214/ncu-details.txt
  • profile.0-batch-20-n-32-cond-1-seed-43214/ncu-details.csv
  • profile.0-batch-20-n-32-cond-1-seed-43214/profile.ncu-rep
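The directory name in these paths appears to combine the benchmark index with the shape parameters in order. The sketch below reproduces that pattern; it is inferred from this single run only, `profile_dir_name` is a hypothetical name, and the real naming may be produced by the profiler service rather than the CLI.

```rust
/// Hypothetical helper: build a per-benchmark directory name of the form
/// `profile.<index>-<key>-<value>-...`, a pattern inferred from one observed run.
fn profile_dir_name(index: usize, params: &[(&str, i64)]) -> String {
    let mut name = format!("profile.{}", index);
    for (key, value) in params {
        // Append each shape parameter as "-<key>-<value>", in the given order.
        name.push_str(&format!("-{}-{}", key, value));
    }
    name
}
```

With index 0 and the QR v2 shape `batch: 20, n: 32, cond: 1, seed: 43214`, this yields `profile.0-batch-20-n-32-cond-1-seed-43214`, matching the paths listed above.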

Reference-Kernels Provenance

The Brev profiler node is using gpu-mode/reference-kernels main at:

e224fc20c430fce369dd9c072c8adfe1ad4d8a06

with local profile-mode patches for QR evaluators while the corresponding reference-kernels PR is pending.

@brycelelbach

Contributor

Don't hardcode /home/shadeform; different Brev providers have different default users.

@brycelelbach

Contributor

How do you set up the SSH tunnel to the Brev machine? Are you copying an SSH key yourself, or are you using the Brev CLI commands?

If you're using the Brev CLI commands, you should always run brev refresh once and then use regular ssh/scp, because brev shell and brev copy run brev refresh every time and can be slow.

I would also probably just set up the SSH key yourself.

@brycelelbach

Contributor

Do we run the submissions in a container on the Brev node? There is always a risk that someone could write a malicious submission to try to grab other people's submissions. Containers will protect a little bit against that (obviously not entirely; there are plenty of ways to get out of containers).

@msaroufim

Member Author

Paused the service now, security needs a bit more thought

@msaroufim msaroufim force-pushed the brev-profiler-central-proxy branch from 4378863 to 8fc14b1 Compare June 25, 2026 01:16
@msaroufim msaroufim merged commit d36525e into main Jun 25, 2026
5 checks passed
2 participants