Add Brev Nsight Compute profiling flow #64
Don't hardcode `/home/shadeform`; different Brev providers have different default users.
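One way to avoid the hardcoded path is sketched below, assuming the CLI can learn the remote home directory at runtime (for example from the output of `echo $HOME` over SSH). The function name `remote_workdir` and the `popcorn-profile` subdirectory are hypothetical and not taken from this PR.

```rust
/// Build a working directory under the remote user's home directory.
/// `remote_home` is assumed to be the raw output of `echo $HOME` on the node
/// (hypothetical input; this PR does not show how the path is obtained).
/// Returns `None` if the value is empty or not an absolute path.
fn remote_workdir(remote_home: &str, subdir: &str) -> Option<String> {
    // Command output usually ends with a newline, so trim whitespace first.
    let home = remote_home.trim();
    if home.is_empty() || !home.starts_with('/') {
        return None;
    }
    // Avoid a double slash when the home directory already ends with '/'.
    Some(format!("{}/{}", home.trim_end_matches('/'), subdir))
}
```

With this shape, a provider whose default user is `ubuntu` or `root` needs no code change: the caller passes whatever the node reports instead of a fixed `/home/shadeform`.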
How do you set up the SSH tunnel to the Brev machine? Are you copying an SSH key yourself, or are you using the Brev CLI commands? If you're using the Brev CLI commands, you should always run `brev refresh` once and then use regular `ssh`/`scp`, because `brev shell` and `brev copy` run `brev refresh` every time and can be slow. I would also probably just set up the SSH key yourself.
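The suggested flow could look roughly like the sketch below: run `brev refresh` once, then use plain `ssh`/`scp` against the host alias. It assumes `brev refresh` has written an SSH config entry for the instance. The helper names are hypothetical, and the helpers only build the commands; nothing runs until the caller invokes `.status()` or `.output()`.

```rust
use std::process::Command;

/// One-time refresh of the local SSH config (once per session, not per transfer).
fn refresh_cmd() -> Command {
    let mut cmd = Command::new("brev");
    cmd.arg("refresh");
    cmd
}

/// Plain `scp` upload to a host alias, skipping the per-call refresh of `brev copy`.
fn scp_upload_cmd(host: &str, local: &str, remote: &str) -> Command {
    let mut cmd = Command::new("scp");
    cmd.arg(local).arg(format!("{}:{}", host, remote));
    cmd
}

/// Plain `ssh` invocation that runs one remote command on the host alias.
fn ssh_cmd(host: &str, remote_command: &str) -> Command {
    let mut cmd = Command::new("ssh");
    cmd.arg(host).arg(remote_command);
    cmd
}
```

A caller would run `refresh_cmd().status()` once at startup and then reuse `scp_upload_cmd` and `ssh_cmd` for every submission, so the refresh cost is paid a single time.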
Do we run the submissions in a container on the Brev node? There is always a risk that someone could write a malicious submission to try to grab other people's submissions. Containers will protect a little against that (obviously not entirely; there are plenty of ways to get out of containers).
Paused the service for now; security needs a bit more thought.
4378863 to 8fc14b1
Summary
Adds `popcorn submit ... --profile-brev` support for the hosted QR v2 Nsight Compute profiler.

The default downloaded artifacts are now the agent-readable Nsight Compute details files:

- `ncu-details.txt`
- `ncu-details.csv`

The full `profile.ncu-rep` GUI report is still downloaded when the service returns it, but it is treated as optional and printed after the details links. The final line remains a macOS command that opens the report in Nsight Compute:

How It Works
- `popcorn-cli` uploads the existing `submission.py` unchanged to the hosted profiler service.
- […] `reference-kernels` checkout on the Brev profiler node.
- […] `benchmarks:` in the matching `task.yml`; `--benchmark-index N` profiles one benchmark shape, and omitting it profiles all benchmark shapes.
- `--benchmark-index 0` currently resolves to `batch: 20, n: 32, cond: 1, seed: 43214`.
- […] `mode=profile`, exports Nsight Compute details text/CSV, bundles those plus the optional `.ncu-rep`, and returns the zip to the CLI.
- `popcorn-cli` extracts details files first and prints clickable terminal links for them before the optional Nsight Compute GUI report link.

Related Infra
This PR only changes `popcorn-cli`. The standalone profiler service lives in `gpu-mode/ncu-service` and is currently exposed through the hosted proxy URL documented here.

No `kernelbot` or `kernelboard` changes are part of this PR.

Validation
- `cargo fmt --check`
- `cargo test`

The run produced:

- `profile.0-batch-20-n-32-cond-1-seed-43214/ncu-details.txt`
- `profile.0-batch-20-n-32-cond-1-seed-43214/ncu-details.csv`
- `profile.0-batch-20-n-32-cond-1-seed-43214/profile.ncu-rep`

Reference-Kernels Provenance
The Brev profiler node is using `gpu-mode/reference-kernels` `main` at:

with local profile-mode patches for QR evaluators while the corresponding `reference-kernels` PR is pending.
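To illustrate the `--benchmark-index` behaviour described under How It Works and the artifact directory names listed under Validation, here is a small sketch. It assumes each benchmark shape is an ordered list of key/value pairs read from `benchmarks:` in `task.yml`, and that directory names follow the `profile.<index>-<key>-<value>-...` pattern visible in the validation output. The type and function names are hypothetical and not taken from `popcorn-cli` or the profiler service.

```rust
/// One benchmark shape as ordered key/value pairs, e.g. batch=20, n=32.
type Shape = Vec<(String, i64)>;

/// Pick the shapes to profile: one shape when an index is given, all otherwise.
/// Returns (original index, shape) pairs; an out-of-range index is an error.
fn select_shapes(all: &[Shape], index: Option<usize>) -> Result<Vec<(usize, &Shape)>, String> {
    match index {
        Some(i) => all
            .get(i)
            .map(|s| vec![(i, s)])
            .ok_or_else(|| format!("benchmark index {} out of range (have {})", i, all.len())),
        None => Ok(all.iter().enumerate().collect()),
    }
}

/// Directory name such as `profile.0-batch-20-n-32-cond-1-seed-43214`.
fn profile_dir_name(index: usize, shape: &Shape) -> String {
    let mut name = format!("profile.{}", index);
    for (key, value) in shape {
        name.push_str(&format!("-{}-{}", key, value));
    }
    name
}
```

Keeping the original index in the returned pairs means a single-shape run and an all-shapes run produce the same directory name for the same benchmark entry.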