eigh_py: reward hacks fabricating benchmark time (tracking + the timing-layer gap)

## Summary

Red-teaming the `eigh` benchmark harness surfaced a family of reward hacks that leave `custom_kernel` genuinely correct but fabricate the **reported time**, plus the underlying structural gaps. I've opened focused PRs for the gaps that have clean fixes (#159, #160, #161); this issue tracks the rest, where the right remedy is a judgment call I'd rather leave to the maintainers.

All findings below were confirmed on the **live B200 `eigh` leaderboard** (every test submission deleted immediately after its verdict). A demonstration of the most severe one is currently sitting at **rank #1 with a displayed score of `0.000`**.

### Confirmed-accepted reward-hack families and their status

| Family | What it does | Status |
|---|---|---|
| Aggregator underflow | Drives one shape's reported time toward 0 → geomean collapses to `0.000000` | **PR #159** (roofline floor) |
| In-process cache / file replay | Solves once, returns cached result on reused timed calls | **PR #160** (regenerate inputs per iteration) |
| Lazy output (subclass / instance override) | Returns placeholders, defers the real solve into the untimed checker | **PR #161** (reject deferral) |
| **Timer / stats patch** | Leaves the kernel honest but patches `Event.elapsed_time` / `calculate_stats` to report 1/100th the time | *this issue* |
| **Forged result object** | Forges the `Stats` object the timed loop returns to the parent | *this issue* |

### The remaining gap: the reported time is taken on trust

The timing and the stats reduction happen in the same spawned worker that imports the submission, so a submission can reach and tamper with them (directly, or via aliasing / `gc`). `kernelguard` has merged detectors for some of these routes (SinatrasC/kernelguard #277, #278), which helps at the static-scan layer, but:

- those rules are not yet live on the production scanner (a re-test of the aliased-timer hack on 2026-06-28 was still accepted), and
- a static scanner is a pattern chase; the structural fix is to **compute the reported statistic where the submission cannot reach it** — e.g. time and reduce in the parent process from durations captured before the submission is imported, in a namespace the worker doesn't expose.

That structural change is more invasive than the three PRs above (it touches the harness's process/timing model), so I haven't sent it as an unsolicited large PR. If you'd welcome it, I have a working prototype and am happy to open it; alternatively this may be best handled at the `kernelguard` layer once the merged rules deploy. Flagging it so the decision is yours.

### Also: no `guards/` dir

Unlike `qr_v2`, `eigh_py` ships no `guards/` (differential-correctness / invariance) directory, so those defenses don't run here. Worth adding as defense-in-depth.

Happy to provide minimal repros for any of the above.


Family	What it does	Status
Aggregator underflow	Drives one shape's reported time toward 0 → geomean collapses to `0.000000`	PR #159 (roofline floor)
In-process cache / file replay	Solves once, returns cached result on reused timed calls	PR #160 (regenerate inputs per iteration)
Lazy output (subclass / instance override)	Returns placeholders, defers the real solve into the untimed checker	PR #161 (reject deferral)
Timer / stats patch	Leaves the kernel honest but patches `Event.elapsed_time` / `calculate_stats` to report 1/100th the time	this issue
Forged result object	Forges the `Stats` object the timed loop returns to the parent	this issue

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

eigh_py: reward hacks fabricating benchmark time (tracking + the timing-layer gap) #162

Summary

Confirmed-accepted reward-hack families and their status

The remaining gap: the reported time is taken on trust

Also: no `guards/` dir

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

eigh_py: reward hacks fabricating benchmark time (tracking + the timing-layer gap) #162

Description

Summary

Confirmed-accepted reward-hack families and their status

The remaining gap: the reported time is taken on trust

Also: no guards/ dir

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Also: no `guards/` dir