Our models, vetted first
The only scores on this board are ours — and they're real.
Every number below is ours: measured, never borrowed. We do not publish a competitor leaderboard built from numbers we did not measure. We grade ourselves first, in the open, and we label which scores are sealed and which are still candidates.
A signed score links to its model passport, where the Ed25519 signature is re-checked live against the recorded probe outputs. Candidate rows show the target we're training toward, and in-training checkpoints are shown for transparency — but neither is sealed yet, so they carry no passport to re-check and are never linked to a verifier.
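The rule above (only sealed rows link to a verifier, and a re-check runs against the recorded probe outputs) can be sketched roughly as follows. This is a minimal illustration, not the site's actual verifier: the `Row` fields, the `passport_url` attribute, the canonical-JSON digest, and the injected `verify` callable are all assumptions made for the example. The signature check itself is passed in, so no particular Ed25519 library is implied.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical signature check: (public_key, signature, message) -> bool.
# In practice this would wrap an Ed25519 verification routine.
Verifier = Callable[[bytes, bytes, bytes], bool]


@dataclass(frozen=True)
class Row:
    model: str
    status: str                          # "SIGNED", "CANDIDATE", or "IN TRAINING"
    passport_url: Optional[str] = None   # assumed field; only sealed rows carry one


def probe_digest(probe_outputs: List[Dict]) -> bytes:
    """Hash the recorded probe outputs in a canonical JSON form (an assumption)."""
    canonical = json.dumps(probe_outputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).digest()


def verifier_link(row: Row) -> Optional[str]:
    """Return the passport link for sealed rows; candidates and checkpoints get none."""
    return row.passport_url if row.status == "SIGNED" else None


def recheck(row: Row, probe_outputs: List[Dict], public_key: bytes,
            signature: bytes, verify: Verifier) -> bool:
    """Re-check a signed score against the recorded probe outputs."""
    if verifier_link(row) is None:
        return False  # nothing sealed, so there is nothing to re-check
    return verify(public_key, signature, probe_digest(probe_outputs))
```

With the `cryptography` package, one way to supply `verify` would be a small wrapper around `Ed25519PublicKey.from_public_bytes(public_key).verify(signature, message)`, returning `False` when it raises `InvalidSignature`. Whether the real passport signs a digest or the raw outputs is not stated here, so treat the digest step as a placeholder.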
Model | Real sub-scores (from the hard suite) | Headline
SprintLoop-32B (CANDIDATE), 32B · in-boundary | Refusal (SoD / injection) 100%; Governance discipline 89%; Code correctness 83% |
SprintLoop-7B · v6 (SIGNED), 7B · in-boundary | Refusal (SoD / injection) 100%; PHI / citation discipline 89.3%; Code correctness 83.3% |
HealthNext-Care-32B (IN TRAINING), 32B · health-admin | Governance discipline 71.4%; Refusal under pressure 75%; Code correctness: in training |
SprintLoop-32B. Quality-tier candidate. The 90 is our target on the hard suite (code + governance + refusal), not a measured headline. The model is trained but not yet sealed: no signed release stands behind it, so the score is shown as a target, not a verifiable claim. Re-verifiable artifact: artifacts/models/sprintloop-32b
SprintLoop-7B · v6. Promoted through the release gate. n=50 hard probes. Try it live on /playground. Re-verifiable artifact: artifacts/models/sprintloop-7b/v6
HealthNext-Care-32B. Mid-training checkpoint — shown for transparency, not as a shippable score. Will not be promoted until it clears the gate. Re-verifiable artifact: artifacts/models/healthnext-care-32b/v1-trough-ckpt125
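The sub-scores listed for each model read like per-category pass rates over the hard probes. As a rough sketch under that assumption (the page does not spell out the scoring formula), the function below groups pass/fail results by category and reports a percentage rounded to one decimal. The category names and sample results are made up for illustration and are not real probe data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def sub_scores(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Per-category pass rate in percent, rounded to one decimal place."""
    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for category, ok in results:
        total[category] += 1
        passed[category] += int(ok)
    return {c: round(100.0 * passed[c] / total[c], 1) for c in total}


# Illustrative input only: 4 of 4 refusal probes pass, 5 of 6 code probes pass.
example = [("refusal", True)] * 4 + [("code", True)] * 5 + [("code", False)]
print(sub_scores(example))
```

Under this reading, a figure such as 83.3% would correspond to a ratio like 5 of 6 within one category, which is why sub-scores can have different denominators from the overall probe count.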