Adapter and full fine-tunes, inside your boundary.
Teach an open foundation model your domain without sending a single row outside your control. Pick the tuning method that fits the base and the budget (LoRA, QLoRA, DoRA, or GaLore), run a reproducible recipe, and walk away with a versioned adapter, an eval, and a signed record.
Adapter tuning
You rarely need a new model — you need the base to learn your domain
Full fine-tuning rewrites every weight, costs the most, and produces a fresh multi-gigabyte checkpoint you now have to govern. Adapter methods learn the delta instead: a small, swappable set of parameters that ride on a frozen base. You get the same behavior change at a fraction of the cost, with far less surface area to defend.
On the SprintLoop track, a full fine-tune of the 7B base scored 34/36, no better than LoRA on the same data, so it was reverted. The adapter was the right tool, and the registry recorded the experiment either way.
LoRA
Freeze the base and train a pair of small low-rank matrices alongside each attention projection. You ship a ~1–2 GB adapter instead of a new copy of the model, and you can stack or swap adapters at serve time. This is what SprintLoop-7B and 32B run on today.
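To make the LoRA idea concrete, here is a minimal sketch in plain Python, not tied to MLX or any other framework. The names `matvec`, `lora_forward`, and `lora_param_count`, and the toy dimensions, are assumptions made for illustration. It shows the frozen projection plus a scaled low-rank update, and why a zero-initialised `B` means training starts from the base model's behavior.

```python
# Minimal LoRA sketch in plain Python; all names and sizes are illustrative.

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha, rank):
    """y = W x + (alpha / rank) * B (A x). W stays frozen; only A and B train."""
    base = matvec(W, x)                # frozen base projection
    delta = matvec(B, matvec(A, x))    # update routed through the rank-sized bottleneck
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base, delta)]


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters for one projection: A is rank x d_in, B is d_out x rank."""
    return rank * d_in + d_out * rank


# Toy 3x3 projection with a rank-1 adapter.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [3.0, 0.0, 1.0]]
A = [[0.5, -0.5, 1.0]]             # rank x d_in (randomly initialised in practice)
B_init = [[0.0], [0.0], [0.0]]     # d_out x rank, zero at the start of training
x = [1.0, 2.0, 3.0]

y_start = lora_forward(W, A, B_init, x, alpha=2, rank=1)     # identical to the base output

B_trained = [[1.0], [0.0], [-1.0]]                           # pretend training moved B
y_trained = lora_forward(W, A, B_trained, x, alpha=2, rank=1)

# For a hypothetical 4096 x 4096 projection at rank 16:
full = 4096 * 4096
adapter = lora_param_count(4096, 4096, 16)
print(y_start, y_trained, full // adapter)
```

Swapping adapters at serve time amounts to swapping `A` and `B` while `W` stays loaded.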
QLoRA
Hold the frozen base in low-bit quantized form (4-bit in standard QLoRA; the 32B candidate here uses 8-bit) and train the LoRA adapters in full precision on top. It is what lets a 32B base fine-tune on a single Apple Silicon box. The tradeoff is a small precision cost on the frozen weights, not the learned ones.
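The precision tradeoff on the frozen weights can be sketched as follows. This is a simplified illustration that uses symmetric absmax rounding to signed 4-bit codes; QLoRA itself uses a block-wise NF4 data type, which is not reproduced here. The function names and sample weights are assumptions.

```python
# Simplified stand-in for low-bit storage of frozen weights (absmax, not NF4).

def quantize_4bit(weights):
    """Map floats to signed 4-bit codes in [-7, 7] plus one shared scale."""
    scale = max(abs(w) for w in weights) / 7 or 1.0   # fall back to 1.0 for an all-zero row
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats for use in the forward pass."""
    return [c * scale for c in codes]


frozen_row = [0.42, -1.30, 0.07, 0.91]      # illustrative frozen base weights
codes, scale = quantize_4bit(frozen_row)
restored = dequantize(codes, scale)

# Rounding error lands on the frozen weights only; it is bounded by half a step.
max_error = max(abs(w - r) for w, r in zip(frozen_row, restored))

# The adapter matrices are never quantized; they stay as ordinary floats.
adapter_row = [0.013, -0.002, 0.0, 0.027]
print(codes, round(max_error, 4))
```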
DoRA
Weight-decomposed LoRA. It splits each weight into magnitude and direction and adapts them separately, recovering quality that plain LoRA leaves on the table — at roughly the same memory and adapter size. Reach for it when LoRA plateaus below where you need to be.
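A rough sketch of the magnitude and direction split, assuming the column-wise formulation W' = m · (W0 + ΔW) / ‖W0 + ΔW‖ where ΔW is the usual low-rank update. The helper names and the tiny 2x2 matrices are illustrative only.

```python
import math

# Sketch of a weight-decomposed update: direction from W0 + delta, magnitude from m.

def column_norms(M):
    """Euclidean norm of each column of a matrix given as a list of rows."""
    return [math.sqrt(sum(v * v for v in col)) for col in zip(*M)]


def dora_weight(W0, delta, magnitude):
    """Rescale each column of (W0 + delta) to the learned magnitude for that column."""
    V = [[w + d for w, d in zip(row_w, row_d)] for row_w, row_d in zip(W0, delta)]
    norms = column_norms(V)
    return [[m * v / n for v, m, n in zip(row, magnitude, norms)] for row in V]


W0 = [[3.0, 0.0], [4.0, 2.0]]
magnitude = column_norms(W0)            # initialised from the base, then trained

# With a zero low-rank update the base weight is reproduced.
zero = [[0.0, 0.0], [0.0, 0.0]]
W_start = dora_weight(W0, zero, magnitude)

# A non-zero update changes direction only; column magnitudes stay at `magnitude`
# until the magnitude vector itself is trained.
delta = [[1.0, -1.0], [0.0, 0.5]]       # stands in for B @ A
W_adapted = dora_weight(W0, delta, magnitude)
print(W_start, column_norms(W_adapted))
```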
GaLore
Gradient low-rank projection. Instead of constraining the weights to low rank, it projects the gradients — so every parameter still moves, but the optimizer state stays small. It targets full fine-tune quality at memory closer to LoRA, for cases where adapter rank is the limiting factor.
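The memory argument can be checked with simple arithmetic, and the projection step sketched in a few lines. In this illustration the projector `P` is hand-picked; GaLore derives it from a periodic SVD of the gradient, and the optimizer (for example Adam) would run on the projected gradient before projecting back. Both steps are omitted, and all names and sizes are assumptions.

```python
# Sketch of gradient low-rank projection for one m x n weight matrix.

def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def project(P, G):
    """Compress the m x n gradient to r x n; optimizer state lives at this size."""
    return matmul(transpose(P), G)


def project_back(P, R):
    """Expand the r x n update back to m x n so the full weight matrix moves."""
    return matmul(P, R)


def adam_state_size(m, n):
    """Two moment tensors, each the size of the weight matrix."""
    return 2 * m * n


def galore_state_size(m, n, rank):
    """Projector (m x rank) plus two moment tensors at rank x n."""
    return m * rank + 2 * rank * n


P = [[0.6], [0.8]]                   # 2 x 1 projector with a unit-norm column
G = [[1.0, 2.0], [3.0, 4.0]]         # illustrative full gradient
R = project(P, G)                    # 1 x 2 low-rank gradient
update = project_back(P, R)          # 2 x 2: every entry receives an update

# Hypothetical 4096 x 4096 matrix at rank 128:
print(adam_state_size(4096, 4096), galore_state_size(4096, 4096, 128))
```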
At a glance
Reproducible recipe
A run you can re-run and get the same model
Reproducibility is not a nice-to-have when you have to defend a model in front of an auditor. Every run pins its base, method, hyperparameters, and corpus snapshot, and tracks the validation-loss minimum so promotion is a decision, not a guess.
SprintLoop-7B trains with MLX on Apple Silicon against the 2,800-example corpus — no customer data, ever. The recipe converges to a val-loss minimum of 0.69 around iteration 20; the checkpoint at that minimum is the one that becomes v3.
The same recipe on an 8-bit 32B base via QLoRA reaches a lower val-loss of 0.49 — a real optimization win that the harder benchmark then has to confirm before the bigger model earns the production channel.
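The selection rule (promote the checkpoint at the validation-loss minimum) is small enough to sketch directly. Only the 0.69 minimum at iteration 20 comes from the run described here; the other points on the curve, and the function name `select_checkpoint`, are made up for illustration.

```python
# Sketch of val-loss-minimum checkpoint selection.

def select_checkpoint(val_losses):
    """Return (iteration, loss) at the lowest validation loss.

    Ties go to the earliest iteration, so the result is deterministic.
    """
    return min(val_losses.items(), key=lambda item: (item[1], item[0]))


# Illustrative curve: only the (20, 0.69) point is taken from the text above.
curve = {5: 1.10, 10: 0.84, 15: 0.72, 20: 0.69, 25: 0.71, 30: 0.75}

best_iter, best_loss = select_checkpoint(curve)
print(f"promote checkpoint at iter {best_iter} (val loss {best_loss})")
```

Selecting on the minimum rather than the final iteration avoids promoting a checkpoint that has already started to overfit.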
recipe.yaml · representative
# SprintLoop-7B · LoRA fine-tune (MLX)
base: Qwen2.5-Coder-7B-Instruct
method: lora
backend: mlx  # Apple Silicon
lora:
  rank: 16
  alpha: 32
  dropout: 0.05
  target_modules: ["q_proj","k_proj","v_proj","o_proj"]
data:
  corpus: sprintloop-corpus
  examples: 2800
  customer_data: false  # enforced at registry
train:
  batch_size: 4
  learning_rate: 1e-4
  select_on: val_loss_min  # 0.69 @ iter 20

register(sign=true, channel="candidate")
Alignment
Sharpen behavior after the task is learned
Fine-tuning teaches the domain; alignment teaches the judgment — when to refuse, how to reason, which answer a reviewer would prefer. These run on top of an adapter, not instead of it.
DPO / GRPO
preference + reasoning RL — sharpen refusal behavior
RLAIF / self-rewarding
AI feedback loop — scale alignment without labelers
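For the preference-tuning half of this step, the standard DPO objective for a single preference pair can be written in a few lines. This is a sketch of the published loss, not of this platform's training code. The argument names and the default `beta` are assumptions, and GRPO and RLAIF are not shown.

```python
import math

# Sketch of the DPO loss for one (chosen, rejected) pair of responses.

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Inputs are summed log-probabilities of each response.

    loss = -log sigmoid(beta * margin), where margin measures how much more the
    policy prefers the chosen response than the frozen reference model does.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    z = beta * margin
    # Numerically stable form of log(1 + exp(-z)).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Policy identical to the reference: no preference learned yet, loss is log(2).
neutral = dpo_loss(-5.0, -5.0, -5.0, -5.0)

# Policy now favors the chosen answer more than the reference does: loss drops.
improved = dpo_loss(-3.0, -8.0, -5.0, -5.0)
print(round(neutral, 4), improved < neutral)
```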
What you get
A run ends with an artifact you can ship and defend
The output of a training run is not just weights. It is a registry entry that the rest of the lifecycle can act on without re-deriving anything.
A versioned adapter
A pinned artifact tagged to its base and recipe — v0, v2, v3, v5 are all preserved, so the progression is auditable and any version is re-servable.
A tracked val-loss curve
The validation-loss minimum that selected the checkpoint, recorded against corpus size — the evidence that promotion was earned, not assumed.
A signed record
The run's lineage sealed into the registry and ready for the benchmark and release gates — no hand-off, no re-entry, no second source of truth.
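As one way to picture a sealed lineage record, the sketch below hashes a canonical JSON form of the run metadata and signs it. The page does not describe the registry's actual signing scheme, so the HMAC with a shared key used here is a stand-in (a production registry might well use asymmetric signatures), and the field names and key are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical sketch of sealing and verifying a run record.

def canonical_bytes(record):
    """Serialise with sorted keys so the same record always hashes the same way."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal_record(record, key):
    payload = canonical_bytes(record)
    return {
        "record": record,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "signature": hmac.new(key, payload, hashlib.sha256).hexdigest(),
    }


def verify_record(sealed, key):
    expected = hmac.new(key, canonical_bytes(sealed["record"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sealed["signature"])


key = b"demo-key-not-for-production"     # placeholder secret
run = {
    "base": "Qwen2.5-Coder-7B-Instruct",
    "method": "lora",
    "version": "v3",
    "val_loss_min": 0.69,
    "iteration": 20,
    "channel": "candidate",
}

sealed = seal_record(run, key)
ok = verify_record(sealed, key)

# Any later edit to the record invalidates the signature.
tampered = {**sealed, "record": {**run, "val_loss_min": 0.49}}
print(ok, verify_record(tampered, key))
```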
SprintLoop-7B · v0 (POC) → 32B v1 · monitor: no regression · 14 cycles · next: harder 40–60 probe benchmark to separate 7B vs 32B
Run a recipe and register the result
Pick a base, choose the adapter method, and let the studio track the run to its val-loss minimum — then carry the versioned adapter straight into benchmark and release.