Platform · Train & Fine-tune

Adapter and full fine-tunes, inside your boundary.

Teach an open foundation your domain without sending a single row outside your control. Pick the adapter method that fits the base and the budget — LoRA, QLoRA, DoRA, GaLore — run a reproducible recipe, and walk away with a versioned adapter, an eval, and a signed record.


34/36 · Best 7B score on the 18-probe suite (v3)

0.69 · Val-loss minimum at iter 20 (7B)

0.49 · Val-loss minimum on the 32B candidate

0 · Customer data rows in the corpus

Adapter tuning

You rarely need a new model — you need the base to learn your domain

Full fine-tuning rewrites every weight, costs the most, and produces a fresh multi-gigabyte checkpoint you now have to govern. Adapter methods learn the delta instead: a small, swappable set of parameters that rides on a frozen base. The behavior change is comparable, at a fraction of the cost and with far less surface area to defend.

On the SprintLoop track, a full fine-tune of the 7B base scored 34/36, no better than LoRA on the same data, so it was reverted. The adapter was the right tool, and the registry recorded the experiment either way.

Default for task adaptation

LoRA

Memory: Low · Quality: Strong

Freeze the base and train a pair of small low-rank matrices alongside each attention projection. You ship a ~1–2 GB adapter instead of a new copy of the model, and you can stack or swap adapters at serve time. This is what SprintLoop-7B and 32B run on today.
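To make the low-rank idea concrete, here is a minimal sketch in plain Python with toy dimensions. The function names `lora_forward` and `lora_param_count` are illustrative and not part of the studio or any library; the `alpha / rank` scaling follows the usual LoRA convention, and the 4096 x 4096 projection is a hypothetical size chosen for round numbers.

```python
def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(base_w, lora_a, lora_b, x, alpha, rank):
    """Compute y = W x + (alpha / rank) * B (A x) with W frozen.

    base_w: d_out x d_in frozen base weight
    lora_a: rank x d_in  trainable down-projection
    lora_b: d_out x rank trainable up-projection
    """
    base_out = matvec(base_w, x)
    low_rank = matvec(lora_b, matvec(lora_a, x))
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base_out, low_rank)]


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters for one adapted projection: A plus B."""
    return rank * d_in + d_out * rank


# Toy example: 2 x 2 frozen weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]        # 1 x 2
B = [[0.5], [0.0]]      # 2 x 1
y = lora_forward(W, A, B, [2.0, 3.0], alpha=2, rank=1)

# One hypothetical 4096 x 4096 projection at rank 16.
full = 4096 * 4096
adapter = lora_param_count(4096, 4096, 16)
```

In this sketch only `A` and `B` would receive gradients. For the hypothetical projection above, the adapter holds 131,072 trainable values against 16,777,216 in the frozen weight.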

Big base on one machine

QLoRA

Memory: Lowest · Quality: Strong

Hold the frozen base in low-bit quantized form (4-bit in the original QLoRA method) and train the LoRA adapters in higher precision on top. It is what lets a 32B base fine-tune on a single Apple Silicon box; the 32B candidate was trained this way on an 8-bit base. The tradeoff is a small precision cost on the frozen weights, not the learned ones.
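The precision cost on the frozen weights can be illustrated with a simple round trip. The sketch below uses symmetric absmax quantization to 4-bit integer levels as a stand-in; QLoRA itself uses a different 4-bit data type (NF4) with block-wise scaling, so treat this as an approximation of the idea rather than the method. The weight values are made up.

```python
def quantize_absmax(block, bits=4):
    """Symmetric absmax quantization of one block of weights.

    Returns integer codes in [-(2**(bits-1) - 1), 2**(bits-1) - 1]
    and the per-block scale needed to dequantize.
    """
    levels = 2 ** (bits - 1) - 1          # 7 for 4-bit
    absmax = max(abs(w) for w in block)
    scale = absmax / levels if absmax > 0 else 1.0
    codes = [round(w / scale) for w in block]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate weights from integer codes and the scale."""
    return [c * scale for c in codes]


frozen = [0.70, -0.33, 0.12, 0.02]        # hypothetical base weights
codes, scale = quantize_absmax(frozen)
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(frozen, restored))
```

The rounding error per weight is bounded by half the scale. The adapter matrices never pass through this step, which is why the learned parameters keep their precision.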

LoRA underfits the task

DoRA

Memory: ≈ LoRA · Quality: Higher fidelity

Weight-decomposed LoRA. It splits each weight into magnitude and direction and adapts them separately, recovering quality that plain LoRA leaves on the table — at roughly the same memory and adapter size. Reach for it when LoRA plateaus below where you need to be.
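A small sketch of the magnitude and direction split, assuming the per-column decomposition commonly used to describe DoRA. The helper names are illustrative, and `delta` stands in for the low-rank update that a LoRA pair would supply.

```python
import math


def column_norms(matrix):
    """L2 norm of each column of a row-major matrix."""
    n_cols = len(matrix[0])
    return [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]


def decompose(weight):
    """Split a weight into per-column magnitude and unit-norm direction."""
    magnitude = column_norms(weight)
    direction = [[w / magnitude[j] for j, w in enumerate(row)] for row in weight]
    return magnitude, direction


def dora_weight(base_w, delta, magnitude):
    """Recompose W' = magnitude * (W0 + delta) / ||W0 + delta|| per column.

    delta stands in for the low-rank LoRA update; magnitude is a
    separate trainable vector, initialized from the base weight.
    """
    updated = [[w + d for w, d in zip(w_row, d_row)]
               for w_row, d_row in zip(base_w, delta)]
    norms = column_norms(updated)
    return [[magnitude[j] * v / norms[j] for j, v in enumerate(row)]
            for row in updated]


W0 = [[3.0, 0.0], [4.0, 2.0]]
m, V = decompose(W0)                          # column magnitudes and directions
zero_delta = [[0.0, 0.0], [0.0, 0.0]]
same = dora_weight(W0, zero_delta, m)         # reproduces W0
moved = dora_weight(W0, [[1.0, 0.0], [0.0, 0.0]], m)
```

With a zero update the recomposed weight equals the base. With a nonzero update and the magnitude held fixed, the column directions change while the column norms stay at `m`, which is the separation the method relies on.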

You want full-FT quality at adapter memory

GaLore

Memory: Low · Quality: Full-rank

Gradient low-rank projection. Instead of constraining the weights to low rank, it projects the gradients — so every parameter still moves, but the optimizer state stays small. It targets full fine-tune quality at memory closer to LoRA, for cases where adapter rank is the limiting factor.
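A rough count shows why the optimizer state stays small. The sketch assumes an Adam-style optimizer with two moment estimates per tracked value and a single projection matrix per layer; the layer shape and rank are hypothetical, and real implementations differ in details such as which side is projected.

```python
def adam_state_entries(rows, cols):
    """Adam keeps two moment estimates per parameter."""
    return 2 * rows * cols


def galore_state_entries(rows, cols, rank):
    """Projected Adam: one rows x rank projector plus two moments
    for the rank x cols projected gradient."""
    return rows * rank + 2 * rank * cols


rows, cols, rank = 4096, 4096, 128            # hypothetical layer and rank
full = adam_state_entries(rows, cols)
projected = galore_state_entries(rows, cols, rank)
ratio = full / projected
```

For these numbers the projected state is roughly 21 times smaller than full Adam state, while the weight matrix itself remains full-size and fully trainable.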

At a glance

LoRA · memory: low · quality: strong · artifact: ~1–2 GB adapter

QLoRA · memory: lowest · quality: strong · artifact: 4-bit base + adapter

DoRA · memory: ≈ LoRA · quality: higher fidelity · artifact: adapter

GaLore · memory: low · quality: full-rank · artifact: merged weights

Reproducible recipe

A run you can re-run and get the same model

Reproducibility is not a nice-to-have when you have to defend a model in front of an auditor. Every run pins its base, method, hyperparameters, and corpus snapshot, and tracks the validation-loss minimum so promotion is a decision, not a guess.
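One way to picture pinning is a deterministic fingerprint over the recipe. This is a sketch under assumptions: the studio's actual mechanism is not described here, `recipe_fingerprint` is a hypothetical helper, and the `seed` field is added for illustration only.

```python
import hashlib
import json


def recipe_fingerprint(recipe):
    """Hash a recipe dict deterministically.

    Keys are sorted and separators fixed so the same settings always
    produce the same digest, regardless of insertion order.
    """
    canonical = json.dumps(recipe, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


recipe = {
    "base": "Qwen2.5-Coder-7B-Instruct",
    "method": "lora",
    "lora": {"rank": 16, "alpha": 32, "dropout": 0.05},
    "corpus": {"name": "sprintloop-corpus", "examples": 2800},
    "seed": 0,                                # hypothetical; not in the source recipe
}
reordered = dict(reversed(list(recipe.items())))
changed = dict(recipe, seed=1)
```

Two runs with the same fingerprint claim the same inputs; any change to the base, method, hyperparameters, or corpus description yields a different digest.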

SprintLoop-7B trains with MLX on Apple Silicon against the 2,800-example corpus — no customer data, ever. The recipe converges to a val-loss minimum of 0.69 around iteration 20; the checkpoint at that minimum is the one that becomes v3.
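Selecting on the validation-loss minimum can be sketched in a few lines. The function name is illustrative, and only the 0.69 at iteration 20 comes from the text; the other loss values are invented to give the curve a shape.

```python
def select_checkpoint(val_losses):
    """Return (iteration, loss) at the validation-loss minimum.

    val_losses maps iteration -> validation loss. Ties go to the
    earliest iteration.
    """
    if not val_losses:
        raise ValueError("no validation losses recorded")
    best_iter = min(sorted(val_losses), key=lambda it: val_losses[it])
    return best_iter, val_losses[best_iter]


# Only the 0.69 at iteration 20 comes from the text; the rest is made up.
history = {5: 1.12, 10: 0.88, 15: 0.74, 20: 0.69, 25: 0.71, 30: 0.76}
best = select_checkpoint(history)
```

Later iterations with higher validation loss are ignored even if training loss keeps falling, which is the point of selecting on the minimum.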

The same recipe on an 8-bit 32B base via QLoRA reaches a lower val-loss of 0.49 — a real optimization win that the harder benchmark then has to confirm before the bigger model earns the production channel.

Base: Qwen2.5-Coder-7B-Instruct

Method: LoRA · rank 16 · MLX

Corpus: sprintloop-corpus · 2,800 verified

Customer data: none

Val-loss minimum: 0.69 @ iter 20 (v3)

Best 7B score: 34 / 36 · 18-probe suite

Channel: promoted to production

recipe.yaml · representative

# SprintLoop-7B · LoRA fine-tune (MLX)
base:     Qwen2.5-Coder-7B-Instruct
method:   lora
backend:  mlx          # Apple Silicon

lora:
  rank:          16
  alpha:         32
  dropout:       0.05
  target_modules: ["q_proj","k_proj","v_proj","o_proj"]

data:
  corpus:        sprintloop-corpus
  examples:      2800
  customer_data: false      # enforced at registry

train:
  batch_size:    4
  learning_rate: 1.0e-4
  select_on:     val_loss_min   # 0.69 @ iter 20

# Register the finished run as a signed candidate
register:
  sign:          true
  channel:       candidate

Alignment

Sharpen behavior after the task is learned

Fine-tuning teaches the domain; alignment teaches the judgment — when to refuse, how to reason, which answer a reviewer would prefer. These run on top of an adapter, not instead of it.

Available

DPO / GRPO

preference + reasoning RL — sharpen refusal behavior

Planned

RLAIF / self-rewarding

AI feedback loop — scale alignment without labelers
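For the preference step, the standard DPO objective for a single preference pair can be sketched as follows. The inputs are sequence log-probabilities under the policy being trained and under a frozen reference; `beta` and the example values are illustrative, and nothing here reflects how the studio wires it up.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probabilities.

    The loss falls as the policy raises the chosen answer's log-prob
    relative to the rejected one, measured against the frozen reference.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    z = beta * margin
    # -log(sigmoid(z)) written as a numerically stable softplus(-z).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Policy identical to the reference: no preference learned yet.
start = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Policy now favors the chosen answer more than the reference does.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
```

When the policy matches the reference the loss is log 2, and it drops as the policy separates the preferred answer from the rejected one.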

What you get

A run ends with an artifact you can ship and defend

The output of a training run is not just weights. It is a registry entry that the rest of the lifecycle can act on without re-deriving anything.

A versioned adapter

A pinned artifact tagged to its base and recipe — v0, v2, v3, v5 are all preserved, so the progression is auditable and any version is re-servable.

A tracked val-loss curve

The validation-loss minimum that selected the checkpoint, recorded against corpus size — the evidence that promotion was earned, not assumed.

A signed record

The run's lineage sealed into the registry and ready for the benchmark and release gates — no hand-off, no re-entry, no second source of truth.
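As an illustration of what a signed record buys you, the sketch below seals a registry entry with an HMAC and detects later edits. This is an assumption about the general shape, not the studio's signing scheme; the key, field names, and helpers are hypothetical, and a real registry might use asymmetric signatures instead.

```python
import hashlib
import hmac
import json


def sign_record(record, key):
    """Return an HMAC-SHA256 signature over a canonical JSON encoding."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_record(record, signature, key):
    """Constant-time check that the record still matches its signature."""
    return hmac.compare_digest(sign_record(record, key), signature)


key = b"demo-key-not-for-production"          # hypothetical signing key
record = {"adapter": "SprintLoop-7B", "version": "v3",
          "val_loss_min": 0.69, "iteration": 20, "score": "34/36"}
signature = sign_record(record, key)
tampered = dict(record, score="36/36")
```

Verification succeeds for the original record and fails for the altered copy, so downstream gates can trust the entry without re-deriving it.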

SprintLoop-7B · v0 (POC) → 32B v1 · monitor: no regression · 14 cycles · next: harder 40–60 probe benchmark to separate 7B vs 32B

Run a recipe and register the result

Pick a base, choose the adapter method, and let the studio track the run to its val-loss minimum — then carry the versioned adapter straight into benchmark and release.
