The pipeline
Every task moves through the same stages, in order. You do not have to finish all of them. Most teams stop after the optimized prompt and ship that. The task hub mirrors this exactly. The one active stage is always your next step, and each stage tells you what unlocks the next.
Two ways to get data in
Every task starts with examples, and there are two doors. Pick the one that fits how you work.
Upload a dataset
Fastest path to a score. You already have a CSV or JSONL of inputs and the outputs you want back. Upload it and optimize in the same step.
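As a rough illustration of what such a file can look like, the sketch below writes and reads a small JSONL dataset, one example per line. The field names `input` and `output`, the sample rows, and the helper names are assumptions made for this example, not a documented Apprentice schema; check the upload screen for the column names it actually expects.

```python
import json
from pathlib import Path

# Hypothetical rows: "input" and "output" are assumed field names,
# not a documented schema.
EXAMPLES = [
    {"input": "Invoice 1042, total due $310.00",
     "output": {"invoice_id": "1042", "total": 310.0}},
    {"input": "Invoice 1043, total due $12.50",
     "output": {"invoice_id": "1043", "total": 12.5}},
]


def write_jsonl(rows, path):
    """Write one JSON object per line and return the number of rows written."""
    with Path(path).open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return len(rows)


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    count = write_jsonl(EXAMPLES, "examples.jsonl")
    print(f"wrote {count} examples")
```

Reading the file back before uploading is a cheap way to catch a malformed line early.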
Connect the SDK
You have a live app. Wire the runapprentice SDK once and it captures real requests as examples over time. Best for learning from production traffic.
What “verified” means
Every row carries a tier. The tier decides what a row is allowed to do, so a score is never built on data no one checked. This split is why the two stages unlock at different points. Optimization runs on verified rows, which is gold plus silver. Training runs on gold only. Optimizing improves the prompt, so silver is trusted enough. Training and promoting a model is a quality claim, so it uses human-verified rows alone. For the full reasoning, see data tiers.
You never get a number Apprentice cannot back up. Scores are measured on rows the optimizer never saw, and a run that does not improve is reported as it is, not hidden. A real result you can trust beats a flattering one you cannot.
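The tier rule can be sketched as a pair of filters: optimization accepts gold and silver rows, training accepts gold only. This is a minimal illustration, not Apprentice's implementation. The `Row` type, the function names, and the `"unverified"` label used for any row outside gold and silver are all assumptions made for the example.

```python
from dataclasses import dataclass

# Tier sets taken from the rule in the text: verified means gold plus silver,
# and training uses gold only.
VERIFIED_TIERS = frozenset({"gold", "silver"})
TRAINING_TIERS = frozenset({"gold"})


@dataclass(frozen=True)
class Row:
    """Hypothetical row shape; the real schema may differ."""
    input: str
    output: str
    tier: str  # "gold", "silver", or anything else (treated as unverified)


def rows_for_optimization(rows):
    """Keep rows that count as verified (gold or silver)."""
    return [r for r in rows if r.tier in VERIFIED_TIERS]


def rows_for_training(rows):
    """Keep human-verified rows only (gold)."""
    return [r for r in rows if r.tier in TRAINING_TIERS]


if __name__ == "__main__":
    rows = [
        Row("a", "1", "gold"),
        Row("b", "2", "silver"),
        Row("c", "3", "unverified"),  # assumed label, not from the docs
    ]
    print(len(rows_for_optimization(rows)), "rows eligible for optimization")
    print(len(rows_for_training(rows)), "rows eligible for training")
```

Because the training set is a subset of the optimization set, a task can have enough rows to optimize well before it has enough to train, which is one way to read why the two stages unlock at different points.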
Where to go next
JSON extraction, end to end
Run the whole pipeline on a real dataset and pull the optimized prompt back into your code.
Why replacement is eval-gated
The reasoning behind the model swap: the model is switched only when your evals show no quality regression.