# Apprentice

> Cut your LLM cost without losing quality. Optimize a prompt against your own data, then replace the frontier model with a small one only when evals prove it holds.

Apprentice does two things, in order:

1. **Optimize the prompt.** Give it a dataset of inputs and correct outputs for one task. It runs prompt optimization (DSPy GEPA) and reports the score change on held-out rows.
2. **Replace the model.** Once you have enough verified data, train a small model to take over from the frontier model. The switch is gated on evals, with instant rollback.

The second feature is still being built; pages that describe it are marked **Building**. You start with feature one today.

- Go from a CSV to an optimized prompt in under ten lines.
- A full run for the first task class, end to end.
- Log your production calls with one callback, no code changes.
- Every method, its parameters, and what it returns.

## What you can prove today

The prompt-optimization layer is real and reproducible. On a public JSON extraction set (100 examples: 70 train, 30 held out), GPT-4o-mini went from 83.1 to 85.6 with GEPA, and a fine-tuned Qwen3.5-4B went from 69.1 to 88.9. You can run it yourself: [apprentice-benchmark](https://github.com/singh-abhishekk/apprentice-benchmark).

## How we write these docs

Every number, tier, and behavior on this site matches the code. If a feature is not shipped, the page says so. A run that does not improve is reported as a real result, not hidden. If you find a claim that drifts from what the SDK does, that is a bug; tell us.
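
To make the "score change on held-out rows" idea from step 1 concrete, here is a minimal sketch in plain Python. It does not use the Apprentice SDK. Every name in it (`split_rows`, `exact_match_score`, `score_change`) is hypothetical, and exact match is an assumed metric chosen only for illustration. The 70/30 split mirrors the proportions quoted for the benchmark above.

```python
import random
from typing import Callable

# A row pairs one task input with its correct output (assumed shape).
Row = tuple[str, str]


def split_rows(rows: list[Row], train_fraction: float = 0.7,
               seed: int = 0) -> tuple[list[Row], list[Row]]:
    """Shuffle with a fixed seed, then split into train and held-out rows."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def exact_match_score(predict: Callable[[str], str], rows: list[Row]) -> float:
    """Percentage of rows where the prediction equals the correct output."""
    if not rows:
        return 0.0
    hits = sum(1 for text, expected in rows if predict(text) == expected)
    return 100.0 * hits / len(rows)


def score_change(baseline: Callable[[str], str],
                 candidate: Callable[[str], str],
                 held_out: list[Row]) -> tuple[float, float, float]:
    """Score both predictors on the same held-out rows; return before, after, delta."""
    before = exact_match_score(baseline, held_out)
    after = exact_match_score(candidate, held_out)
    return before, after, after - before
```

Both predictors are scored on rows that neither saw during optimization, so a negative or zero delta is as meaningful a result as a positive one.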