# Apprentice

> Cut your LLM cost without losing quality. Optimize a prompt against your own data, then replace the frontier model with a small one only when evals prove it holds.

Apprentice does two things, in order:

1. **Optimize the prompt.** Give it a dataset of inputs and correct outputs for one task. It runs prompt optimization (DSPy GEPA) and reports the score change on held-out rows.
2. **Replace the model.** Once you have enough verified data, train a small model to take over from the frontier model. The switch is gated on evals, with instant rollback. This second feature is still being built; pages that describe it are marked **Building**.

Today, you start with the first: prompt optimization.

<CardGroup cols={2}>
  <Card title="Quickstart" icon="bolt" href="/quickstart">
    Go from a CSV to an optimized prompt in under ten lines.
  </Card>

  <Card title="JSON extraction" icon="brackets-curly" href="/how-to/json-extraction">
    A full run for the first task class, end to end.
  </Card>

  <Card title="Capture from LangChain" icon="plug" href="/how-to/capture-langchain">
    Log your production calls with one callback and no other code changes.
  </Card>

  <Card title="Python SDK reference" icon="book" href="/reference/python-sdk">
    Every method, its parameters, and what it returns.
  </Card>
</CardGroup>

## What you can prove today

The prompt-optimization layer is real and reproducible. On a public JSON extraction set (100 examples, 70 train, 30 held out), GPT-4o-mini went from 83.1 to 85.6 with GEPA, and a fine-tuned Qwen3.5-4B went from 69.1 to 88.9. You can run it yourself: [apprentice-benchmark](https://github.com/singh-abhishekk/apprentice-benchmark).
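To make the held-out comparison concrete, here is a minimal sketch of a 70/30 split and a before/after score on the held-out rows. It is not the Apprentice SDK or the benchmark's code: `split_rows`, `field_accuracy`, and `mean_score` are hypothetical helpers, the exact-match metric is an assumption, and the rows and predictions are synthetic placeholders rather than real model output.

```python
import random


def split_rows(rows, n_train, seed=0):
    """Shuffle deterministically, then split into train and held-out rows."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def field_accuracy(predicted, expected):
    """Fraction of expected fields matched exactly (hypothetical metric)."""
    if not expected:
        return 1.0
    hits = sum(1 for key, value in expected.items() if predicted.get(key) == value)
    return hits / len(expected)


def mean_score(predictions, held_out):
    """Average the metric over held-out rows, on a 0-100 scale."""
    scores = [field_accuracy(p, row["expected"]) for p, row in zip(predictions, held_out)]
    return 100.0 * sum(scores) / len(scores)


# Synthetic dataset: 100 rows, each with an input and the correct JSON output.
rows = [{"input": f"doc {i}", "expected": {"id": i, "kind": "invoice"}} for i in range(100)]
train, held_out = split_rows(rows, n_train=70)

# Placeholder predictions standing in for model output before and after optimization.
baseline_preds = [{"id": row["expected"]["id"], "kind": "receipt"} for row in held_out]
optimized_preds = [dict(row["expected"]) for row in held_out]

baseline = mean_score(baseline_preds, held_out)
optimized = mean_score(optimized_preds, held_out)
print(f"held-out rows: {len(held_out)}, change: {optimized - baseline:+.1f}")
```

The point of the sketch is that only the 30 held-out rows are scored; the 70 training rows are reserved for the optimizer and never counted toward the reported change.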

## How we write these docs

Every number, tier, and behavior on this site matches the code. If a feature is not shipped, the page says so. A run that does not improve is reported as a real result, not hidden. If you find a claim that drifts from what the SDK does, it is a bug; tell us.
