> ## Documentation Index
> Fetch the complete documentation index at: https://docs.runapprentice.com/llms.txt
> Use this file to discover all available pages before exploring further.

# JSON extraction, end to end

> Build a golden dataset for a JSON extraction task, optimize the prompt, and pull the optimized prompt back into your code.

JSON extraction is the first task class Apprentice supports. This guide takes you from rows to an optimized prompt you can ship.

## 1. Prepare the rows

Each row is one input and the exact JSON you expect back. A CSV with an `input` and an `output` column is enough:

```csv theme={null}
input,output
"Invoice 4471, ACME Corp, due 2026-02-01, total $1,240","{""invoice_no"":""4471"",""vendor"":""ACME Corp"",""due"":""2026-02-01"",""total"":1240}"
"Receipt from Globex, 2026-01-14, $89.50","{""vendor"":""Globex"",""date"":""2026-01-14"",""total"":89.5}"
```

If your inputs have several fields rather than one string, upload structured rows instead of a CSV:

```python theme={null}
client.datasets.upload(
    "invoice-json",
    rows=[
        {"inputs": {"text": raw_text}, "output": gold_json}
        for raw_text, gold_json in pairs
    ],
)
```

Uploaded rows count as **silver**: data you curated. Optimization uses verified rows, which is gold plus silver. Eval gates and model promotion use gold only. See [data and metrics](/reference/python-sdk#datasets) for the full tier rules.

## 2. Create the task

```python theme={null}
import os
from apprentice import Apprentice

client = Apprentice(
    api_key=os.environ["APPRENTICE_API_KEY"],
    base_url=os.environ["APPRENTICE_BASE_URL"],
)

client.tasks.create("invoice-json", metric="json_f1")
```

`json_f1` is a deterministic metric: it parses both sides as JSON and scores field-level overlap. No judge model is involved, so the score is exact and cheap.

## 3. Upload with a baseline prompt

```python theme={null}
status = client.datasets.upload(
    "invoice-json",
    path="golden.csv",
    input_col="input",
    output_col="output",
    prompt="Extract the invoice fields as a JSON object.",
)
print(status.gold, status.silver, status.ready_for_optimization)
```

The `prompt` is the instruction you want to beat. The optimizer measures its gain against this baseline.

## 4. Optimize

```python theme={null}
report = client.optimize("invoice-json").wait().report()
print(report.baseline_score, "->", report.optimized_score)
```

`optimize` returns a `Job`. `wait` polls until the run finishes, then `report` returns the scores and the rewritten prompt. The run is gated by a minimum number of verified rows, so upload enough before you call it.

## 5. Ship the optimized prompt

```python theme={null}
best = client.prompts.get("invoice-json")
print(best.version, best.text)
```

`prompts.get` returns the latest versioned prompt. Paste `best.text` into your application, or keep pulling it by version as you iterate.

<Note>
  If the optimized score does not improve, the prompt was already close to the ceiling for this data, or the metric does not match the task. Add cleaner rows or change the metric. The tool will not invent a gain.
</Note>

## Reproduce a real result

The same flow on a public JSON set is open source: [apprentice-benchmark](https://github.com/singh-abhishekk/apprentice-benchmark). It reports GPT-4o-mini 83.1 to 85.6 with GEPA on 30 held-out examples, and a fine-tuned Qwen3.5-4B at 88.9.
