JSON extraction, end to end

JSON extraction is the first task class Apprentice supports. This guide takes you from rows to an optimized prompt you can ship.

1. Prepare the rows

Each row is one input and the exact JSON you expect back. A CSV with an input and an output column is enough:

input,output
"Invoice 4471, ACME Corp, due 2026-02-01, total $1,240","{""invoice_no"":""4471"",""vendor"":""ACME Corp"",""due"":""2026-02-01"",""total"":1240}"
"Receipt from Globex, 2026-01-14, $89.50","{""vendor"":""Globex"",""date"":""2026-01-14"",""total"":89.5}"
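Hand-escaping the doubled quotes in the output column is easy to get wrong. A minimal sketch of generating the file with Python's standard csv and json modules follows; the helper name write_golden_csv and the in-memory buffer are illustrative assumptions, not part of Apprentice.

```python
import csv
import io
import json


def write_golden_csv(rows, fileobj):
    """Write (input_text, expected_dict) pairs as a two-column CSV.

    write_golden_csv is a hypothetical helper, not an Apprentice API.
    """
    writer = csv.writer(fileobj)  # default quoting doubles embedded quotes
    writer.writerow(["input", "output"])
    for text, expected in rows:
        # Compact separators keep the JSON on one short line.
        writer.writerow([text, json.dumps(expected, separators=(",", ":"))])


rows = [
    (
        "Invoice 4471, ACME Corp, due 2026-02-01, total $1,240",
        {"invoice_no": "4471", "vendor": "ACME Corp", "due": "2026-02-01", "total": 1240},
    ),
    (
        "Receipt from Globex, 2026-01-14, $89.50",
        {"vendor": "Globex", "date": "2026-01-14", "total": 89.5},
    ),
]

buffer = io.StringIO()
write_golden_csv(rows, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, open the file with open("golden.csv", "w", newline="") so the csv module controls the line endings.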

If your inputs have several fields rather than one string, upload structured rows instead of a CSV. The call uses the client created in step 2:

# pairs is an iterable of (raw_text, gold_json) tuples
client.datasets.upload(
    "invoice-json",
    rows=[
        {"inputs": {"text": raw_text}, "output": gold_json}
        for raw_text, gold_json in pairs
    ],
)
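Because the expected output has to be the exact JSON you want back, it can help to check each row before uploading. The sketch below assumes gold_json is a JSON string holding an object, as in the CSV example; build_rows is a hypothetical helper, and only the row shape comes from the call above.

```python
import json


def build_rows(pairs):
    """Turn (raw_text, gold_json) pairs into structured rows.

    Assumes gold_json is a JSON string; raises ValueError otherwise.
    build_rows is a hypothetical helper, not an Apprentice API.
    """
    rows = []
    for index, (raw_text, gold_json) in enumerate(pairs):
        try:
            parsed = json.loads(gold_json)
        except json.JSONDecodeError as exc:
            raise ValueError(f"row {index}: output is not valid JSON: {exc}") from exc
        if not isinstance(parsed, dict):
            raise ValueError(f"row {index}: expected a JSON object")
        rows.append({"inputs": {"text": raw_text}, "output": gold_json})
    return rows


pairs = [
    ("Receipt from Globex, 2026-01-14, $89.50",
     '{"vendor":"Globex","date":"2026-01-14","total":89.5}'),
]
rows = build_rows(pairs)
```

The resulting list can be passed as the rows argument. Failing early on a malformed gold value is cheaper than discovering it as a low score later.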

Uploaded rows count as silver: data you curated. Optimization uses verified rows, meaning gold and silver together. Eval gates and model promotion use gold only. See data and metrics for the full tier rules.

2. Create the task

import os
from apprentice import Apprentice

client = Apprentice(
    api_key=os.environ["APPRENTICE_API_KEY"],
    base_url=os.environ["APPRENTICE_BASE_URL"],
)

client.tasks.create("invoice-json", metric="json_f1")

json_f1 is a deterministic metric: it parses both sides as JSON and scores field-level overlap. No judge model is involved, so the same pair of outputs always gets the same score, and scoring is cheap.
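The exact scoring rules of json_f1 are not spelled out in this guide, so the following is only a sketch of what field-level overlap could look like. It assumes fields are compared as (path, value) pairs, that nested objects and lists are flattened, and that unparseable predictions score zero; flatten and field_f1 are hypothetical names, and the real metric may differ.

```python
import json


def flatten(value, prefix=""):
    """Flatten nested JSON into a set of (path, leaf value as JSON text) pairs."""
    if isinstance(value, dict):
        items = set()
        for key, child in value.items():
            items |= flatten(child, f"{prefix}.{key}" if prefix else key)
        return items
    if isinstance(value, list):
        items = set()
        for position, child in enumerate(value):
            items |= flatten(child, f"{prefix}[{position}]")
        return items
    return {(prefix, json.dumps(value))}


def field_f1(predicted_text, gold_text):
    """F1 over matching (path, value) pairs; an illustrative stand-in for json_f1."""
    try:
        predicted = flatten(json.loads(predicted_text))
    except (json.JSONDecodeError, TypeError):
        return 0.0  # assumption: output that does not parse earns nothing
    gold = flatten(json.loads(gold_text))
    if not predicted and not gold:
        return 1.0
    matches = len(predicted & gold)
    if matches == 0:
        return 0.0
    precision = matches / len(predicted)
    recall = matches / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = '{"invoice_no":"4471","vendor":"ACME Corp","due":"2026-02-01","total":1240}'
predicted = '{"invoice_no":"4471","vendor":"ACME Corp","due":"2026-02-01","total":1200}'
score = field_f1(predicted, gold)  # 0.75: three of four fields match on each side
```

In this sketch values are compared by their JSON text, so 1240 and 1240.0 count as different. A production metric might normalize numbers first.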

3. Upload with a baseline prompt

status = client.datasets.upload(
    "invoice-json",
    path="golden.csv",
    input_col="input",
    output_col="output",
    prompt="Extract the invoice fields as a JSON object.",
)
print(status.gold, status.silver, status.ready_for_optimization)

The prompt is the instruction you want to beat. The optimizer measures its gain against this baseline.

4. Optimize

report = client.optimize("invoice-json").wait().report()
print(report.baseline_score, "->", report.optimized_score)

optimize returns a Job. wait polls until the run finishes, then report returns the scores and the rewritten prompt. The run is gated by a minimum number of verified rows, so upload enough before you call it.
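If you want a bounded wait of your own, the polling pattern is simple to sketch. The version below is generic and does not use the Apprentice client: fetch_status is a hypothetical callable that returns the job state as a string, and the state names "succeeded" and "failed" are assumptions.

```python
import time


def poll_until_done(fetch_status, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Call fetch_status() until it returns a terminal state or the timeout elapses.

    The terminal state names are assumptions, not taken from Apprentice.
    """
    terminal = {"succeeded", "failed"}
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(interval)


# Usage with a stand-in status source; a real caller would query the job instead.
states = iter(["queued", "running", "succeeded"])
final_state = poll_until_done(lambda: next(states), sleep=lambda seconds: None)
```

Passing sleep as a parameter keeps the loop testable without real delays.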

5. Ship the optimized prompt

best = client.prompts.get("invoice-json")
print(best.version, best.text)

prompts.get returns the latest versioned prompt. Paste best.text into your application, or keep pulling it by version as you iterate.

If the optimized score does not improve, either the prompt was already close to the ceiling for this data or the metric does not match the task. Add cleaner rows or change the metric. The tool will not invent a gain.

Reproduce a real result

The same flow on a public JSON set is open source: apprentice-benchmark. It reports GPT-4o-mini improving from 83.1 to 85.6 with GEPA on 30 held-out examples, and a fine-tuned Qwen3.5-4B reaching 88.9.
