JSON extraction is the first task class Apprentice supports. This guide takes you from rows to an optimized prompt you can ship.
1. Prepare the rows
Each row is one input and the exact JSON you expect back. A CSV with an input and an output column is enough:
input,output
"Invoice 4471, ACME Corp, due 2026-02-01, total $1,240","{""invoice_no"":""4471"",""vendor"":""ACME Corp"",""due"":""2026-02-01"",""total"":1240}"
"Receipt from Globex, 2026-01-14, $89.50","{""vendor"":""Globex"",""date"":""2026-01-14"",""total"":89.5}"
If your inputs have several fields rather than one string, upload structured rows instead of a CSV. This call uses the client created in step 2:
client.datasets.upload(
    "invoice-json",
    rows=[
        # pairs is your own iterable of (raw_text, gold_json) tuples
        {"inputs": {"text": raw_text}, "output": gold_json}
        for raw_text, gold_json in pairs
    ],
)
Uploaded rows count as silver: data you curated. Optimization uses verified rows, which is gold plus silver. Eval gates and model promotion use gold only. See data and metrics for the full tier rules.
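For the CSV route in this step, the quoting is the part that is easy to get wrong by hand: the JSON cell contains commas and double quotes, so it has to be wrapped in quotes with every inner quote doubled. A minimal sketch using only the Python standard library is shown here. The helper name rows_to_csv is hypothetical and not part of Apprentice.

```python
import csv
import io
import json


def rows_to_csv(pairs):
    """Serialize (input_text, expected_dict) pairs to CSV text.

    Hypothetical helper for illustration; it only relies on the stdlib.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["input", "output"])
    for text, expected in pairs:
        # csv.writer quotes the cell and doubles the inner quotes for us.
        writer.writerow([text, json.dumps(expected, separators=(",", ":"))])
    return buffer.getvalue()


pairs = [
    ("Invoice 4471, ACME Corp, due 2026-02-01, total $1,240",
     {"invoice_no": "4471", "vendor": "ACME Corp",
      "due": "2026-02-01", "total": 1240}),
    ("Receipt from Globex, 2026-01-14, $89.50",
     {"vendor": "Globex", "date": "2026-01-14", "total": 89.5}),
]
csv_text = rows_to_csv(pairs)

# Round-trip check: every output cell must parse back to the original object.
for row, (_, expected) in zip(csv.DictReader(io.StringIO(csv_text)), pairs):
    assert json.loads(row["output"]) == expected
```

To write a file rather than a string, open it with open("golden.csv", "w", newline="") and pass the file object to csv.writer.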
2. Create the task
import os
from apprentice import Apprentice
client = Apprentice(
    api_key=os.environ["APPRENTICE_API_KEY"],
    base_url=os.environ["APPRENTICE_BASE_URL"],
)
client.tasks.create("invoice-json", metric="json_f1")
json_f1 is a deterministic metric: it parses both sides as JSON and scores field-level overlap. No judge model is involved, so the same pair of outputs always gets the same score, and scoring is cheap.
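To make "field-level overlap" concrete, here is a rough sketch of one way such a score could be computed. It is an illustration under stated assumptions, not Apprentice's json_f1 implementation: it assumes top-level objects, counts a field as correct only when key and value both match exactly, and gives unparseable output a score of zero. The function name field_f1 is hypothetical.

```python
import json


def field_f1(predicted: str, expected: str) -> float:
    """Field-level F1 between two JSON object strings (illustrative only)."""
    try:
        pred = json.loads(predicted)
        gold = json.loads(expected)
    except json.JSONDecodeError:
        return 0.0  # assumption: unparseable JSON scores zero
    if not isinstance(pred, dict) or not isinstance(gold, dict):
        return 0.0  # assumption: only top-level objects are scored
    if pred == gold:
        return 1.0  # also covers two empty objects
    # A field is correct when the key exists in gold with an equal value.
    correct = sum(1 for k, v in pred.items() if k in gold and gold[k] == v)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)  # share of predicted fields that are right
    recall = correct / len(gold)     # share of expected fields that were found
    return 2 * precision * recall / (precision + recall)


gold = '{"vendor":"Globex","date":"2026-01-14","total":89.5}'
missing_date = '{"vendor":"Globex","total":89.5}'
score = field_f1(missing_date, gold)  # precision 1.0, recall 2/3
```

In this sketch a prediction that drops one of three fields keeps full precision but loses recall, which is the kind of partial credit an exact-match metric would not give.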
3. Upload with a baseline prompt
status = client.datasets.upload(
    "invoice-json",
    path="golden.csv",
    input_col="input",
    output_col="output",
    prompt="Extract the invoice fields as a JSON object.",
)
print(status.gold, status.silver, status.ready_for_optimization)
The prompt is the instruction you want to beat. The optimizer measures its gain against this baseline.
4. Optimize
report = client.optimize("invoice-json").wait().report()
print(report.baseline_score, "->", report.optimized_score)
optimize returns a Job. wait polls until the run finishes, then report returns the scores and the rewritten prompt. The run is gated by a minimum number of verified rows, so upload enough before you call it.
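One way to avoid starting a run that the row gate would reject is to check the upload status first. The sketch below uses only the calls and attributes shown in this guide (status.gold, status.silver, status.ready_for_optimization, client.optimize, wait, report). It assumes that ready_for_optimization reflects the minimum-row gate, which this guide does not state explicitly, and the wrapper name optimize_if_ready is hypothetical.

```python
def optimize_if_ready(client, task_name, status):
    """Run optimization only when the upload status reports readiness.

    Returns (report, improved). Raises RuntimeError when the status says
    the task is not ready. Hypothetical wrapper, not part of Apprentice.
    """
    if not status.ready_for_optimization:
        raise RuntimeError(
            f"{task_name}: not ready for optimization "
            f"(gold={status.gold}, silver={status.silver})"
        )
    # Same chain as in step 4: start the job, poll, then fetch the report.
    report = client.optimize(task_name).wait().report()
    improved = report.optimized_score > report.baseline_score
    return report, improved
```

Called as optimize_if_ready(client, "invoice-json", status) with the status object from step 3, it returns the report plus a flag that tells you whether the optimized prompt beat the baseline.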
5. Ship the optimized prompt
best = client.prompts.get("invoice-json")
print(best.version, best.text)
prompts.get returns the latest versioned prompt. Paste best.text into your application, or keep pulling it by version as you iterate.
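If you would rather pin a version than paste text by hand, one option is to write the prompt to a small file that your application loads at startup. This is a sketch under assumptions: the helper names save_prompt and load_prompt are hypothetical, and it assumes best.version is a JSON-serializable value such as an integer or string.

```python
import json
from pathlib import Path


def save_prompt(best, path):
    """Write the prompt version and text to a JSON file (hypothetical helper)."""
    # Assumption: best.version is JSON-serializable (for example int or str).
    record = {"version": best.version, "text": best.text}
    Path(path).write_text(json.dumps(record, indent=2), encoding="utf-8")


def load_prompt(path):
    """Read the pinned prompt record back as a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

For example, save_prompt(client.prompts.get("invoice-json"), "invoice_prompt.json") after an optimization run, then load_prompt("invoice_prompt.json")["text"] inside the application. Committing the file keeps the deployed prompt tied to a known version.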
If the optimized score does not improve, either the prompt was already close to the ceiling for this data or the metric does not match the task. Add cleaner rows or change the metric. The tool will not invent a gain.
Reproduce a real result
The same flow, run on a public JSON dataset, is open source as apprentice-benchmark. It reports GPT-4o-mini improving from 83.1 to 85.6 with GEPA on 30 held-out examples, and a fine-tuned Qwen3.5-4B reaching 88.9.