# Apprentice

> Apprentice helps developers cut LLM cost without losing quality. Today you optimize a prompt against a verified dataset. Later you train a small model to replace the frontier model, switching over only when evals prove no quality regression. Public package: `apprentice-sdk`; import name: `apprentice`.

## Status
- The prompt-optimization SDK is current for v0.1.
- Small-model training, canary rollout, dashboard controls, and hosted rollout APIs are being built. Do not describe them as shipped unless a page says so.

## Start here
- Quickstart: /quickstart
- JSON extraction, end to end: /how-to/json-extraction
- Capture from a LangChain app: /how-to/capture-langchain
- Python SDK reference: /reference/python-sdk

## Rules for coding agents
- Ingest data with `client.datasets.upload(...)`. There is no `ingest()` method.
- Use structured `inputs={...}` for multi-field and templated tasks. Do not pass a single rendered prompt string as the only input for a templated task.
- For RAG, pass `inputs={"question": question, "context": exact_context}`. The context must be exactly what the model saw.
- `client.capture(...)` is fail-open: on failure it returns `None` instead of raising. Do not wrap it in `try`/`except` to protect the app.
- `ApprenticeCallback` captures LangChain calls and simple retriever context. Use manual `client.capture(...)` when there are multiple retrievers or custom context formatting.
- Optimization uses verified rows: gold plus silver. Eval gates and model promotion use gold only.
- Judge-scored metrics (`semantic_f1`, `rag_faithfulness`, `rag_composite`) are advisory estimates, not deterministic truth.
- Never claim a quality gain that the report does not show. A run that does not improve is a real result.
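
The structured-input and gold/silver rules above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not SDK code: `build_rag_inputs`, `rows_for`, the `tier` field, and the `"unverified"` label are hypothetical names introduced here. The block makes no Apprentice calls; exact signatures are in /reference/python-sdk.

```python
from typing import Dict, List


def build_rag_inputs(question: str, exact_context: str) -> Dict[str, str]:
    """Build structured inputs for a RAG task (hypothetical helper).

    The context is passed through untouched: no stripping, truncation,
    or re-joining, so it stays exactly what the model saw.
    """
    return {"question": question, "context": exact_context}


def rows_for(purpose: str, rows: List[dict]) -> List[dict]:
    """Select dataset rows by purpose, assuming a hypothetical "tier" field.

    "optimize" -> verified rows (gold plus silver)
    "gate"     -> gold rows only (eval gates and model promotion)
    """
    allowed = {"optimize": {"gold", "silver"}, "gate": {"gold"}}[purpose]
    return [row for row in rows if row.get("tier") in allowed]


# Illustrative rows; the field names are assumptions, not the SDK schema.
rows = [
    {"id": 1, "tier": "gold"},
    {"id": 2, "tier": "silver"},
    {"id": 3, "tier": "unverified"},
]

# Leading spaces and the trailing newline are kept on purpose.
inputs = build_rag_inputs("Who wrote it?", "  Doc A: written by Ada.\n")

optimize_ids = [row["id"] for row in rows_for("optimize", rows)]
gate_ids = [row["id"] for row in rows_for("gate", rows)]
```

The `inputs` dict has the shape the RAG rule asks for, and the two ID lists show that optimization may draw on silver rows while gates and promotion may not.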

## Proof
- Benchmark, reproducible: https://github.com/singh-abhishekk/apprentice-benchmark
- Fine-tuned adapter: https://huggingface.co/singhabhishekkk/apprentice-qwen35-4b-lora-jsonextract