# Apprentice

> Apprentice helps developers cut LLM cost without losing quality. You optimize a prompt against a verified dataset today, then later train a small model to replace the frontier model, switched only when evals prove no quality regression. Public package: apprentice-sdk, import name: apprentice.

## Status

- The prompt-optimization SDK is current for v0.1.
- Small-model training, canary rollout, dashboard controls, and hosted rollout APIs are being built. Do not describe them as shipped unless a page says so.

## Start here

- Quickstart: /quickstart
- JSON extraction, end to end: /how-to/json-extraction
- Capture from a LangChain app: /how-to/capture-langchain
- Python SDK reference: /reference/python-sdk

## Rules for coding agents

- Ingest data with `client.datasets.upload(...)`. There is no `ingest()` method.
- Use structured `inputs={...}` for multi-field and templated tasks. Do not pass a single rendered prompt string as the only input for a templated task.
- For RAG, pass `inputs={"question": question, "context": exact_context}`. The context must be exactly what the model saw.
- `client.capture(...)` is fail-open and returns `None` instead of raising. Do not wrap it in try/except to protect the app.
- `ApprenticeCallback` captures LangChain calls and simple retriever context. Use manual `client.capture(...)` when there are multiple retrievers or custom context formatting.
- Optimization uses verified rows: gold plus silver. Eval gates and model promotion use gold only.
- Judge-scored metrics (semantic_f1, rag_faithfulness, rag_composite) are advisory estimates, not deterministic truth.
- Never claim a quality gain that the report does not show. A run that does not improve is a real result.

## Proof

- Benchmark, reproducible: https://github.com/singh-abhishekk/apprentice-benchmark
- Fine-tuned adapter: https://huggingface.co/singhabhishekkk/apprentice-qwen35-4b-lora-jsonextract
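
## Sketch: structured RAG inputs and row tiers

Two of the rules for coding agents can be illustrated in plain Python, without calling the SDK: building the structured `inputs` dict for a RAG task, and separating the rows used for optimization (gold plus silver) from the rows used for eval gates (gold only). This is a minimal sketch under stated assumptions. The helper names (`build_rag_inputs`, `rows_for_optimization`, `rows_for_eval_gate`) and the `"tier"` row key are hypothetical and are not part of the `apprentice` package; only the `question`/`context` keys and the gold/silver distinction come from the rules themselves.

```python
# Hypothetical row shape: the rules name "gold" and "silver" rows, but the
# "tier" key and every helper below are assumptions made for illustration.
VERIFIED_TIERS = {"gold", "silver"}


def build_rag_inputs(question: str, exact_context: str) -> dict:
    """Return structured inputs for a RAG task.

    The context is passed through unchanged so that it matches exactly
    what the model saw (no trimming, re-ranking, or re-formatting here).
    """
    return {"question": question, "context": exact_context}


def rows_for_optimization(rows: list[dict]) -> list[dict]:
    """Keep verified rows (gold plus silver), as used for optimization."""
    return [r for r in rows if r.get("tier") in VERIFIED_TIERS]


def rows_for_eval_gate(rows: list[dict]) -> list[dict]:
    """Keep gold rows only, as used for eval gates and model promotion."""
    return [r for r in rows if r.get("tier") == "gold"]


# Toy data, invented for this example.
rows = [
    {"id": 1, "tier": "gold"},
    {"id": 2, "tier": "silver"},
    {"id": 3, "tier": "unverified"},
]

inputs = build_rag_inputs(
    "What is the refund window?",
    "Refunds are accepted within 30 days.",
)
optimization_rows = rows_for_optimization(rows)
gate_rows = rows_for_eval_gate(rows)
```

The `inputs` dict has the shape the RAG rule describes for `inputs={...}`. The actual SDK call is left out because its full signature is not given here; see /reference/python-sdk for the real method parameters.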