Cost & Performance
LLM call counts per operation, model-slot routing, and tuning for production workloads.
Adapt's cost is dominated by LLM calls. This page gives you the exact call count per operation so you can estimate spend and latency, and points to the levers with the biggest effect.
Quick Reference
Typical cost for common workflows:
| Scenario | Neurons | Calls per inject (1 batch) | Calls per ask |
|---|---|---|---|
| Simple (3 neurons, no understand trigger) | 3 | 3 | 5 |
| Medium (5 neurons, understand triggers on 1) | 5 | 6 | 7 |
| Large (10 neurons, understand triggers on 3) | 10 | 13 | 12 |
| Deep mode ask (5 neurons) | 5 | — | 2–12 |
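The Quick Reference rows follow from the per-operation counts in Detailed Call Counts. The numbers are consistent with one batch per inject and with every neuron being relevant to the question. The sketch below reproduces that arithmetic. `WorkloadShape`, `injectCalls`, and `askCalls` are hypothetical estimation helpers, not part of Adapt's API.

```typescript
// Hypothetical estimation helpers; not part of Adapt's API.
interface WorkloadShape {
  neurons: number;            // N: neurons in the brain
  batches: number;            // B: batches per inject
  understandTriggers: number; // neurons whose buffer crosses maxObservations or maxTokens
  relevantNeurons: number;    // neurons queried by ask
}

// inject: one observe call per neuron per batch, plus one understand call per triggered neuron.
function injectCalls(w: WorkloadShape): number {
  return w.neurons * w.batches + w.understandTriggers;
}

// ask: neuron selection (skipped when only one neuron has knowledge),
// one query per relevant neuron, and one synthesis call.
function askCalls(w: WorkloadShape, neuronsWithKnowledge: number = w.neurons): number {
  const selection = neuronsWithKnowledge > 1 ? 1 : 0;
  return selection + w.relevantNeurons + 1;
}

const rows: Array<[string, WorkloadShape]> = [
  ["Simple", { neurons: 3, batches: 1, understandTriggers: 0, relevantNeurons: 3 }],
  ["Medium", { neurons: 5, batches: 1, understandTriggers: 1, relevantNeurons: 5 }],
  ["Large", { neurons: 10, batches: 1, understandTriggers: 3, relevantNeurons: 10 }],
];

for (const [name, w] of rows) {
  console.log(`${name}: inject=${injectCalls(w)}, ask=${askCalls(w)}`);
}
// Simple: inject=3, ask=5
// Medium: inject=6, ask=7
// Large: inject=13, ask=12
```

With more than one batch, only the observe term grows: the same Medium brain with B = 3 would cost 5 × 3 + 1 = 16 calls per inject.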
Cost tip: The observe phase runs on every inject and scales linearly with the number of neurons and batches (N × B calls). This is where a fast, cheap model pays off most. Understand and query run less frequently but need higher quality, so use a smarter model there. See Configuration — Model Cascade for the cost-optimized cascade setup.
Detailed Call Counts
N = number of neurons, B = number of batches.
Core Operations
| Operation | LLM Calls | Model Slot | Notes |
|---|---|---|---|
| inject(data) | | | |
| → Observe | N × B | learning.observer | 1 call per neuron per batch. Skipped if skipObservation: true |
| → Understand | 0 – N | learning.understand | Only triggers when buffer exceeds maxObservations or maxTokens. Always skipped if skipUnderstand: true |
| ask(question) | | | |
| → Neuron selection | 1 | query | Skipped if only 1 neuron has knowledge |
| → Neuron queries | N | learning.query | Parallel. N = relevant neurons, not all |
| → Synthesis | 1 | query | Combines neuron results into final answer |
| ask(question, { mode: 'deep' }) | 2–12 | query | Agentic: the LLM decides which neurons to query and when to stop |
| query(question) | 1 | learning.query | Standalone neuron query (single call) |
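The Core Operations table can also be read per model slot, which is what you need when different slots route to differently priced models. The sketch below turns the table into min/max call ranges per slot, including the skipObservation and skipUnderstand cases. `injectBySlot`, `askBySlot`, and `total` are hypothetical helpers, and the options object only mirrors the setting names in the table; it is not Adapt's configuration type.

```typescript
// Hypothetical estimation helpers derived from the Core Operations table; not Adapt's API.
type Slot = "learning.observer" | "learning.understand" | "learning.query" | "query";
interface Range { min: number; max: number; }
type SlotCalls = Partial<Record<Slot, Range>>;

interface InjectOptions {
  skipObservation?: boolean; // observe calls drop to 0
  skipUnderstand?: boolean;  // understand calls drop to 0
}

// inject: N x B observe calls; 0..N understand calls depending on which buffers cross a threshold.
function injectBySlot(neurons: number, batches: number, opts: InjectOptions = {}): SlotCalls {
  const observe = opts.skipObservation ? 0 : neurons * batches;
  return {
    "learning.observer": { min: observe, max: observe },
    "learning.understand": { min: 0, max: opts.skipUnderstand ? 0 : neurons },
  };
}

// ask (default mode): selection (skipped if only one neuron has knowledge) + synthesis on `query`,
// plus one parallel query per relevant neuron on `learning.query`.
function askBySlot(relevantNeurons: number, neuronsWithKnowledge: number): SlotCalls {
  const queryCalls = (neuronsWithKnowledge > 1 ? 1 : 0) + 1;
  return {
    query: { min: queryCalls, max: queryCalls },
    "learning.query": { min: relevantNeurons, max: relevantNeurons },
  };
}

// Deep mode is agentic, so only the documented 2-12 range on the query slot is known.
const deepAsk: SlotCalls = { query: { min: 2, max: 12 } };

// Sum the ranges across all slots.
function total(calls: SlotCalls): Range {
  let min = 0;
  let max = 0;
  for (const r of Object.values(calls)) {
    if (r) {
      min += r.min;
      max += r.max;
    }
  }
  return { min, max };
}

console.log(total(injectBySlot(5, 2)));                           // { min: 10, max: 15 }
console.log(total(injectBySlot(5, 2, { skipUnderstand: true }))); // { min: 10, max: 10 }
console.log(total(askBySlot(3, 5)));                              // { min: 5, max: 5 }
console.log(total(deepAsk));                                      // { min: 2, max: 12 }
```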
Lifecycle Operations
| Operation | LLM Calls | Model Slot | Notes |
|---|---|---|---|
| Brain.create() | | | |
| → Decomposition | 1 | init | Only with autoSetup: true. Determines neuron structure |
| → Prompt parsing | 1 | blueprintModel | Extracts purpose and synthesis directives |
| → Neuron init | 0 – N | blueprintModel | Up to 1 call per neuron. TextNeuron: 1 cognitive-skill customization call, or 0 with skipUnderstand. ListNeuron: 1 schema-generation call, or 0 when a custom observationSchema is supplied. Observe and understand prompts are assembled deterministically, with no LLM call. |
| adjust(directive) | 1–3 | blueprintModel | 1 classify call, plus (if config changes) 1 instruction-rewrite call and 1 skill/schema regeneration call. Observe and understand prompts are re-assembled deterministically: TextNeuron spends the extra call on skill regeneration, ListNeuron on schema regeneration |
| → + Understanding rewrite | +1–15 | learning.understand | Only if directive changes what the neuron knows, not just how it behaves |
| update(config) | 0–1 | blueprintModel | 0 if mechanical (model/threshold changes). 1 if instructions changed (schema/skill regeneration; the observe/understand prompts themselves are rebuilt without an LLM call) |
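Brain.create() cost depends on the mix of neuron types and on which optional steps are skipped. The sketch below applies the Lifecycle Operations rules to a list of neuron descriptions. `NeuronSpec`, `createCalls`, and `updateCalls` are hypothetical; in particular, skipUnderstand is modeled here as a per-neuron flag, which is an assumption.

```typescript
// Hypothetical helpers mirroring the Lifecycle Operations table; not Adapt's API.
interface NeuronSpec {
  kind: "text" | "list";
  skipUnderstand?: boolean;          // modeled per neuron here (assumption)
  customObservationSchema?: boolean; // ListNeuron only: a custom observationSchema was supplied
}

interface CallsBySlot { init: number; blueprintModel: number; }

// Brain.create(): decomposition (init slot, only with autoSetup: true),
// prompt parsing (1 call), and up to one init call per neuron (blueprintModel slot).
function createCalls(neurons: NeuronSpec[], autoSetup: boolean): CallsBySlot {
  let neuronInit = 0;
  for (const n of neurons) {
    if (n.kind === "text" && !n.skipUnderstand) neuronInit += 1;          // cognitive-skill customization
    if (n.kind === "list" && !n.customObservationSchema) neuronInit += 1; // schema generation
  }
  return { init: autoSetup ? 1 : 0, blueprintModel: 1 + neuronInit };
}

// update(config): 0 for mechanical changes (model/threshold), 1 if instructions changed.
function updateCalls(instructionsChanged: boolean): number {
  return instructionsChanged ? 1 : 0;
}

const brain: NeuronSpec[] = [
  { kind: "text" },
  { kind: "text", skipUnderstand: true },
  { kind: "list" },
  { kind: "list", customObservationSchema: true },
];
console.log(createCalls(brain, true)); // { init: 1, blueprintModel: 3 }
console.log(updateCalls(false));       // 0
```

Supplying a custom observationSchema and using skipUnderstand are the two ways the table lists to avoid per-neuron init calls, but both change neuron behavior, not just cost.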
Evolution Operations
| Operation | LLM Calls | Model Slot | Notes |
|---|---|---|---|
| signal() | 0 | — | Buffers only. No immediate LLM call |
| Evaluator trigger | 1–12 | evolution | Agentic — inspects neurons, reviews gaps, makes decisions |
| → Create N neurons | 1 + N | blueprintModel | 1 generation + 1 per neuron init |
| → Merge neurons | 1 | blueprintModel | Single generation call |
| → Split into N | 1 + (N-1) | blueprintModel | 1 generation + 1 init per new neuron |
| → Update neuron | 1–15 | blueprintModel | 1 guidance + optional adjust cascade |
| → Delete neuron | 0 | — | Mechanical removal |
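An evaluator run is billed on two slots: the agentic evaluator itself (evolution) and whatever structural changes it decides on (blueprintModel). The sketch below sums one run from the Evolution Operations table. The `EvolutionAction` union and both functions are hypothetical modeling helpers, not Adapt's API.

```typescript
// Hypothetical model of one evaluator run, based on the Evolution Operations table; not Adapt's API.
interface Range { min: number; max: number; }

type EvolutionAction =
  | { type: "create"; count: number }
  | { type: "merge" }
  | { type: "split"; into: number }
  | { type: "update" }
  | { type: "delete" };

// blueprintModel calls per action, as listed in the table.
function actionCalls(action: EvolutionAction): Range {
  switch (action.type) {
    case "create": return { min: 1 + action.count, max: 1 + action.count };           // 1 generation + 1 init per neuron
    case "merge": return { min: 1, max: 1 };                                           // single generation call
    case "split": return { min: 1 + (action.into - 1), max: 1 + (action.into - 1) };   // 1 generation + 1 init per new neuron
    case "update": return { min: 1, max: 15 };                                         // guidance + optional adjust cascade
    case "delete": return { min: 0, max: 0 };                                          // mechanical removal
    default: throw new Error("unknown action");
  }
}

// One evaluator trigger: 1-12 calls on the evolution slot, plus the blueprintModel calls of its decisions.
function evaluatorRun(actions: EvolutionAction[]): { evolution: Range; blueprintModel: Range } {
  const blueprintModel = actions.map(actionCalls).reduce(
    (acc, r) => ({ min: acc.min + r.min, max: acc.max + r.max }),
    { min: 0, max: 0 },
  );
  return { evolution: { min: 1, max: 12 }, blueprintModel };
}

console.log(evaluatorRun([{ type: "create", count: 2 }, { type: "delete" }]).blueprintModel); // { min: 3, max: 3 }
console.log(evaluatorRun([{ type: "update" }, { type: "merge" }]).blueprintModel);            // { min: 2, max: 16 }
```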
Where to spend the smarter model
Routing the right model to the right slot is the cheapest performance win. Some heuristics:
- learning.observer: runs on every batch, for every neuron. Use the cheapest model that can follow the observation schema; a relevance-classification call doesn't need a frontier model.
- learning.understand: runs less often but determines knowledge quality. Use a smart model.
- learning.query: runs once per ask for each relevant neuron. A smart model is recommended, since this step shapes the answer the user sees.
- query (neuron selection and synthesis for brain ask): the final integration step. Smart model.
- init / blueprintModel: one-time and rare (decomposition, per-neuron skill/schema generation). Smart model; the cost is amortized.
- evolution: runs when signals accumulate. Requires a smart model with tool-calling support.
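To see why routing the observer slot matters, the sketch below prices one inject and one ask for the Medium Quick Reference row under two routings. The slot names come from this page; the cheap/smart tiers, the relative prices (1 and 10 units per call), and the `spend` helper are made up for illustration and are not Adapt's configuration format or real provider pricing.

```typescript
// Hypothetical tiers and relative prices for illustration; not real provider pricing or Adapt's config format.
type Tier = "cheap" | "smart";
type Slot =
  | "learning.observer"
  | "learning.understand"
  | "learning.query"
  | "query"
  | "init"
  | "blueprintModel"
  | "evolution";

const RELATIVE_COST: Record<Tier, number> = { cheap: 1, smart: 10 };

// Routing that follows the heuristics above: only the observer slot uses the cheap tier.
const cascade: Record<Slot, Tier> = {
  "learning.observer": "cheap",
  "learning.understand": "smart",
  "learning.query": "smart",
  query: "smart",
  init: "smart",
  blueprintModel: "smart",
  evolution: "smart",
};

// Baseline: a smart model in every slot.
const allSmart: Record<Slot, Tier> = { ...cascade, "learning.observer": "smart" };

const SLOTS = Object.keys(cascade) as Slot[];

// Sum of (calls in slot) x (relative price of the tier routed to that slot).
function spend(calls: Partial<Record<Slot, number>>, routing: Record<Slot, Tier>): number {
  let total = 0;
  for (const slot of SLOTS) {
    total += (calls[slot] ?? 0) * RELATIVE_COST[routing[slot]];
  }
  return total;
}

// One inject for the "Medium" row: 5 observe calls (1 batch) + 1 understand call.
const mediumInject: Partial<Record<Slot, number>> = { "learning.observer": 5, "learning.understand": 1 };
// One ask for the same row: selection + synthesis on `query`, 5 neuron queries.
const mediumAsk: Partial<Record<Slot, number>> = { query: 2, "learning.query": 5 };

console.log(spend(mediumInject, allSmart), spend(mediumInject, cascade)); // 60 15
console.log(spend(mediumAsk, allSmart), spend(mediumAsk, cascade));       // 70 70
```

Under these made-up prices, a cheap observer cuts the inject cost from 60 to 15 units while leaving the ask path, where answer quality is decided, untouched.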
See Configuration — Model Cascade for the full cascade and the Cost-optimized setup example.
Latency
Per-call latency is dominated by the model and provider, not Adapt. A few observations:
- Observe phase runs in parallel across neurons within a batch.
- The understand phase runs for a neuron only when its buffer crosses a threshold (maxObservations or maxTokens), not on every batch.
- Internal neurons run a second pass after user-facing neurons. They use queries that resolve asynchronously and often complete on the next inject.
- Deep-mode ask is variable: 2–12 calls depending on the agent's path.
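The latency notes above can be turned into a rough wall-clock estimate. The sketch below assumes observe calls within a batch run concurrently (as stated above), that batches are processed one after another, and that deep-mode calls are sequential; the last two are assumptions, and the latencies are made-up numbers for illustration. The function names are hypothetical.

```typescript
// Rough wall-clock model; assumptions are noted inline. Latencies are in milliseconds.

// Within a batch, observe calls run in parallel, so a batch takes as long as its slowest call.
// Batches are assumed to run one after another.
function parallelObserveMs(batchLatencies: number[][]): number {
  return batchLatencies.reduce((sum, batch) => sum + (batch.length > 0 ? Math.max(...batch) : 0), 0);
}

// For comparison: every call waits for the previous one.
function sequentialObserveMs(batchLatencies: number[][]): number {
  return batchLatencies.reduce((sum, batch) => sum + batch.reduce((s, ms) => s + ms, 0), 0);
}

// Deep-mode ask, assuming its 2-12 calls run one after another.
function deepAskRangeMs(perCallMs: number): { min: number; max: number } {
  return { min: 2 * perCallMs, max: 12 * perCallMs };
}

// Made-up per-neuron latencies for two batches of three neurons.
const latencies = [
  [800, 1200, 950],
  [700, 1100, 900],
];
console.log(parallelObserveMs(latencies));   // 2300
console.log(sequentialObserveMs(latencies)); // 5650
console.log(deepAskRangeMs(1500));           // { min: 3000, max: 18000 }
```

The practical takeaway is that observe latency tracks the slowest neuron in each batch, so one slow model or prompt can dominate an otherwise fast inject.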
For a step-by-step lifecycle with timing estimates, see Events — Brain lifecycle timing.