Adapt

Cost & Performance

LLM call counts per operation, model-slot routing, and tuning for production workloads.

Adapt's cost is dominated by LLM calls. This page gives you the exact call count per operation so you can estimate spend and latency, and points to the levers that move the most.

Quick Reference

Typical LLM call counts for common workflows, assuming each inject is a single batch and every neuron is relevant to the question:

| Scenario | Neurons | Calls per inject | Calls per ask |
|---|---|---|---|
| Simple (3 neurons, no understand trigger) | 3 | 3 | 5 |
| Medium (5 neurons, understand triggers on 1) | 5 | 6 | 7 |
| Large (10 neurons, understand triggers on 3) | 10 | 13 | 12 |
| Deep mode ask (5 neurons) | 5 | | 2–12 |
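The quick-reference figures follow from the per-operation counts detailed below. In the worked arithmetic here, U (the number of neurons whose understand phase triggers) and R (the number of neurons the ask queries) are symbols introduced for illustration; N and B are as defined below. The selection call assumes more than one neuron has knowledge, and deep mode is agentic, so it has no closed form.

```latex
\begin{aligned}
C_{\text{inject}} &= N \cdot B + U
  && \text{(observe} + \text{understand)} \\
C_{\text{ask}} &= 1 + R + 1
  && \text{(selection} + \text{neuron queries} + \text{synthesis)} \\[6pt]
\text{Simple: } C_{\text{inject}} &= 3 \cdot 1 + 0 = 3,
  && C_{\text{ask}} = 1 + 3 + 1 = 5 \\
\text{Medium: } C_{\text{inject}} &= 5 \cdot 1 + 1 = 6,
  && C_{\text{ask}} = 1 + 5 + 1 = 7 \\
\text{Large: } C_{\text{inject}} &= 10 \cdot 1 + 3 = 13,
  && C_{\text{ask}} = 1 + 10 + 1 = 12
\end{aligned}
```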

Cost tip: The observe phase runs on every inject and scales linearly with the number of neurons and batches (N × B calls). This is where a fast, cheap model pays off the most. Understand and query run less frequently but need higher quality, so use a smarter model there. See Configuration — Model Cascade for the cost-optimized cascade setup.

Detailed Call Counts

N = number of neurons, B = number of batches.

Core Operations

| Operation | LLM Calls | Model Slot | Notes |
|---|---|---|---|
| inject(data) | | | |
| → Observe | N × B | learning.observer | 1 call per neuron per batch. Skipped if skipObservation: true |
| → Understand | 0–N | learning.understand | 1 call per neuron whose buffer exceeds maxObservations or maxTokens. Always skipped if skipUnderstand: true |
| ask(question) | | | |
| → Neuron selection | 1 | query | Skipped if only 1 neuron has knowledge |
| → Neuron queries | N | learning.query | Run in parallel. Here N = relevant neurons, not all neurons |
| → Synthesis | 1 | query | Combines neuron results into the final answer |
| ask(question, { mode: 'deep' }) | 2–12 | query | Agentic: the LLM decides which neurons to query and when to stop |
| query(question) | 1 | learning.query | Standalone neuron query (single call) |
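The Core Operations rules can be turned into a small estimator for budgeting. This is a minimal sketch, not part of Adapt: `NeuronBuffer`, `InjectEstimateOptions`, `estimateInjectCalls`, and `estimateAskCalls` are hypothetical names, the buffer state is assumed to be known after the observe phase, and the threshold values in the usage lines are made up.

```typescript
// Sketch only: NeuronBuffer, InjectEstimateOptions and both estimator
// functions are hypothetical names, not Adapt's API.

interface NeuronBuffer {
  bufferedObservations: number; // buffer state after this inject's observe phase (assumed known)
  bufferedTokens: number;
  maxObservations: number;
  maxTokens: number;
}

interface InjectEstimateOptions {
  batches: number; // B
  skipObservation?: boolean;
  skipUnderstand?: boolean;
}

// Observe: 1 call per neuron per batch.
// Understand: 1 call per neuron whose buffer exceeds maxObservations or maxTokens.
function estimateInjectCalls(neurons: NeuronBuffer[], opts: InjectEstimateOptions): number {
  const observe = opts.skipObservation ? 0 : neurons.length * opts.batches;
  const understand = opts.skipUnderstand
    ? 0
    : neurons.filter(
        (n) => n.bufferedObservations > n.maxObservations || n.bufferedTokens > n.maxTokens,
      ).length;
  return observe + understand;
}

// Standard ask: selection (skipped when exactly 1 neuron has knowledge),
// one query per relevant neuron (run in parallel), then one synthesis call.
function estimateAskCalls(relevantNeurons: number, neuronsWithKnowledge: number): number {
  const selection = neuronsWithKnowledge === 1 ? 0 : 1;
  return selection + relevantNeurons + 1;
}

// Hypothetical thresholds: `full` has crossed maxObservations, `idle` has not.
const idle = { bufferedObservations: 3, bufferedTokens: 900, maxObservations: 20, maxTokens: 8000 };
const full = { bufferedObservations: 21, bufferedTokens: 900, maxObservations: 20, maxTokens: 8000 };

console.log(estimateInjectCalls([idle, idle, idle, idle, full], { batches: 1 })); // 6 (Medium row)
console.log(estimateAskCalls(5, 5)); // 7
console.log(estimateAskCalls(1, 1)); // 2: selection skipped
```

Counting per neuron, rather than applying a single global multiplier, mirrors the table: understand only costs a call for the neurons whose buffers actually crossed a threshold.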

Lifecycle Operations

| Operation | LLM Calls | Model Slot | Notes |
|---|---|---|---|
| Brain.create() | | | |
| → Decomposition | 1 | init | Only with autoSetup: true. Determines neuron structure |
| → Prompt parsing | 1 | blueprintModel | Extracts purpose and synthesis directives |
| → Neuron init | 0–N | blueprintModel | Up to 1 call per neuron. TextNeuron: cognitive-skill customization, always made unless skipUnderstand is set (then 0). ListNeuron: schema generation, 0 when a custom observationSchema is supplied. Observe and understand prompts are assembled deterministically, with no LLM call |
| adjust(directive) | 1–3 | blueprintModel | 1 classify + (if config changes) 1 instruction rewrite + 1 skill/schema regen. Observe and understand prompts are re-assembled deterministically: TextNeuron spends the extra call on skill regen, ListNeuron on schema regen |
| → + Understanding rewrite | +1–15 | learning.understand | Only if the directive changes what the neuron knows, not just how it behaves |
| update(config) | 0–1 | blueprintModel | 0 if mechanical (model/threshold changes). 1 if instructions changed (schema/skill regen; the observe/understand prompts themselves regenerate without an LLM call) |
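Setup cost depends on neuron types and flags, so it can be estimated before calling Brain.create(). The sketch below encodes the lifecycle table under one assumption of ours: that prompt parsing runs on every create, not only with autoSetup. `NeuronSpec`, `estimateCreateCalls`, and `estimateUpdateCalls` are hypothetical names, not Adapt's API.

```typescript
// Sketch only: NeuronSpec and the estimator functions are hypothetical.
// Assumption: prompt parsing runs on every create, with or without autoSetup.

type NeuronSpec =
  | { kind: "text"; skipUnderstand?: boolean }
  | { kind: "list"; customObservationSchema?: boolean };

// TextNeuron: 1 skill-customization call unless skipUnderstand is set.
// ListNeuron: 1 schema-generation call unless a custom observationSchema is supplied.
// Observe/understand prompts are assembled deterministically, so they add nothing.
function neuronInitCalls(spec: NeuronSpec): number {
  if (spec.kind === "text") return spec.skipUnderstand ? 0 : 1;
  return spec.customObservationSchema ? 0 : 1;
}

function estimateCreateCalls(autoSetup: boolean, neurons: NeuronSpec[]): number {
  const decomposition = autoSetup ? 1 : 0; // init slot
  const promptParsing = 1; // blueprintModel (assumed always)
  const init = neurons.reduce((sum, n) => sum + neuronInitCalls(n), 0);
  return decomposition + promptParsing + init;
}

// update(): mechanical changes (model, thresholds) cost 0 calls;
// changed instructions cost 1 call for skill/schema regeneration.
function estimateUpdateCalls(instructionsChanged: boolean): number {
  return instructionsChanged ? 1 : 0;
}

console.log(
  estimateCreateCalls(true, [
    { kind: "text" },
    { kind: "text" },
    { kind: "list", customObservationSchema: true },
  ]),
); // 4: decomposition + parsing + 2 TextNeuron inits
```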

Evolution Operations

| Operation | LLM Calls | Model Slot | Notes |
|---|---|---|---|
| signal() | 0 | | Buffers only. No immediate LLM call |
| Evaluator trigger | 1–12 | evolution | Agentic: inspects neurons, reviews gaps, makes decisions |
| → Create N neurons | 1 + N | blueprintModel | 1 generation + 1 init per neuron |
| → Merge neurons | 1 | blueprintModel | Single generation call |
| → Split into N | 1 + (N − 1) | blueprintModel | 1 generation + 1 init per new neuron |
| → Update neuron | 1–15 | blueprintModel | 1 guidance + optional adjust cascade |
| → Delete neuron | 0 | | Mechanical removal |
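Because the evaluator is agentic, a single evolution run has a cost range rather than a fixed count. This sketch sums the evaluator's own 1–12 calls with the per-decision costs from the table to bound a run. The `Decision` type and the helper names are hypothetical, not Adapt's API.

```typescript
// Sketch only: Decision, CallRange and the helpers are hypothetical names.

type Decision =
  | { kind: "create"; count: number } // create `count` new neurons
  | { kind: "merge" }
  | { kind: "split"; into: number } // split one neuron into `into` neurons
  | { kind: "update" }
  | { kind: "delete" };

interface CallRange {
  min: number;
  max: number;
}

function decisionCalls(d: Decision): CallRange {
  switch (d.kind) {
    case "create": // 1 generation + 1 init per new neuron
      return { min: 1 + d.count, max: 1 + d.count };
    case "merge": // single generation call
      return { min: 1, max: 1 };
    case "split": // 1 generation + 1 init per new neuron (N - 1 of them)
      return { min: 1 + (d.into - 1), max: 1 + (d.into - 1) };
    case "update": // 1 guidance call, plus an optional adjust cascade
      return { min: 1, max: 15 };
    case "delete": // mechanical removal, no LLM call
      return { min: 0, max: 0 };
  }
}

// Start from the evaluator's own agentic range (1–12 calls on the evolution slot).
function evaluatorRunCalls(decisions: Decision[]): CallRange {
  return decisions.reduce(
    (acc, d) => {
      const c = decisionCalls(d);
      return { min: acc.min + c.min, max: acc.max + c.max };
    },
    { min: 1, max: 12 },
  );
}

console.log(evaluatorRunCalls([{ kind: "create", count: 2 }, { kind: "delete" }])); // { min: 4, max: 15 }
```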

Where to spend the smarter model

Routing the right model to the right slot is the cheapest performance win. Some heuristics:

  • learning.observer: runs on every batch for every neuron. Use the cheapest model that can follow the observation schema; a relevance-classification call doesn't need a frontier model.
  • learning.understand: runs less often but determines knowledge quality. Use a smart model.
  • learning.query: runs once per relevant neuron on every ask. Smart model recommended; the answer the user sees is shaped here.
  • query: neuron selection, the final synthesis of a brain ask, and deep-mode asks. Smart model.
  • init / blueprintModel: runs at setup (decomposition, prompt parsing, per-neuron skill/schema generation) and afterwards only for adjust, update, and evolution changes. Smart model; the cost is amortized.
  • evolution: runs when signals accumulate. Requires a smart model with tool-calling support.

See Configuration — Model Cascade for the full cascade and the Cost-optimized setup example.
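To see why the observer slot is the first one to downgrade, compare the Medium quick-reference scenario (one inject plus one ask) under two routings. This is a rough sketch: the slot names come from the tables above, but the two tiers and their relative per-call costs (cheap = 1, smart = 10) are invented placeholders, per-call pricing ignores token counts, and nothing here reflects Adapt's actual configuration format.

```typescript
// Sketch only: tiers and relative costs are hypothetical placeholders,
// and this is not Adapt's configuration format.

type Slot = "learning.observer" | "learning.understand" | "learning.query" | "query";
type Tier = "cheap" | "smart";

// Hypothetical relative cost per call (ignores token counts).
const COST_PER_CALL: Record<Tier, number> = { cheap: 1, smart: 10 };

// Medium scenario: 5 neurons, 1 batch, understand triggers on 1, all neurons relevant.
const mediumCalls: Record<Slot, number> = {
  "learning.observer": 5, // N x B
  "learning.understand": 1,
  "learning.query": 5, // one per relevant neuron
  query: 2, // neuron selection + synthesis
};

function cost(calls: Record<Slot, number>, routing: Record<Slot, Tier>): number {
  let total = 0;
  for (const slot of Object.keys(calls) as Slot[]) {
    total += calls[slot] * COST_PER_CALL[routing[slot]];
  }
  return total;
}

const allSmart: Record<Slot, Tier> = {
  "learning.observer": "smart",
  "learning.understand": "smart",
  "learning.query": "smart",
  query: "smart",
};
const cheapObserver: Record<Slot, Tier> = { ...allSmart, "learning.observer": "cheap" };

console.log(cost(mediumCalls, allSmart)); // 130
console.log(cost(mediumCalls, cheapObserver)); // 85
```

Observer calls grow with N × B, so the saving from a cheap observer widens as batches per inject increase, while the understand and query slots keep their smarter model.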

Latency

Per-call latency is dominated by the model and provider, not Adapt. A few observations:

  • The observe phase runs in parallel across neurons within a batch.
  • Understand runs once per neuron when its buffer crosses a threshold (maxObservations or maxTokens), not on every batch.
  • Internal neurons run a second pass after user-facing neurons. They use queries that resolve asynchronously and often complete on the next inject.
  • Deep-mode ask is variable: 2–12 calls depending on the agent's path.
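Because parallel calls overlap, latency follows the critical path rather than the call count. This sketch estimates it from stage latencies you measure for your own models. It assumes, beyond what this page states, that batches are observed one after another. It ignores understand calls (threshold-driven) and the asynchronous internal-neuron pass. The function names are hypothetical.

```typescript
// Sketch only: function names are hypothetical. Stage latencies are inputs
// you measure yourself. Assumption: batches are observed sequentially.

// ask(): selection -> parallel neuron queries -> synthesis. Because the neuron
// queries run in parallel, only the slowest one adds to the total.
// Pass selectionMs = 0 when selection is skipped (only 1 neuron has knowledge).
function askLatencyMs(selectionMs: number, neuronQueryMs: number[], synthesisMs: number): number {
  return selectionMs + Math.max(0, ...neuronQueryMs) + synthesisMs;
}

// inject() observe phase: neurons run in parallel within a batch, so each
// batch costs as much as its slowest neuron; batches are summed (assumed sequential).
function observeLatencyMs(perBatchNeuronMs: number[][]): number {
  return perBatchNeuronMs.reduce((sum, batch) => sum + Math.max(0, ...batch), 0);
}

console.log(askLatencyMs(800, [1200, 2500, 1800], 1500)); // 4800 (a sequential sum would be 7800)
console.log(observeLatencyMs([[400, 650], [500, 450]])); // 1150
```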

For a step-by-step lifecycle with timing estimates, see Events — Brain lifecycle timing.
