Note generation is very slow
Note generation runs on your CPU/GPU and shares your machine's RAM with everything else. If it's taking 5+ minutes for what used to take 1, here's what to check.
Quick wins (try first)
- Close other heavy apps. Anything using significant RAM or CPU competes with the note model. Common culprits: video calls (Zoom, Teams), other AI apps (ChatGPT, Claude desktop), video editors, multiple Chrome windows with many tabs.
- Restart Confidant. If the model has been running for a long session and your RAM is fragmented, restarting Confidant gives you a clean slate.
- Check available RAM. Open Activity Monitor (Mac) or Task Manager (Windows) → Memory. If you're at 90%+ memory pressure, the model is probably swapping to disk, which is dramatically slower.
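If you'd rather script that check than open a system monitor, a small Python snippet (assuming the third-party psutil package is installed) reads the same numbers:

```python
# Scripted version of the Activity Monitor / Task Manager check.
# Requires psutil: pip install psutil
import psutil

mem = psutil.virtual_memory()
print(f"RAM used: {mem.percent:.0f}%  ({mem.available / 2**30:.1f} GiB free)")

# Around 90%+ the OS starts paging to disk, and a model that swaps
# is dramatically slower than one that stays resident in RAM.
if mem.percent >= 90:
    print("High memory pressure: close other apps before generating.")
```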
Expected timings
Generation time depends on transcript length and your machine.
| Transcript length | Typical Pro-tier time | Two-pass kicks in? |
|---|---|---|
| <30 min session | 30–90 sec | No |
| 30–60 min session | 1–3 min | No |
| 60–90 min session | 2–4 min | No |
| 90+ min session | 3–6 min | Yes (extraction + synthesis) |
Two-pass generation (for very long sessions) automatically chunks the transcript so it fits in the model's context window. You'll see an "extracting" progress indicator before the note streams in.
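For intuition, here is a minimal sketch of an extract-then-synthesize pipeline. The chunk size and helper functions are illustrative assumptions; only the two-pass routing itself and the 15K-word threshold (mentioned under "Stuck mid-generation" below) come from this article.

```python
# Illustrative two-pass pipeline. CHUNK_WORDS, extract() and synthesize()
# are hypothetical stand-ins, not Confidant's internals.
WORD_LIMIT = 15_000   # transcripts above this are routed through two passes
CHUNK_WORDS = 4_000   # assumed per-chunk budget that fits the context window

def extract(chunk: str) -> str:
    # Placeholder for a model call that pulls key points from one chunk.
    return f"[key points from a {len(chunk.split())}-word chunk]"

def synthesize(text: str) -> str:
    # Placeholder for the model call that writes the final note.
    return f"[note synthesized from {len(text.split())} words]"

def generate_note(transcript: str) -> str:
    words = transcript.split()
    if len(words) <= WORD_LIMIT:
        return synthesize(transcript)           # single pass fits in context
    # Pass 1: the "extracting" progress indicator corresponds to this loop.
    chunks = [" ".join(words[i:i + CHUNK_WORDS])
              for i in range(0, len(words), CHUNK_WORDS)]
    extracts = [extract(c) for c in chunks]
    # Pass 2: synthesize the final note from the much shorter extracts.
    return synthesize("\n".join(extracts))
```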
Slower than expected
If your generation times regularly run well past the ranges in the table above:
- Check your tier. If you're on a 16 GB machine, you may be on the Lite-tier model, which is faster but less capable. The Pro tier (the default on 24+ GB machines) is what the table reflects; see the sketch after this list for the rule of thumb.
- Try the same session again. The first generation after launch is slower because the model has to load into memory; subsequent generations are faster.
- Confirm hardware acceleration. On Apple Silicon Macs, Confidant uses Metal for inference. If something has gone wrong with that, generation falls back to CPU — much slower. A restart usually fixes it.
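The tier check in the first item above boils down to a RAM threshold. A tiny sketch, assuming the 24 GB default mentioned here is the only cutoff:

```python
# Rule of thumb from this article: Lite on 16 GB machines, Pro as the
# default on 24+ GB. Treating 24 GiB as the only cutoff is an assumption.
def model_tier(ram_gib: int) -> str:
    return "Pro" if ram_gib >= 24 else "Lite"

print(model_tier(16))  # Lite: faster but less capable
print(model_tier(32))  # Pro: the tier the timing table reflects
```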
Stuck mid-generation
If a generation stops streaming tokens mid-sentence and just hangs:
- Wait at least 90 seconds. Confidant's inactivity timeout is 90 seconds: if no chunk arrives in that window, it aborts with a clear error message. The sketch after this list shows the pattern.
- If it just stops without an error, the transcript may be too long for the model's context window. Confidant routes very long transcripts (15K+ words) through the two-pass pipeline automatically, but if you're seeing this on a shorter transcript, contact support.
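The 90-second rule in the first item is a per-chunk inactivity timeout: the clock resets every time a chunk arrives. A minimal sketch of that pattern, with stream standing in for Confidant's internal token stream:

```python
# Per-chunk inactivity timeout: abort only if *no* chunk arrives for
# 90 seconds, however long the generation runs overall. A pump thread
# plus a queue is one standard way to put a timeout on a blocking iterator.
import queue
import threading

INACTIVITY_TIMEOUT = 90.0  # seconds without a chunk before aborting

def stream_with_timeout(stream, timeout=INACTIVITY_TIMEOUT):
    q: queue.Queue = queue.Queue()
    done = object()  # sentinel marking the end of the stream

    def pump():
        for chunk in stream:
            q.put(chunk)
        q.put(done)

    threading.Thread(target=pump, daemon=True).start()
    while True:
        try:
            chunk = q.get(timeout=timeout)  # timer resets on every chunk
        except queue.Empty:
            raise TimeoutError(f"no output for {timeout:.0f}s, aborting generation")
        if chunk is done:
            return
        yield chunk
```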
When it's the model's fault, not yours
A handful of generation failures are model-side:
- The model loops on its own reasoning (rare since we cap reasoning tokens)
- The model produces invalid JSON that can't be parsed (very rare)
Both manifest as a note that comes back blank or partial. The fix is the same: click Regenerate again. Generation is sampled rather than deterministic, so each attempt takes a different path through the model, and the second attempt usually works.
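A minimal sketch of the regenerate-and-validate loop that advice implies, with generate_once() as a hypothetical stand-in for a single sampled generation call:

```python
# Retry loop: each attempt samples a fresh generation, so a blank or
# unparseable result on one attempt says little about the next.
import json

MAX_ATTEMPTS = 3

def generate_once(transcript: str) -> str:
    # Hypothetical stand-in: one sampled model call returning raw JSON.
    return '{"summary": "..."}'

def regenerate_until_valid(transcript: str) -> dict:
    for attempt in range(1, MAX_ATTEMPTS + 1):
        raw = generate_once(transcript)
        try:
            note = json.loads(raw)
        except json.JSONDecodeError:
            continue           # invalid JSON (very rare): just try again
        if note:               # reject a blank result
            return note
    raise RuntimeError(f"{MAX_ATTEMPTS} failed attempts on one session: report it")
```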
If three attempts fail in a row on the same session, it's worth reporting — that's a real bug, not a flake.