Model won't load (low RAM error)
If Confidant won't generate notes and you're seeing errors about the model failing to load, the cause is almost always RAM.
Confidant requires 16 GB of RAM
This is non-negotiable. The note model has to fit in memory at the same time as your operating system, your browser, your video conferencing app, and everything else you keep open. We tested every smaller configuration extensively, and on 8 GB machines the model either won't load or generates clinically dangerous notes.
If you have an 8 GB machine, please use a different machine for clinical work. We won't ship a worse note for the sake of supporting under-spec hardware.
If you have 16+ GB and the model still won't load
- Check actual free RAM. Open Activity Monitor (Mac) or Task Manager (Windows). Look at memory pressure: if you're already at 80%+ before opening Confidant, the model has nowhere to live. (If you prefer a terminal, the short script after this list prints the same numbers.)
- Close other heavy apps. Most common offenders:
  - Other AI desktop apps (ChatGPT, Claude, Cursor)
  - Video editors (Final Cut, Premiere, DaVinci Resolve)
  - Multiple Chrome windows with many tabs
  - VMs or Docker containers
  - Other transcription / dictation apps
- Restart Confidant. Sometimes the inference server gets into a weird state and a fresh launch resolves it.
- Restart your computer. If you've been up for a week, RAM may be fragmented enough that even a 16 GB machine struggles to allocate a contiguous block.
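If you'd rather check free RAM from a terminal than from Activity Monitor or Task Manager, a few lines of Python print the same numbers. This is only a convenience sketch, not part of Confidant, and it assumes the third-party psutil package is installed (pip install psutil):

```python
# Rough sketch: see how much RAM is actually free before launching Confidant.
# Requires the third-party psutil package (pip install psutil).
import psutil

mem = psutil.virtual_memory()
total_gb = mem.total / 1024**3
available_gb = mem.available / 1024**3

print(f"Total RAM:     {total_gb:.1f} GB")
print(f"Available RAM: {available_gb:.1f} GB")

# Per this guide, Pro needs roughly 6 GB free during generation and Lite roughly 3 GB.
if available_gb < 6:
    print("Less than ~6 GB free - close other heavy apps before generating notes.")
```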
"Lite tier" vs "Pro tier"
Confidant ships two model sizes:
- Pro tier — the default for 24+ GB machines. Larger model, better notes, ~6 GB RAM during generation.
- Lite tier — used on 16 GB machines where Pro won't fit reliably. Smaller model (Gemma 4 E2B), ~3 GB RAM during generation. Faster but less capable.
Confidant picks the tier automatically based on your machine's total RAM. You can override this in Settings → Notes → AI Model if needed (e.g. forcing Lite to keep more RAM free for other apps).
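If it helps to see the rule spelled out, here is an illustrative sketch of the automatic tier choice just described. It is not Confidant's actual code; it only encodes the facts above (24+ GB defaults to Pro, 16 GB machines get Lite) plus the manual override:

```python
# Illustrative only - not Confidant's real selection code.
def pick_tier(total_ram_gb: float, override: str = "") -> str:
    if override in ("pro", "lite"):   # manual choice from Settings → Notes → AI Model
        return override
    return "pro" if total_ram_gb >= 24 else "lite"

print(pick_tier(16))           # lite
print(pick_tier(32))           # pro
print(pick_tier(32, "lite"))   # lite (forced, to keep RAM free for other apps)
```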
"Failed to start llama-server"
This error means the inference server (the program that runs the model) couldn't launch. Causes:
- Antivirus is blocking it. Whitelist Confidant in your antivirus settings. The bundled llama-server binary is what triggers most antivirus heuristics.
- Port 8080 is in use. Confidant talks to the model over local port 8080. Quit any other app using that port and restart Confidant. (A quick way to check the port is shown after this list.)
- The binary failed to download. During onboarding, Confidant downloads llama-server separately from the model. If the download was interrupted, retry by quitting and relaunching the app.
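To confirm whether another program is actually holding port 8080, the small Python check below tries to bind the port itself: if the bind fails, something else is already using it. This is a generic check, not part of Confidant:

```python
# Try to bind local port 8080; failure usually means another program holds it.
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    try:
        s.bind(("127.0.0.1", 8080))
        print("Port 8080 looks free - the port is probably not the problem.")
    except OSError:
        print("Port 8080 is in use - quit the app using it, then restart Confidant.")
```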
"Unknown model architecture"
This means the model file and the inference server are out of sync. It happens when one updates without the other. The fix is to clear both and let Confidant re-download:
- Quit Confidant
- Delete ~/Library/Application Support/com.confidant.notes/llama/llama-server (Mac) or the equivalent path on Windows
- Delete the .gguf model file in the same directory
- Relaunch; Confidant will re-download both as a matched pair
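If you prefer a terminal, the Mac steps above can be scripted. The sketch below simply deletes the two files listed (quit Confidant first); on Windows the directory will be different:

```python
# Mac-only sketch: delete the llama-server binary and the .gguf model file so
# Confidant re-downloads both on the next launch. Quit Confidant before running.
from pathlib import Path

llama_dir = Path.home() / "Library/Application Support/com.confidant.notes/llama"

for path in [llama_dir / "llama-server", *llama_dir.glob("*.gguf")]:
    if path.exists():
        path.unlink()
        print(f"Deleted {path}")
    else:
        print(f"Not found (already gone?): {path}")
```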
If this keeps happening on subsequent launches, contact support.