hms-assist-api — a 100% local voice assistant
A self-hosted voice-command API that turns natural language into Home Assistant actions — entirely on your own hardware, no cloud. A 3-tier intent pipeline answers most commands in milliseconds with regex, falls back to pgvector semantic search over your real entities, and only reaches for a local LLM when a request is genuinely open-ended.
What it does
Tier 1 — instant & deterministic
16 regex patterns cover the everyday stuff — lights, locks, thermostats, media, scenes — with an exact entity match against the HA REST API. Roughly 70% of commands resolve here in under 5 ms.
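Here is a minimal Python sketch of Tier 1: a regex match followed by an exact entity lookup. The two patterns, the `KNOWN_ENTITIES` set, and the function names `match_tier1` and `build_service_call` are hypothetical stand-ins. They are not the project's 16 patterns or its C++ code. The only real interface used is Home Assistant's REST endpoint `POST /api/services/<domain>/<service>` with a bearer token.

```python
import json
import re
import urllib.request
from typing import Optional, Tuple

# Hypothetical entity IDs; in practice these would come from Home Assistant's state list.
KNOWN_ENTITIES = {"light.patio", "light.kitchen", "lock.front_door"}

# Hypothetical patterns: (compiled regex, HA domain). "verb" picks the service, "name" the entity.
PATTERNS = [
    (re.compile(r"^turn (?P<verb>on|off) the (?P<name>[a-z0-9 ]+?) lights?$"), "light"),
    (re.compile(r"^(?P<verb>lock|unlock) the (?P<name>[a-z0-9 ]+)$"), "lock"),
]

# Map the captured verb to the Home Assistant service name.
SERVICES = {"on": "turn_on", "off": "turn_off", "lock": "lock", "unlock": "unlock"}


def match_tier1(text: str) -> Optional[Tuple[str, str, str]]:
    """Return (domain, service, entity_id) on an exact entity match, else None."""
    normalized = " ".join(text.lower().split())  # lowercase and collapse whitespace
    for pattern, domain in PATTERNS:
        m = pattern.match(normalized)
        if m is None:
            continue
        entity_id = f"{domain}.{m.group('name').replace(' ', '_')}"
        if entity_id in KNOWN_ENTITIES:  # exact match only, no fuzzy matching here
            return domain, SERVICES[m.group("verb")], entity_id
    return None  # no exact match: hand the text to Tier 2


def build_service_call(ha_url: str, token: str, domain: str, service: str,
                       entity_id: str) -> urllib.request.Request:
    """Build (without sending) POST /api/services/<domain>/<service>."""
    return urllib.request.Request(
        f"{ha_url}/api/services/{domain}/{service}",
        data=json.dumps({"entity_id": entity_id}).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

A miss returns `None`, which signals that the text should go to the semantic tier.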
Tier 2 — semantic search
"Can you brighten up the sala" still finds light.sala_1. Around 1,115 entities are embedded with nomic-embed-text and matched via pgvector cosine search (~300 ms). Read-only sensor queries return live HA state instantly — no LLM needed.
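Tier 2's matching step ranks entity embeddings by cosine similarity and rejects weak matches. The sketch below does this in pure Python over a toy 3-dimensional index. Real nomic-embed-text vectors are much longer. It uses the `vector_similarity_threshold: 0.58` value from the example config as the cutoff. `best_entity`, `INDEX`, and the table and column names in the SQL comment are assumptions for illustration. In pgvector the same ranking is normally pushed into SQL with the `<=>` cosine-distance operator, where similarity is 1 - distance.

```python
import math
from typing import Dict, List, Optional, Tuple

THRESHOLD = 0.58  # vector_similarity_threshold from the example config

# pgvector equivalent (table and column names are hypothetical):
#   SELECT entity_id, 1 - (embedding <=> $1) AS similarity
#   FROM entity_embeddings ORDER BY embedding <=> $1 LIMIT 1;


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def best_entity(query: List[float], index: Dict[str, List[float]],
                threshold: float = THRESHOLD) -> Optional[Tuple[str, float]]:
    """Return (entity_id, similarity) for the closest entity, or None below threshold."""
    scored = [(eid, cosine_similarity(query, vec)) for eid, vec in index.items()]
    if not scored:
        return None
    eid, score = max(scored, key=lambda pair: pair[1])
    return (eid, score) if score >= threshold else None


# Toy 3-dimensional index (hypothetical values, for illustration only).
INDEX = {
    "light.sala_1": [0.9, 0.1, 0.0],
    "lock.front_door": [0.0, 0.2, 0.95],
}
```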
Tier 3 — local LLM fallback
Only the genuinely ambiguous requests ("make the living room cozy", "I'm heading to bed") escalate to a fast local model, with a smarter cloud-class model as a last resort. Every step stays under your control.
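The escalation order across the three tiers is a cheapest-first cascade. This Python sketch assumes each tier is a callable that returns a response string, or `None` to pass the request on. `dispatch`, `Result`, and the stub handlers are hypothetical. The fast model and the smarter last-resort model are both treated as tier 3 handlers and tried in order.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A handler returns a spoken response, or None to pass the request on.
Handler = Callable[[str], Optional[str]]


@dataclass
class Result:
    tier: int  # 0 means no tier could handle the text
    response: str


def dispatch(text: str, handlers: List[Tuple[int, Handler]]) -> Result:
    """Try handlers cheapest-first; the first non-None response wins."""
    for tier, handler in handlers:
        response = handler(text)
        if response is not None:
            return Result(tier, response)
    return Result(0, "Sorry, I couldn't handle that.")


# Hypothetical stubs standing in for regex, pgvector search, a fast model and a smarter model.
def regex_tier(text: str) -> Optional[str]:
    return "Patio light on." if text == "turn on the patio light" else None


def semantic_tier(text: str) -> Optional[str]:
    return "Brightening the sala." if "sala" in text else None


def fast_llm(text: str) -> Optional[str]:
    return "Dimming the living room lights." if "cozy" in text else None


def smart_llm(text: str) -> Optional[str]:
    return "Locking up and turning everything off."  # stub last resort always answers


PIPELINE = [(1, regex_tier), (2, semantic_tier), (3, fast_llm), (3, smart_llm)]
```

Ordering the list by cost means that the more expensive handlers run only when every cheaper one has passed.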
Get it running
Docker quick start
Pull the prebuilt image, drop in a config file, and you're serving commands on port 8894.
docker pull ghcr.io/hms-homelab/hms-assist-api:latest
docker run -d \
-p 8894:8894 \
-v /etc/hms-assist/config.yaml:/etc/hms-assist/config.yaml:ro \
ghcr.io/hms-homelab/hms-assist-api:latest
Send a command
POST natural-language text and get a structured result back — which tier handled it, the matched entity, and the spoken response.
curl -X POST http://localhost:8894/api/v1/command \
-H "Content-Type: application/json" \
-d '{"text": "turn on the patio light", "device_id": "test"}'
Home Assistant, Ollama & pgvector
Point one config.yaml at your stack and you're done. It needs a Home Assistant URL and a long-lived access token, a PostgreSQL 17 database (hms_assist) with the pgvector extension, and an Ollama server with nomic-embed-text and a local chat model pulled. Optional Wyoming Piper/Whisper hosts enable spoken responses.
homeassistant:
  url: http://{ha-host}:8123
  token: "{long-lived token}"
ollama:
  url: http://{ollama-host}:11434
  embed_model: nomic-embed-text
  fast_model: llama3.2:3b
service:
  port: 8894
  vector_similarity_threshold: 0.58
Include media_player_entity_id in a command to have the response spoken via a Home Assistant media player. A background sync service re-indexes your HA entities every 60 minutes (or on demand via POST /admin/reindex).
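A minimal standard-library client for calling the API from Python might look like the following. It makes these assumptions:

- `media_player_entity_id` is a top-level field in the same JSON body as `text` and `device_id`.
- `POST /admin/reindex` needs no request body.
- Replies are JSON.

The helper names `build_command`, `build_reindex`, and `send` are hypothetical.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8894"  # port from the example config


def build_command(text: str, device_id: str, media_player_entity_id: Optional[str] = None,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build POST /api/v1/command; media_player_entity_id is optional (assumed top-level)."""
    payload = {"text": text, "device_id": device_id}
    if media_player_entity_id:
        payload["media_player_entity_id"] = media_player_entity_id
    return urllib.request.Request(
        f"{base_url}/api/v1/command",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_reindex(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build POST /admin/reindex to trigger an on-demand entity re-sync (empty body assumed)."""
    return urllib.request.Request(f"{base_url}/admin/reindex", method="POST")


def send(req: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send a request and decode the reply as JSON (the response shape is an assumption)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(build_command("good night", "test", "media_player.bedroom"))` would ask for the reply to be spoken on a hypothetical `media_player.bedroom`.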
Run your own voice assistant
The C++ API, Python entity-sync tool, PostgreSQL schema and 181 tests are all open source under the MIT license.