hms-assist-api — a 100% local voice assistant

A self-hosted voice-command API that turns natural language into Home Assistant actions — entirely on your own hardware, no cloud. A 3-tier intent pipeline answers most commands in milliseconds with regex, falls back to pgvector semantic search over your real entities, and only reaches for a local LLM when a request is genuinely open-ended.


Home Assistant · pgvector · Ollama · C++ / Drogon · Open source

What it does

Tier 1 — instant & deterministic

16 regex patterns cover the everyday stuff (lights, locks, thermostats, media, scenes) with an exact entity match against the Home Assistant (HA) REST API. Roughly 70% of commands resolve here in under 5 ms.
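
As a rough illustration of what a Tier 1 match could look like, here is a minimal Python sketch. The two patterns, the ENTITIES table, and the match_tier1 helper are hypothetical: they are not the project's 16 patterns or its C++ implementation, and the real service checks entities against Home Assistant rather than a hard-coded dict.

```python
import re

# Hypothetical friendly-name -> entity_id table; the real service resolves
# entities against Home Assistant instead of a hard-coded dict.
ENTITIES = {
    "patio light": "light.patio",
    "front door": "lock.front_door",
}

# Two illustrative patterns (assumed, not the project's actual regexes).
PATTERNS = [
    (re.compile(r"^turn (?P<state>on|off) (?:the )?(?P<name>.+)$"), "turn_{state}"),
    (re.compile(r"^(?P<state>lock|unlock) (?:the )?(?P<name>.+)$"), "{state}"),
]


def match_tier1(text):
    """Return (service, entity_id) on an exact entity match, else None."""
    # Normalise case and trailing punctuation before matching.
    cleaned = text.strip().lower().rstrip(".!?")
    for pattern, service_template in PATTERNS:
        m = pattern.match(cleaned)
        if not m:
            continue
        entity_id = ENTITIES.get(m.group("name").strip())
        if entity_id is None:
            continue  # not an exact name match: leave it to a later tier
        return service_template.format(state=m.group("state")), entity_id
    return None
```

A command such as "turn on the sala" matches the first pattern but not an exact entity name, so this sketch returns None and the request would fall through to the next tier.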

Tier 2 — semantic search

"Can you brighten up the sala" still finds light.sala_1. Around 1,115 entities are embedded with nomic-embed-text and matched via pgvector cosine search (~300 ms). Read-only sensor queries return live HA state instantly — no LLM needed.

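The matching arithmetic behind Tier 2 can be sketched in plain Python. Everything here is illustrative: tiny 3-dimensional vectors stand in for real nomic-embed-text embeddings, lock.front_door is a made-up entity, and the sketch assumes the 0.58 value of vector_similarity_threshold from the sample config is a minimum cosine similarity. In the real service the search runs inside PostgreSQL, where pgvector's `<=>` operator returns cosine distance (1 minus similarity).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(query_vec, entity_vecs, threshold=0.58):
    """Return (entity_id, score) for the closest entity, or None if the
    best score falls below the threshold (assumed to be a similarity floor)."""
    if not entity_vecs:
        return None
    scored = [(eid, cosine_similarity(query_vec, vec)) for eid, vec in entity_vecs.items()]
    entity_id, score = max(scored, key=lambda pair: pair[1])
    return (entity_id, score) if score >= threshold else None


# Toy embeddings, for illustration only.
ENTITY_VECS = {
    "light.sala_1": [0.9, 0.1, 0.0],
    "lock.front_door": [0.0, 0.2, 0.9],
}
```

A query vector close to the light.sala_1 embedding clears the threshold and is returned; a query that is far from every entity yields None, which is the point at which a request would escalate to Tier 3.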
Tier 3 — local LLM fallback

Only the genuinely ambiguous requests ("make the living room cozy", "I'm heading to bed") escalate to a fast local model, with a smarter cloud-class model as a last resort. Every step stays under your control.
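
The fall-through between tiers can be pictured as a simple cascade: try each handler in order and stop at the first one that produces a result. The sketch below is an assumption about the general shape, not the project's C++ code; the three stub handlers, the TIERS list, and the "tier"/"result" keys are all hypothetical.

```python
def run_pipeline(text, tiers):
    """Try each (name, handler) pair in order; return the first non-None result."""
    for name, handler in tiers:
        result = handler(text)
        if result is not None:
            return {"tier": name, "result": result}
    return {"tier": None, "result": None}


# Stub handlers standing in for the real tiers (all hypothetical).
def regex_tier(text):
    return "light.patio" if text == "turn on the patio light" else None


def vector_tier(text):
    return "light.sala_1" if "sala" in text else None


def llm_tier(text):
    # A real implementation would call a local model here.
    return f"LLM plan for: {text}"


TIERS = [("regex", regex_tier), ("vector", vector_tier), ("llm", llm_tier)]
```

Ordering the cheapest handler first is what keeps common commands fast: the expensive model call only runs when both earlier handlers return None.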

Get it running

Docker quick start

Pull the prebuilt image, drop in a config file, and you're serving commands on port 8894.

docker pull ghcr.io/hms-homelab/hms-assist-api:latest

docker run -d \
  -p 8894:8894 \
  -v /etc/hms-assist/config.yaml:/etc/hms-assist/config.yaml:ro \
  ghcr.io/hms-homelab/hms-assist-api:latest

Send a command

POST natural-language text and get a structured result back — which tier handled it, the matched entity, and the spoken response.

curl -X POST http://localhost:8894/api/v1/command \
  -H "Content-Type: application/json" \
  -d '{"text": "turn on the patio light", "device_id": "test"}'

Home Assistant, Ollama & pgvector

Point one config.yaml at your stack and you're done. It needs a Home Assistant URL and long-lived access token, a PostgreSQL 17 database (hms_assist) with the pgvector extension, and an Ollama server with nomic-embed-text and a local chat model pulled. Optional Wyoming Piper/Whisper hosts enable spoken responses.

homeassistant:
  url: http://{ha-host}:8123
  token: {long-lived token}
ollama:
  url: http://{ollama-host}:11434
  embed_model: nomic-embed-text
  fast_model: llama3.2:3b
service:
  port: 8894
  vector_similarity_threshold: 0.58

Include media_player_entity_id in a command to have the response spoken via a Home Assistant media player. A background sync service re-indexes your HA entities every 60 minutes (or on demand via POST /admin/reindex).
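
For scripting, the curl call shown earlier could be wrapped in Python using only the standard library. This is a sketch under assumptions: it supposes media_player_entity_id is a top-level JSON field next to text and device_id, that the response body is JSON, and media_player.kitchen in the usage note is a made-up entity. The helper names are hypothetical.

```python
import json
import urllib.request

API_URL = "http://localhost:8894/api/v1/command"


def build_command_request(text, device_id, media_player_entity_id=None, url=API_URL):
    """Build the POST request without sending it."""
    payload = {"text": text, "device_id": device_id}
    if media_player_entity_id is not None:
        # Assumed placement: a top-level field alongside text and device_id.
        payload["media_player_entity_id"] = media_player_entity_id
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_command(text, device_id, media_player_entity_id=None):
    """Send the command and return the decoded JSON response (assumed JSON)."""
    request = build_command_request(text, device_id, media_player_entity_id)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, send_command("turn on the patio light", "test", "media_player.kitchen") would post the same body as the curl example plus the media player field.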

Run your own voice assistant

The C++ API, Python entity-sync tool, PostgreSQL schema and 181 tests are all open source under the MIT license.
