Moved into a new place. Chronic procrastination set in. For a techie, there's really only one acceptable form of procrastination: build a smart home from scratch.
Two months later: a self-hosted stack on a Beelink mini-PC, a local AI agent named Nova, a 3D Three.js dashboard, an iPhone-as-house-remote, and a small physical robot being assembled to give Nova a body.
Local · regex-fast · increasingly opinionated
The local AI agent that ties the apartment together. FastAPI + Ollama (qwen3:8b), running entirely on the Beelink. No cloud LLM in the hot path.
  Telegram    iPhone App  Samsung TV / Mac
     │             │             │
     ▼             ▼             ▼
┌────────┐    ┌────────┐    ┌──────────┐
│  bot   │    │ voice  │    │   http   │
│polling │    │   ws   │    │  /chat   │
└────┬───┘    └────┬───┘    └────┬─────┘
     └─────────────┼─────────────┘
                   ▼
         ┌────────────────────┐
         │   NovaDispatcher   │  ← single entry point
         └─────────┬──────────┘
                   ▼
       ┌───────────┴───────────┐
       ▼                       ▼
┌─────────────┐   no    ┌──────────────┐
│   regex     │         │    ollama    │
│  pattern    │ ──────► │   qwen3:8b   │
│  matcher    │  match  │ (local LLM)  │
└──────┬──────┘         └──────┬───────┘
       │ match                 │ classify
       ▼                       ▼
┌──────────────────────────────────────┐
│          18 intent handlers          │
│  HOME_CONTROL · TV · ROUTINE · ...   │
└──────────────────────────────────────┘

Most messages hit a regex pattern in under a millisecond, with zero LLM tokens. Only the ambiguous ones fall through to qwen3:8b for classification. That keeps response times under 1 ms for ~80% of commands.
Telegram, voice WebSocket, and HTTP all funnel into one NovaDispatcher.dispatch(). Channels handle their own UI quirks (keyboards, TTS); routing logic is shared.
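A minimal sketch of the regex-first routing described above, not Nova's actual code. The pattern table, the `match_intent` and `dispatch` helpers, and the `fake_llm` stand-in for the qwen3:8b classifier are all assumptions made for illustration; only the intent names come from this page.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical patterns: the real pattern table is not shown on this page.
PATTERNS = [
    (re.compile(r"\b(turn|switch)\s+(on|off)\b.*\b(light|lamp|fan)s?\b", re.I), "HOME_CONTROL"),
    (re.compile(r"\b(start|pause|dock)\b.*\bvacuum\b", re.I), "HOME_VACUUM"),
    (re.compile(r"\b(weather|forecast|temperature)\b", re.I), "WEATHER"),
]


def match_intent(text: str) -> Optional[str]:
    """Return the first intent whose pattern matches, or None if nothing does."""
    for pattern, intent in PATTERNS:
        if pattern.search(text):
            return intent
    return None


def dispatch(text: str, classify_with_llm: Callable[[str], str]) -> Tuple[str, str]:
    """Try the cheap regex path first; call the LLM classifier only on a miss."""
    intent = match_intent(text)
    if intent is not None:
        return intent, "regex"
    return classify_with_llm(text), "llm"


def fake_llm(text: str) -> str:
    """Stand-in for the local model call; always answers CONVERSATION."""
    return "CONVERSATION"


print(dispatch("Turn off the kitchen light", fake_llm))
print(dispatch("tell me a joke", fake_llm))
```

Passing the classifier in as a callable keeps the fast path testable without a running model.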
Self-trained OpenWakeWord model recognizes "Hey Nova". Voice replies via Piper TTS using a cloned Jarvis voice from Iron Man. (Iron Man tax: paid.)
Different transports, same brain. Channel-specific UI (inline keyboards, TTS audio) is handled at the edge; routing logic is shared.
bot polling: Text messages, /commands, receipt photos (→ Gemini OCR → Notion)
/ws/nova/voice: 16 kHz PCM audio from the iPhone app. Self-trained wake word → Whisper STT → dispatch → Piper (Jarvis voice) TTS
POST /api/nova/chat: Plain JSON for the dashboard and any web client
NovaDispatcher.dispatch(text, chat_id, channel)
18 intents across 8 domains, plus pronoun resolution ('turn it off'), a confirmation state machine ('Turn off all 6 lights? [Yes][No]'), and a tool-calling PlannerAgent for compound commands.
HOME_CONTROL: Lights, switches, brightness, fans
HOME_VACUUM: Roborock (start, pause, dock, room)
HOME_STATUS: "Is the kitchen light on?"
PRESENCE_QUERY: Who's home, by MAC
TV_CONTROL: Power, apps, volume, d-pad
MEDIA_PLAY: "play X on spotify/youtube"
MEDIA_PLAYBACK: Pause, skip, what's playing
WEATHER: Weather, forecast, temperature
TIMER_ALARM: Natural-language timers
CALENDAR: Schedule, events
MEMORY: Reminders, shopping list, notes
ROUTINE: Good morning, bedtime, leaving
SCENE: Movie mode, party, date night
COMPOUND: PlannerAgent (tool-calling)
WIFI_QR: Show WiFi QR on TV
REMOTE_QR: Show TV remote link QR
CONVERSATION: Open-ended chat (qwen3 /think)
INFO_QUERY: Factual questions (/nothink)
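The confirmation state machine mentioned above ('Turn off all 6 lights? [Yes][No]') could look roughly like this. It is a sketch under assumptions: `ConfirmationGate`, `PendingAction`, and the yes/no vocabulary are hypothetical names, and the real dispatcher may track state quite differently.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class PendingAction:
    prompt: str                 # question shown to the user
    run: Callable[[], str]      # executed only after an explicit "yes"


@dataclass
class ConfirmationGate:
    """Per-chat state: idle until an action is queued, then waiting for yes/no."""
    pending: Dict[int, PendingAction] = field(default_factory=dict)

    def request(self, chat_id: int, prompt: str, run: Callable[[], str]) -> str:
        """Queue a risky action and return the question to send back."""
        self.pending[chat_id] = PendingAction(prompt, run)
        return f"{prompt} [Yes][No]"

    def handle(self, chat_id: int, text: str) -> Optional[str]:
        """Consume a reply if something is pending; None means 'route normally'."""
        action = self.pending.get(chat_id)
        if action is None:
            return None
        answer = text.strip().lower()
        if answer in {"yes", "y"}:
            del self.pending[chat_id]
            return action.run()
        if answer in {"no", "n"}:
            del self.pending[chat_id]
            return "Cancelled."
        return f"{action.prompt} [Yes][No]"  # unclear reply: ask again


gate = ConfirmationGate()
print(gate.request(1, "Turn off all 6 lights?", lambda: "6 lights off"))
print(gate.handle(1, "Yes"))
```

Keying the pending action by chat ID means a question asked on one channel cannot be answered by a message from another chat.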
Visual programming for the home: drag-and-drop blocks with trigger / condition / action types, branching flowcharts, and live execution tracking. Build a routine without writing code.
scene · movie_mode
scene · wine_mode
scene · welcome_home
Screens · lights · and, soon, limbs
What you actually see and touch. The iframes below are real demo builds of the apps: click around, toggle lights, trigger scenes, chat with Nova. Mock state lives in your browser and syncs across iframes.
Three.js + React + Vite · entry: mac.html
A 3D scene of the actual apartment: rooms, lights, the TV. Click a lamp, the real lamp turns on. First version was for the Mac browser.
Then I wanted it on the Samsung TV. Turns out Samsung doesn't exactly want you uploading custom apps to their TVs, but if you find the right magic incantation in dev mode, you can sideload one anyway. I did. Same Vite build, three entry points: /app (mobile), mac.html, tv.html.
React + Capacitor · tabbed bottom nav
Three tabs: Lights (5 rooms with toggles + brightness sliders + "2 of 3 on" counter), Remote (Spotify / YouTube / TV; try the rickroll), Nova (real chat with chip shortcuts).
About Netflix and Prime: their integration is, quote, painfully thin. You can open the app. That is the entire API surface they expose.
Pi-hole DNS · FRITZ!Box TR-064 · InfluxDB · Grafana
"People-first" presence dashboard. Detects who's home by MAC (FRITZ!Box TR-064 + Pi-hole DNS queries), tracks usage in InfluxDB, visualizes in Grafana. Drives the morning briefing and Welcome-Home routine.
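The "who's home by MAC" idea reduces to a set lookup once the router has reported which devices are active. The sketch below assumes that list has already been fetched (the TR-064 and Pi-hole queries are deliberately left out), and the `who_is_home` helper, the names, and the MAC addresses are made up for illustration.

```python
from typing import Dict, Iterable, List


def normalize_mac(mac: str) -> str:
    """Lower-case and use ':' separators so different notations compare equal."""
    return mac.strip().lower().replace("-", ":")


def who_is_home(active_macs: Iterable[str],
                devices_by_person: Dict[str, List[str]]) -> List[str]:
    """A person counts as home if any of their known devices is currently active."""
    active = {normalize_mac(m) for m in active_macs}
    return sorted(
        person
        for person, macs in devices_by_person.items()
        if any(normalize_mac(m) in active for m in macs)
    )


# Hypothetical device registry and router snapshot.
devices = {
    "alice": ["AA:BB:CC:00:00:01"],
    "bob": ["aa-bb-cc-00-00-02", "aa:bb:cc:00:00:03"],
}
print(who_is_home(["aa:bb:cc:00:00:01"], devices))
```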


The current procrastination project: moving Nova out of a kitchen phone and into a real robot.
First version of always-listening Nova ran on a 10-year-old phone in the kitchen: a voice app on a permanent charger, microphone open. It worked. It was also, undeniably, a phone on a charger.
The replacement: a small physical robot with an animated face, body effects, and a microphone. The web preview above is the actual implementation, with the same p5.js eye styles (roboeyes / cozmo / kawaii / statusring), same canvas body renderers (waveform / lavalamp / horror / matrix / gauges / status), same scene system. Switch the buttons.
The hardest part: GIF streaming over MQTT. RGB565 little-endian frame decoding took longer to debug than I'd like to admit. On the actual ESP32 hardware, GIFs play directly on the body screen via the AnimatedGIF library off SPIFFS. Here's what the GREETING intent renders:

esp-head/: Animated eye display (blink, look, expressions, GIF playback)
esp-body/: Body controller (movement, scene orchestration)
esp-body-effects/: Visual effects library (lava lamp, horror, wave standby; 15+ board configs)
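For reference, the RGB565 little-endian decoding mentioned above comes down to a few bit operations per pixel. This is a generic sketch of the format, not the project's decoder: `rgb565_le_to_rgb888` is a hypothetical helper, and it assumes each pixel arrives as two bytes with the low byte first.

```python
import struct
from typing import List, Tuple


def rgb565_le_to_rgb888(frame: bytes) -> List[Tuple[int, int, int]]:
    """Decode little-endian RGB565 pixels into 8-bit (r, g, b) tuples."""
    if len(frame) % 2:
        raise ValueError("RGB565 frame must contain an even number of bytes")
    pixels = []
    # "<H" = unsigned 16-bit, little-endian: the low byte comes first on the wire.
    for (value,) in struct.iter_unpack("<H", frame):
        r5 = (value >> 11) & 0x1F   # top 5 bits
        g6 = (value >> 5) & 0x3F    # middle 6 bits
        b5 = value & 0x1F           # bottom 5 bits
        # Expand to 8 bits by replicating the high bits into the low ones,
        # so full-scale 5-bit/6-bit values map to exactly 255.
        pixels.append((
            (r5 << 3) | (r5 >> 2),
            (g6 << 2) | (g6 >> 4),
            (b5 << 3) | (b5 >> 2),
        ))
    return pixels


# Pure red (0xF800), green (0x07E0), blue (0x001F), each sent low byte first.
print(rgb565_le_to_rgb888(b"\x00\xf8\xe0\x07\x1f\x00"))
# → [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
```

Reading the same bytes as big-endian swaps the two bytes of every pixel, which shows up as scrambled colors rather than a crash, one reason this kind of bug is slow to spot.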
Details · surprises · process · dead ends
Side quests, surprises, and 'wait, it does that too?' moments.
Mature projects have graveyards. Three paths sketched, started, or experimented with, then cut. Knowing what to remove matters as much as knowing what to ship.
Layered, mostly local, mostly open-source.
Every meaningful feature gets a written design plan before any code. 21 documents, ordered by build date. Peak velocity: 35 commits on Apr 8, 30 on Apr 2, 29 on Apr 9.
Personal project · Andrei Zitti · started Feb 2026, still going