Tech & Dev · Dev.to Top · June 14, 2026, 22:09

Build a Private Voice Assistant with Whisper, Ollama, and Kokoro TTS

AUTHOR · EveryLocalAI

Have you ever wanted your own Jarvis? A voice assistant that listens, thinks, and speaks back, all running privately on your own hardware? Here's how to build one with Whisper.cpp, Ollama, and Kokoro TTS. No cloud, no wake-word fees, no data leaving your machine.

Prerequisites

- Hardware: Any modern computer with a microphone
- Software: Python 3.10+, Ollama installed, and FFmpeg (the script plays audio with ffplay)
- Time: ~30 minutes setup

Installation

1. Install Ollama and Pull a Model

```bash
ollama pull qwen3:14b
```

2. Install Whisper.cpp

```bash
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
cmake -B build && cmake --build build --config Release
bash models/download-ggml-model.sh medium
```

3. Install Kokoro TTS

The soundfile package is included because the script uses it to write Kokoro's audio to a WAV file.

```bash
pip install kokoro pyaudio requests soundfile
```

Wiring It All Together

Save this as voice_assistant.py in the directory that contains the whisper.cpp checkout, since the Whisper paths in the script are relative to it:

```python
import subprocess
import tempfile
import wave

import pyaudio
import requests
import soundfile as sf
from kokoro import KPipeline

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen3:14b"
WHISPER_BIN = "./whisper.cpp/build/bin/whisper-cli"
WHISPER_MODEL = "./whisper.cpp/models/ggml-medium.bin"

tts_pipeline = KPipeline(lang_code='a')  # 'a' = American English


def record_audio(duration=5, sample_rate=16000):
    """Record 16-bit mono audio and return the path of the WAV file."""
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paInt16, channels=1, rate=sample_rate,
                    input=True, frames_per_buffer=1024)
    frames = [stream.read(1024)
              for _ in range(int(sample_rate / 1024 * duration))]
    stream.close()
    p.terminate()
    with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as f:
        # Closing the wave writer finalizes the WAV header.
        with wave.open(f, 'wb') as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)
            wf.setframerate(sample_rate)
            wf.writeframes(b''.join(frames))
    return f.name


def transcribe(audio_file):
    """Run whisper-cli on the recording and return the transcript."""
    # -nt (--no-timestamps) keeps timestamp prefixes out of the LLM prompt.
    result = subprocess.run(
        [WHISPER_BIN, '-m', WHISPER_MODEL, '-f', audio_file, '-nt'],
        capture_output=True, text=True)
    return result.stdout.strip()


def ask_llm(prompt):
    """Send the prompt to Ollama and return the full, non-streamed reply."""
    r = requests.post(OLLAMA_URL,
                      json={"model": MODEL, "prompt": prompt, "stream": False})
    r.raise_for_status()
    return r.json()["response"]


def speak(text):
    """Synthesize the text chunk by chunk and play each chunk with ffplay."""
    # Kokoro needs an explicit voice and yields 24 kHz audio;
    # 'af_heart' is the voice used in Kokoro's README.
    for result in tts_pipeline(text, voice='af_heart'):
        with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as f:
            wav_path = f.name
        sf.write(wav_path, result.audio, 24000)
        subprocess.run(['ffplay', '-nodisp', '-autoexit', wav_path])


# Run it
print("Listening...")
audio_file = record_audio(5)
text = transcribe(audio_file)
print(f"You: {text}")
response = ask_llm(text)
print(f"AI: {response}")
speak(response)
```

Run it:

```bash
python voice_assistant.py
```

The script records for 5 seconds as soon as it prints "Listening...", so speak into your mic right away. Then wait for the transcript and hear the AI respond.

Performance

- Whisper medium on CPU: transcribes in 2-4 seconds
- Qwen3 14B on RTX 3060: responds in 3-5 seconds
- Kokoro TTS on CPU: speaks in real-time (< 1 second latency)
- Total round-trip: ~10 seconds on modest hardware

For faster responses, use Whisper tiny or a smaller LLM like Llama 3.1 8B.

Originally published on everylocalai.com
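The per-stage figures under Performance will vary with hardware, so it can help to measure them on your own machine. Below is a minimal sketch of one way to do that using only the standard library. `StageTimer` and its method names are hypothetical helpers introduced here for illustration and are not part of the original script. The `time.sleep` calls are stand-ins for the real transcription and LLM steps.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Hypothetical helper: accumulates wall-clock seconds per named stage."""

    def __init__(self):
        self.seconds = {}  # stage name -> total elapsed seconds

    @contextmanager
    def stage(self, name):
        """Time the body of a with-block and add it to the named stage."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.seconds[name] = self.seconds.get(name, 0.0) + elapsed

    def report(self):
        """Return one line per stage, followed by the summed total."""
        lines = [f"{name:<12}{secs:7.2f} s"
                 for name, secs in self.seconds.items()]
        lines.append(f"{'total':<12}{sum(self.seconds.values()):7.2f} s")
        return "\n".join(lines)


if __name__ == "__main__":
    timer = StageTimer()
    # Stand-in workloads; replace with the real calls from voice_assistant.py.
    with timer.stage("whisper"):
        time.sleep(0.02)
    with timer.stage("llm"):
        time.sleep(0.03)
    print(timer.report())
```

In voice_assistant.py the same pattern would wrap the existing calls, for example `with timer.stage("whisper"): text = transcribe(audio_file)`, and likewise for `ask_llm` and `speak`. Note that timing `speak` includes playback time, not just synthesis.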
