Nine Bugs in One Day: Building a TTS System for Claude Code
Almost every assumption in the original plan was wrong.
The plan was simple: give Claude Code a voice. Four bash hooks, a TTS engine, a markdown stripper. Twelve tasks. Should take an afternoon.
By the end of the day, I'd redesigned the architecture three times, deleted more code than I wrote, and learned more about Unix process management, macOS audio internals, and Python's file descriptor model than I'd learned in the previous month combined.
This is the post-mortem. Nine bugs, each one revealing something about how systems actually behave versus how documentation says they should.
The Idea
Claude Code is text-only. Responses appear in the terminal, you read them, you move on. I wanted something different — a system that narrates Claude's responses aloud, announces tool calls as they happen, and goes quiet the moment you start typing.
The engine is Kokoro-82M — a lightweight open-source TTS model that runs locally on Apple Silicon. No API calls, no cloud services, no latency from network round-trips. The model lives on disk and generates speech in real time.
The spec called for four hooks tied to Claude Code events:
- Stop — speak Claude's final response
- PreToolUse — narrate text before each tool call
- UserPromptSubmit — interrupt playback when the user types
- SessionEnd — shut everything down cleanly
Twelve tasks. Straightforward on paper.
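For reference, hooks like these are registered in Claude Code's settings file. A hypothetical sketch of the shape such a config takes (the script paths are invented here, and the exact schema may vary across Claude Code versions):

```json
{
  "hooks": {
    "Stop": [
      { "hooks": [{ "type": "command", "command": "~/.claude/hooks/speak-response.sh" }] }
    ],
    "UserPromptSubmit": [
      { "hooks": [{ "type": "command", "command": "~/.claude/hooks/interrupt.sh" }] }
    ]
  }
}
```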
Here's the spec-versus-reality scorecard from the end of the day:
| Assumption | Reality |
|---|---|
| CLI spawning is fast enough | 3-5 second cold start per utterance (model load) |
| `os.setsid()` is standard daemonization | It severs macOS CoreAudio — total silence |
| Hooks can run background processes | Claude Code tears down the process group |
| `sys.stdout = open(...)` redirects output | Python keeps original FD in `sys.__stdout__` |
| `pkill kokoro-tts` only kills the CLI | Pattern matches the daemon's Python path too |
Four spec features got cut. Four unplanned features got added. The final system solves the same problem with a fundamentally different architecture.
Bug 1: Silent Audio — os.setsid() Kills CoreAudio
Severity: Total — no audio output at all.
The daemon started, loaded the model, received commands, generated audio samples — and produced complete silence. Not quiet audio. Not distorted audio. Nothing.
The daemonize() function used the classic Unix double-fork pattern:
```python
# THE BUG
def daemonize():
    if os.fork() > 0:
        sys.exit(0)
    os.setsid()  # <-- THIS KILLS AUDIO
    if os.fork() > 0:
        sys.exit(0)
```
This is textbook. Every Unix daemonization guide, every Stack Overflow answer, every tutorial says: fork, setsid(), fork. Create a new session, detach from the terminal, orphan the process.
On macOS, os.setsid() creates a new session leader — which severs the process from the user's audio session. macOS CoreAudio requires the process to be in the same login session as the user. The daemon could generate audio data all day, but sounddevice had no audio device to play it through.
The fix: double-fork without setsid(). Handle terminal detachment via signal.signal(signal.SIGHUP, signal.SIG_IGN) instead.
```python
# THE FIX
import os, signal, sys

def daemonize():
    if os.fork() > 0:
        sys.exit(0)
    # Do NOT call os.setsid() — severs audio session
    if os.fork() > 0:
        os._exit(0)
    signal.signal(signal.SIGHUP, signal.SIG_IGN)
```
The lesson: When a standard pattern doesn't work, question the pattern, not just your code. The code was correct — the technique was wrong for the platform.
Bug 2: Hooks Blocking Claude Code
Severity: High — session froze for 60+ seconds.
After the Stop hook fired, Claude Code hung for over a minute. The speech played, but the entire session was unresponsive until the utterance finished.
The hook sent a speak command to the daemon and waited for the response:
```python
# THE BUG
header = sock.recv(4)                      # Blocks until daemon responds
resp_len = struct.unpack('>I', header)[0]
data = sock.recv(resp_len)                 # Blocks until speech finishes
```
Claude Code hooks are synchronous by design — they block the session until they return. This is intentional (hooks can reject tool calls, modify inputs). But it means any hook that triggers a long-running operation must fire and forget.
The fix: send the command, close the socket immediately, don't wait.
```python
# THE FIX
import json, socket, struct

def send_command(cmd, fire_and_forget=False):
    body = json.dumps(cmd).encode()
    payload = struct.pack('>I', len(body)) + body  # length-prefixed JSON
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(SOCKET_PATH)
    sock.sendall(payload)
    if fire_and_forget:
        sock.close()              # Return immediately
        return {"status": "ok"}
```
The lesson: Hooks should return in milliseconds. If your hook regularly approaches its timeout, the architecture is wrong.
Bug 3: Chunky Playback
Severity: Medium — audio worked but sounded terrible.
Speech played with noticeable pauses between sentences. Generate sentence one, play it. Generate sentence two, play it. Gaps between each.
The fix was switching from synchronous generation (model.create()) to streaming (model.create_stream()). The streaming API yields audio chunks as they're generated — each chunk plays immediately while the next one generates. Seamless playback, lower perceived latency, and the ability to interrupt mid-stream.
The lesson: When an API offers both synchronous and streaming modes, evaluate streaming first for anything user-facing.
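The underlying pattern is a producer-consumer pipeline: one thread pushes chunks as the model yields them, another plays each chunk the moment it is available. A minimal sketch of that shape (the real daemon feeds chunks to an audio stream; here playback is simulated with a list, and all names are illustrative):

```python
import queue
import threading

SENTINEL = None  # marks end of stream

def generate(chunks, q):
    """Producer: push audio chunks as the model yields them."""
    for chunk in chunks:
        q.put(chunk)          # playback can begin before generation finishes
    q.put(SENTINEL)

def play(q, stop_event, out):
    """Consumer: play each chunk immediately; honor interrupts between chunks."""
    while not stop_event.is_set():
        chunk = q.get()
        if chunk is SENTINEL:
            break
        out.append(chunk)     # stand-in for writing to an audio output stream

q, stop, played = queue.Queue(), threading.Event(), []
t = threading.Thread(target=play, args=(q, stop, played))
t.start()
generate(["chunk-1", "chunk-2", "chunk-3"], q)
t.join()
```

Setting the event between chunks is what makes mid-stream interruption cheap: the consumer drops out at the next chunk boundary instead of finishing the whole utterance.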
The Subtle Bugs (Rapid-Fire)
Three more bugs surfaced in quick succession. Each was smaller than the first three but revealed a different system behavior.
Bug 4: Transcript timing. The PreToolUse hook reads Claude's text from the transcript JSONL file. But the transcript isn't always flushed when the hook fires — Claude Code buffers writes. The solution: sleep 0.5 before reading. Ugly, pragmatic, necessary. In systems programming, sometimes you need "settling time" — a brief pause to let the world catch up when you can't control the writer's flush behavior.
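A sketch of that settling-time read, assuming a JSONL transcript; the field names (`type`, `text`) are assumptions for illustration, not the actual transcript schema:

```python
import json
import time

def last_assistant_text(transcript_path, settle=0.5):
    """Read the most recent assistant message, after a settling pause."""
    time.sleep(settle)  # settling time: let the writer flush buffered lines
    last = None
    with open(transcript_path) as f:
        for line in f:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate a partial trailing line mid-flush
            if entry.get("type") == "assistant":  # assumed field name
                last = entry.get("text")          # assumed field name
    return last
```

Skipping unparseable lines matters as much as the sleep: even after the pause, the final line may still be half-written.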
Bug 5: Audio overlap. The interrupt hook (UserPromptSubmit) sent a stop command to the daemon in the background with &. This meant the new response could start speaking before the old playback actually stopped — both playing simultaneously for a brief, jarring moment. The original fix was making the interrupt synchronous — the whole point of an interrupt is guaranteeing that the previous operation stops before the next one starts. But honestly, this one isn't fully solved yet. The overlap still occurs in certain timing windows, and a proper fix is on the backlog. Sometimes you ship with known issues and come back to them.
Bug 6: pkill friendly fire. The interrupt hook used pkill -9 kokoro-tts to kill any running CLI processes. But pkill matches against the entire command line, not just the process name. The daemon was launched with a Python interpreter whose path contained kokoro-tts:
```
/Users/alex/.local/share/uv/tools/kokoro-tts/bin/python3
```
So pkill -9 kokoro-tts matched the daemon too and killed it. The fix: stop using pkill entirely. Send a stop command to the daemon via the Unix socket instead. Target a specific process, not a pattern.
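The replacement can be sketched as a tiny socket client speaking the daemon's length-prefixed JSON protocol; the socket path and command name here are illustrative:

```python
import json
import socket
import struct

def frame(msg):
    """Encode a command as a 4-byte big-endian length plus JSON payload."""
    payload = json.dumps(msg).encode()
    return struct.pack(">I", len(payload)) + payload

def send_stop(sock_path="/tmp/kokoro-daemon.sock"):
    """Tell one specific process to stop; no pattern matching involved."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(frame({"command": "stop"}))
```

Unlike a `pkill` pattern, the socket path identifies exactly one listener, so nothing else can be caught in the blast radius.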
Bug 7: The Process Group Problem
Severity: Critical — forced a complete architecture redesign.
This was the bug that broke the original architecture. Every attempt to start the daemon from a hook script failed:
```bash
python3 kokoro-daemon.py &                                    # Dies
nohup python3 kokoro-daemon.py &                              # Dies
python3 kokoro-daemon.py </dev/null >/dev/null 2>&1 & disown  # Dies
```
Claude Code runs hooks in a controlled process group. When the hook finishes, the entire process group is torn down. nohup only ignores SIGHUP — the hook runner uses SIGTERM. disown removes the process from the shell's job table but not from the process group. Even full file descriptor detachment doesn't help.
The only solution: double-fork inside the Python process so the grandchild escapes the process group before the hook runner tears it down.
```python
import os, signal, sys

def daemonize():
    if os.fork() > 0:
        sys.exit(0)   # Parent exits — hook returns
    if os.fork() > 0:
        os._exit(0)   # Child exits
    # Grandchild is orphaned — reparented to PID 1 (launchd)
    # NOT in the hook runner's process group anymore
    signal.signal(signal.SIGHUP, signal.SIG_IGN)
```
The key: os.fork() creates a child in the same process group. The second fork creates a grandchild. When the child exits, the grandchild is orphaned — the OS reparents it to PID 1. Since it's now outside the hook runner's process group, the cleanup doesn't touch it.
This forced the entire architecture from "CLI-based with daemon fallback" to "daemon-only." The CLI approach couldn't survive hook teardown. The daemon, properly double-forked, could.
The lesson: You cannot start persistent background processes from Claude Code hooks using shell backgrounding. The daemon must handle its own detachment.
Bug 9: The Session Freeze — Python FD Inheritance
Severity: Critical — Claude Code hung indefinitely on startup. Took 2 hours to diagnose.
This was the most subtle bug. When the daemon manager hook ran on SessionStart, the session would freeze. Claude Code started, showed "Starting TTS daemon..." and never recovered.
The daemonize() function redirected Python's stdio:
# THE BUG
sys.stdin = open(os.devnull, 'r')
sys.stdout = open(os.devnull, 'w')
sys.stderr = open('/tmp/kokoro-daemon.log', 'a')
This looks correct. sys.stdout now points to /dev/null. The daemon won't write to the terminal. Right?
Wrong. Python keeps a reference to the original file descriptors in sys.__stdout__. Reassigning sys.stdout creates a new Python object backed by a new file descriptor (say, FD 5). But the original OS-level file descriptor 1 — the one connected to the hook runner's pipe — is still open.
Here's what happens:
```
OS layer:      FD 1 → hook runner's pipe         (STILL OPEN)
Python layer:  sys.stdout     → FD 5 (/dev/null)
               sys.__stdout__ → FD 1             (STILL THE PIPE)
```
The hook runner uses pipes to communicate with hooks. It waits for the pipe to close (EOF) before considering the hook "done." But the daemon inherited FD 1 and never closed it. The pipe stays open forever. The session hangs forever.
The fix: os.dup2() replaces file descriptors at the OS level.
```python
# THE FIX
devnull_fd = os.open(os.devnull, os.O_RDWR)
os.dup2(devnull_fd, 0)   # Close FD 0, replace with /dev/null
os.dup2(devnull_fd, 1)   # Close FD 1, replace with /dev/null
os.close(devnull_fd)

log_fd = os.open('/tmp/kokoro-daemon.log',
                 os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
os.dup2(log_fd, 2)       # Close FD 2, replace with log file
os.close(log_fd)
```
os.dup2(new_fd, old_fd) atomically closes the old FD and makes it point to the same file as the new one. After this, FD 1 points to /dev/null, the pipe is closed, EOF is delivered, and the hook runner proceeds.
The lesson: Python's sys.stdout is NOT file descriptor 1. This is a critical distinction. In any Python daemon launched from a pipe-based parent (hooks, subprocess, CI runners), you must use os.dup2() to close inherited descriptors — not sys.stdout = open(...).
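The distinction is easy to demonstrate: point FD 1 at a pipe, reassign `sys.stdout` at the Python level, and a raw `os.write(1, ...)` still reaches the pipe:

```python
import os
import sys

r, w = os.pipe()
saved = os.dup(1)                    # stash the real stdout
os.dup2(w, 1)                        # FD 1 now points at our pipe
sys.stdout = open(os.devnull, "w")   # Python-level redirect only
os.write(1, b"leak")                 # bypasses sys.stdout and hits the pipe
os.dup2(saved, 1)                    # restore the real stdout
os.close(w); os.close(saved)
leaked = os.read(r, 4)               # the pipe received the bytes anyway
os.close(r)
sys.stdout.close()
sys.stdout = sys.__stdout__          # restore the Python-level object too
```

If `sys.stdout = open(...)` were a real redirect, `leaked` would be empty; instead the bytes arrive, because the OS-level descriptor never changed.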
What Got Cut
The v0.3.0 commit tells the story: +174 lines, -471 lines. The system got simpler as it got more reliable.
Audio ducking — 102 lines of AppleScript that lowered Spotify and Apple Music volume during speech. It worked, but created race conditions when playback was interrupted (the "restore volume" and "duck volume" calls collided). Marginal benefit, real complexity. Cut.
pretool-extract.py — 164 lines for parsing transcript JSONL. Replaced by inline bash in the hooks and moving markdown stripping into the daemon itself. Every external script means another Python interpreter startup (~200ms).
CLI fallback — the entire "if daemon is dead, fall back to CLI" path. The CLI had a 3-5 second cold start per utterance (model load). The daemon keeps the model hot. Once the daemon worked reliably, the fallback was dead code.
The spec called for twelve features. Four got cut; four unplanned features got added (daemon manager, socket client, Unix socket protocol, double-fork daemonization).
The Final Architecture
```
┌─────────────────────────────────────────────────┐
│               Claude Code Session               │
│                                                 │
│  SessionStart → daemon manager (double-fork)    │
│  PreToolUse   → read transcript → send speak    │
│  Stop         → read transcript → send speak    │
│  UserPrompt   → send stop (synchronous)         │
│  SessionEnd   → send shutdown                   │
└──────────────────────┬──────────────────────────┘
                       │ Unix socket (fire & forget)
┌──────────────────────┴──────────────────────────┐
│        kokoro-daemon.py (orphan process)        │
│                                                 │
│  Kokoro-82M model (loaded once, ~335MB)         │
│  Streaming audio playback                       │
│  Inline markdown stripping                      │
│  30-minute idle timeout → auto-shutdown         │
│  Protocol: length-prefixed JSON                 │
└─────────────────────────────────────────────────┘
```
Six files. ~1,100 lines. Down from 10+ files and ~1,600 lines. Every hook is a thin client that sends a command to the daemon and returns immediately. The daemon handles all the complexity: model management, audio streaming, markdown processing, playback interruption.
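The markdown processing can be as small as a few regex passes that reduce formatted text to something speakable. A minimal sketch, with illustrative patterns rather than the project's actual rules:

```python
import re

def strip_markdown(text):
    """Reduce markdown to speakable plain text (illustrative patterns only)."""
    text = re.sub(r"`([^`]*)`", r"\1", text)               # drop inline-code ticks
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)   # links: keep the label
    text = re.sub(r"[*_#>]+", "", text)                    # emphasis, headers, quotes
    return text.strip()
```

Running the stripping inside the daemon, rather than in a helper script, avoids paying another Python interpreter startup per utterance.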
The lesson isn't "plan better." The plan was reasonable. Every assumption was defensible on paper. The lesson is that systems programming on macOS has platform-specific behaviors that only surface at runtime — audio session boundaries, process group lifecycles, file descriptor inheritance semantics. You discover them by building, breaking, and debugging with discipline.
The system went through three architectures in a single day. The final one works because every bug taught me something the documentation didn't say.
The plan had twelve tasks. Reality had nine bugs. The bugs were more educational.
View Claude Speaks on GitHub