Voice-Driven Development with Handy and Pochi

This tutorial guides you through setting up an efficient, voice-driven development workflow by combining two powerful tools: Handy for local voice transcription and Pochi as your AI coding assistant.

With this setup, you speak a command, Handy transcribes it into text on your device, and Pochi executes it. The result is a hands-free coding workflow that keeps your audio on-device.

What You Will Learn

How to install and configure Handy for offline voice-to-text transcription.
How to integrate Handy with Pochi to create a voice-driven development workflow.
Tips for optimizing your setup for accuracy and efficiency.

Prerequisites

Pochi: You should have Pochi installed and running in your editor.
Handy: Downloadable for macOS, Windows, and Linux from the official releases page.
Microphone and accessibility permissions: Handy needs access to your microphone and the ability to paste text into other applications.

How It Works

Handy listens to your voice and converts it into text using on-device transcription models (Whisper or Parakeet).
Pochi takes the text generated by Handy as a command and processes it using its configured AI model.

For a Fully Private Workflow: This tutorial focuses on the voice input aspect. For complete privacy and offline use, you can also configure Pochi to use a local Large Language Model (LLM). See Pochi's documentation on model configuration for more details.

Step 1: Install and Configure Handy

Handy is a cross-platform desktop application built with Tauri that records your voice via a keyboard shortcut and pastes the transcribed text into any application.

Download Handy: Get the latest version from the official releases page or the project website handy.computer.
Install and Grant Permissions:
- macOS: Open the downloaded .dmg, drag Handy into your Applications folder, and launch it once. Grant Microphone and Accessibility permissions when prompted so Handy can record audio and paste text.
- Windows: Run the installer, then allow Handy to access the microphone. You may need to approve the accessibility overlay so Handy can send keystrokes.
- Linux: Make the AppImage executable (e.g., chmod +x Handy-x86_64.AppImage), run it, and approve microphone/input permissions as required by your desktop environment.
Configure Handy Settings:

A. Set Your Hotkey

Choose a shortcut that does not conflict with other apps. Handy supports both push-to-talk and toggle behaviors. Enable Push to Talk if you want to hold the shortcut while speaking; otherwise, use the default press-to-start, press-again-to-stop mode.
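
To make the difference between the two hotkey behaviors concrete, here is a minimal conceptual sketch in TypeScript. It is not Handy's implementation; the names `HotkeyMode`, `KeyEvent`, and `nextRecordingState` are hypothetical and exist only for illustration.

```typescript
// Hypothetical model of the two hotkey behaviors (not Handy's actual code).
type HotkeyMode = "push-to-talk" | "toggle";
type KeyEvent = "down" | "up";

// Returns whether recording should be active after the given key event.
function nextRecordingState(
  mode: HotkeyMode,
  recording: boolean,
  event: KeyEvent
): boolean {
  if (mode === "push-to-talk") {
    // Record only while the shortcut is held down.
    return event === "down";
  }
  // Toggle: each key press flips the state; key releases are ignored.
  return event === "down" ? !recording : recording;
}
```

In this model, push-to-talk ties recording to the key being held, while toggle mode needs a second press to stop, which can be more comfortable for long dictations.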

B. Select Your Transcription Engine

Handy can run multiple local transcription models:
- Whisper models (Small/Medium/Turbo/Large) for GPU-accelerated accuracy on macOS, Windows, and Linux.
- Parakeet V3 for fast CPU-only transcription with automatic language detection.

C. Download Your Model

The first time you pick a model, Handy downloads and optimizes it. This may take several minutes and will temporarily increase CPU usage.

D. Customize Output Behavior

Tune options such as automatic text pasting, capitalization, and whether Handy should clear or append text after each recording.

Step 2: The Voice Coding Workflow in Action

With Handy installed, let's try the workflow.

Open Pochi: Launch Pochi's chat interface in your code editor.
Activate Handy: Click inside Pochi's input box, then use the shortcut you configured for Handy. Handy shows an on-screen indicator when it is listening.
Speak Your Command: Clearly state the task you want Pochi to perform. For example:

"Write a TypeScript function that takes an array of strings and returns the longest one."

Transcription: When you stop recording, Handy transcribes your speech and pastes the text into Pochi's input box (or copies it to the clipboard, depending on your settings).
Execution: Pochi sends the text to its configured AI model for processing. The generated code or response appears in the chat.
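
For the example command above, the response might look something like the following sketch. This is an illustration of the kind of code you could expect, not actual Pochi output; the function name `longestString` and the choice to return `undefined` for an empty array are assumptions.

```typescript
// Returns the longest string in the array, or undefined if the array is empty.
// When several strings share the maximum length, the first one wins.
function longestString(items: string[]): string | undefined {
  let longest: string | undefined = undefined;
  for (const item of items) {
    if (longest === undefined || item.length > longest.length) {
      longest = item;
    }
  }
  return longest;
}
```

If the result differs from what you intended (for example, you wanted ties resolved differently), you can dictate a follow-up instruction in the same way.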

Tips and Troubleshooting

Inaccurate Transcription? Try switching to a larger Whisper model. On CPU-only systems, try Parakeet V3 instead.
Performance Concerns? Smaller Whisper models consume fewer resources. If latency is high, choose a smaller model or use Parakeet V3.
Need Help with Handy? Open an issue on the GitHub repository.

Conclusion

By combining Handy and Pochi, you've created a fast and efficient voice-powered programming environment. This setup gives you a modern, hands-free way to translate your ideas into code while keeping the entire transcription pipeline on your device.
