
Pocket TTS

A lightweight text-to-speech (TTS) application designed to run efficiently on CPUs. Forget about the hassle of using GPUs and web APIs serving TTS models. With Kyutai's Pocket TTS, generating audio is just a pip install and a function call away.

Supports Python 3.10, 3.11, 3.12, 3.13 and 3.14. Requires PyTorch 2.5+. Does not require the GPU version of PyTorch.

Main takeaways

Runs on CPU
Small model size, 100M parameters
Audio streaming
Low latency, ~200ms to get the first audio chunk
Faster than real-time, ~6x real-time on a MacBook Air M4 CPU
Uses only 2 CPU cores
Python API and CLI
Voice cloning
English only at the moment
Can handle arbitrarily long text inputs
Can run client-side in the browser
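
To make the "~6x real-time" figure concrete: the real-time factor is the duration of the generated audio divided by the wall-clock time spent generating it. Below is a minimal sketch of that arithmetic. The function name and the example numbers (including the 24 kHz sample rate) are illustrative assumptions, not measurements from Pocket TTS.

```python
def real_time_factor(num_samples: int, sample_rate: int, elapsed_seconds: float) -> float:
    """Return audio duration divided by generation time (> 1 means faster than real time)."""
    if sample_rate <= 0 or elapsed_seconds <= 0:
        raise ValueError("sample_rate and elapsed_seconds must be positive")
    audio_seconds = num_samples / sample_rate
    return audio_seconds / elapsed_seconds


# Illustrative numbers only: 12 s of audio at a hypothetical 24 kHz, generated in 2 s.
rtf = real_time_factor(num_samples=288_000, sample_rate=24_000, elapsed_seconds=2.0)
print(f"{rtf:.1f}x real-time")
```

With the Python library, num_samples would be the length of the 1D tensor returned by generate_audio, and sample_rate would be tts_model.sample_rate.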

More languages are planned: See our official announcement

Trying it from the website, without installing anything

Navigate to the Kyutai website to try it out directly in your browser. You can input text, select different voices, and generate speech without any installation.

Trying it with the CLI

The `generate` command

You can use pocket-tts directly from the command line. We recommend using uv as it installs any dependencies on the fly in an isolated environment (uv installation instructions here). You can also use pip install pocket-tts to install it manually.

The command below generates a wav file, ./tts_output.wav, that says the default text in the default voice, and displays some speed statistics.

uvx pocket-tts generate
# or if you installed it manually with pip:
pocket-tts generate

Modify the voice with --voice and the text with --text. We provide a small catalog of voices.

You can take a look at this page which details the licenses for each voice.

The --voice argument can also take a plain wav file as input for voice cloning. You can use your own or check out our voice repository. We recommend cleaning the sample before using it with Pocket TTS, because the audio quality of the sample is also reproduced.
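
The two flags above can also be driven from a script. The sketch below only assembles the argument list for pocket-tts generate and hands it to subprocess.run. The helper names are hypothetical, and it assumes pocket-tts was installed with pip and is on the PATH. Only the --voice and --text flags named in this section are used.

```python
import subprocess
from typing import List, Optional


def build_generate_command(text: Optional[str] = None, voice: Optional[str] = None) -> List[str]:
    """Assemble argv for `pocket-tts generate` (hypothetical helper)."""
    command = ["pocket-tts", "generate"]
    if voice is not None:
        # Either a catalog voice name or a path to a wav file for voice cloning.
        command += ["--voice", voice]
    if text is not None:
        command += ["--text", text]
    return command


def run_generate(text: Optional[str] = None, voice: Optional[str] = None) -> None:
    """Run the CLI; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_generate_command(text, voice), check=True)
```

For example, run_generate(text="Hello world.", voice="./my_voice.wav") would ask the CLI to clone the voice in ./my_voice.wav, which is a placeholder path here.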

Feel free to check out the generate documentation for more details and examples. For trying multiple voices and prompts quickly, prefer using the serve command.

The `serve` command

You can also run a local server to generate audio via HTTP requests.

uvx pocket-tts serve
# or if you installed it manually with pip:
pocket-tts serve

Navigate to http://localhost:8000 to try the web interface. It is faster than the command line because the model is kept in memory between requests.

You can check out the serve documentation for more details and examples.
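
When scripting against the server, it helps to wait until it is actually listening before sending requests. The sketch below polls the web interface URL mentioned above using only the standard library. The function name and the timeout values are assumptions, and it deliberately does not call any generation endpoint, since those are covered by the serve documentation rather than this page.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str = "http://localhost:8000", timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `url` until it answers an HTTP request or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval + 1.0):
                return True  # Got a successful HTTP response.
        except urllib.error.HTTPError:
            return True  # Server is up, even if this path returned an error status.
        except (urllib.error.URLError, OSError):
            pass  # Not listening yet (connection refused, timeout, ...).
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A script could start pocket-tts serve in the background, call wait_for_server(), and only then begin sending requests.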

The `export-voice` command

Processing an audio file (e.g., a .wav or .mp3) for voice cloning is relatively slow, but loading a safetensors file -- a voice embedding converted from an audio file -- is very fast. You can use the export-voice command to do this conversion. See the export-voice documentation for more details and examples.

Using it as a Python library

You can try out the Python library on Colab here.

Install the package with

pip install pocket-tts
# or
uv add pocket-tts

You can use this package as a simple Python library to generate audio from text.

from pocket_tts import TTSModel
import scipy.io.wavfile

tts_model = TTSModel.load_model()
voice_state = tts_model.get_state_for_audio_prompt(
    "alba"  # One of the pre-made voices, see above
    # You can also use any voice file you have locally or from Hugging Face:
    # "./some_audio.wav"
    # or "hf://kyutai/tts-voices/expresso/ex01-ex02_default_001_channel2_198s.wav"
)
audio = tts_model.generate_audio(voice_state, "Hello world, this is a test.")
# Audio is a 1D torch tensor containing PCM data.
scipy.io.wavfile.write("output.wav", tts_model.sample_rate, audio.numpy())
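
If SciPy is not available, the standard-library wave module can write the result as 16-bit PCM instead. This is a sketch under one assumption that this page does not state: that the samples are floats in the range [-1.0, 1.0]. The helper name write_wav_int16 is hypothetical.

```python
import struct
import wave
from typing import Iterable


def write_wav_int16(path: str, sample_rate: int, samples: Iterable[float]) -> None:
    """Write mono float samples (assumed to lie in [-1.0, 1.0]) as 16-bit PCM."""
    # Clip out-of-range values so they cannot overflow the int16 range.
    clipped = [max(-1.0, min(1.0, float(s))) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(ints)}h", *ints))
```

With the variables from the example above, this would be called as write_wav_int16("output.wav", tts_model.sample_rate, audio.tolist()).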

You can keep several voice states in memory if you want to use multiple voices. load_model() and get_state_for_audio_prompt() are relatively slow operations, so we recommend keeping the model and voice states in memory if you can.

For faster voice loading, you can export voice states to safetensors files:

from pocket_tts import TTSModel, export_model_state, import_model_state

model = TTSModel.load_model()

# Export a voice state for fast loading later
voice_state = model.get_state_for_audio_prompt("alba")
export_model_state(voice_state, "alba.safetensors")

# Later, load it quickly (much faster than processing audio)
voice_state = import_model_state("alba.safetensors")
audio = model.generate_audio(voice_state, "Hello world!")
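
A common way to combine the two steps is a load-or-create cache: reuse the safetensors file when it exists, otherwise process the audio prompt once and export it. The sketch below takes the three operations as parameters so that it relies on nothing beyond the calls shown above. The function name load_or_export_voice and its parameter names are hypothetical.

```python
import os
from typing import Any, Callable


def load_or_export_voice(
    voice: str,
    cache_path: str,
    compute_state: Callable[[str], Any],
    export_state: Callable[[Any, str], None],
    import_state: Callable[[str], Any],
) -> Any:
    """Return a voice state, computing and exporting it only when no cache file exists."""
    if os.path.exists(cache_path):
        return import_state(cache_path)  # Fast path: load the saved embedding.
    state = compute_state(voice)         # Slow path: process the audio prompt.
    export_state(state, cache_path)      # Save it for next time.
    return state
```

With the names from the example above, this would be called as load_or_export_voice("alba", "alba.safetensors", model.get_state_for_audio_prompt, export_model_state, import_model_state).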

You can check out the Python API documentation for more details and examples.

Unsupported features

At the moment, we do not support (but would love pull requests adding):

We tried running this TTS model on the GPU but did not observe a speedup compared to CPU execution, notably because we use a batch size of 1 and a very small model.

Development and local setup

We accept contributions! Feel free to open issues or pull requests on GitHub.

You can find development instructions in the CONTRIBUTING.md file. You'll also find there how to have an editable install of the package for local development.

In-browser implementations

Pocket TTS is small enough to run directly in your browser in WebAssembly/JavaScript. We don't have official support for this yet, but you can try out one of these community implementations:

pocket-tts-onnx-export by @KevinAHM: Model exported to .onnx and run using ONNX Runtime Web. Demo here
pocket-tts by @babybirdprd: Candle version (Rust) with WebAssembly and PyO3 bindings, meaning it can run on the web too.
jax-js by @ekzhang: Using jax-js, an ML library for the web. Demo here

Alternative implementations

pocket-tts-mlx by @jishnuvenugopal - MLX backend optimized for Apple Silicon
pocket-tts by @babybirdprd - Candle version (Rust) with WebAssembly and PyO3 bindings.

Projects using Pocket TTS

pocket-reader by @lukasmwerner - Browser screen reader
pocket-tts-wyoming by @ikidd - Docker container for pocket-tts using Wyoming protocol, ready for Home Assistant Voice use.
Sonorus by @KevinAHM - Talk to any named character in Hogwarts Legacy with their original voice.
Mac pocket-tts by @slaughters85j - Mac Desktop App + macOS Quick Action
pocket-tts-openai_streaming_server by @teddybear082 - OpenAI-compatible streaming server, dockerized and with an .exe release
pocket-tts-unity by @lookbe - A Unity 6 integration for Pocket-TTS.

Prohibited use

Use of our model must comply with all applicable laws and regulations and must not result in, involve, or facilitate any illegal, harmful, deceptive, fraudulent, or unauthorized activity. Prohibited uses include, without limitation, voice impersonation or cloning without explicit and lawful consent; misinformation, disinformation, or deception (including fake news, fraudulent calls, or presenting generated content as genuine recordings of real people or events); and the generation of unlawful, harmful, libelous, abusive, harassing, discriminatory, hateful, or privacy-invasive content. We disclaim all liability for any non-compliant use.

Authors

Manu Orsini*, Simon Rouard*, Gabriel De Marmiesse*, Václav Volhejn, Neil Zeghidour, Alexandre Défossez

*equal contribution


Paper for kyutai/pocket-tts

Continuous Audio Language Models (paper 2509.06926, published Sep 8, 2025)