Coqui TTS is a powerful open-source text-to-speech system that supports multiple languages, including Spanish, German, Chinese, and Japanese. In this guide, we’ll walk you through how to install it on your Mac, resolve a compatibility issue with PyTorch, and generate natural-sounding Spanish speech step by step.
🗣️ Bonus Tip: Preview Built-in macOS Spanish Voices
Before diving into Coqui TTS, you might want to explore what’s already built into macOS.
👉 To list all available voices on macOS:
say -v "?"
You’ll see a list like:
...
Jorge es_ES # Spanish (Spain)
Juan es_MX # Spanish (Mexico)
Paulina es_MX # Spanish (Mexico)
Monica es_ES # Spanish (Spain)
...
To test one of these voices:
say -v "Jorge" "Hola, ¿cómo estás?"
While macOS voices are good, they are not open-source and can’t be customized. That’s where Coqui TTS comes in.
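If you’d rather filter that list from Python, here is a minimal sketch that keeps only the Spanish voices. It assumes each line of the `say -v "?"` output looks like the sample above (name, locale, then `#`); the helper names `spanish_voices` and `installed_spanish_voices` are made up for this example.

```python
import re
import subprocess

# Matches lines such as: "Jorge   es_ES   # Spanish (Spain)"
VOICE_LINE = re.compile(r"^(?P<name>.+?)\s+(?P<locale>[a-z]{2,3}_[A-Za-z0-9]+)\s+#")


def spanish_voices(say_output):
    """Return (name, locale) pairs for every Spanish voice in the given text."""
    voices = []
    for line in say_output.splitlines():
        match = VOICE_LINE.match(line)
        if match and match.group("locale").startswith("es_"):
            voices.append((match.group("name"), match.group("locale")))
    return voices


def installed_spanish_voices():
    """Run `say -v ?` (macOS only) and return the Spanish voices it reports."""
    result = subprocess.run(
        ["say", "-v", "?"], capture_output=True, text=True, check=True
    )
    return spanish_voices(result.stdout)
```

On a Mac, calling `installed_spanish_voices()` should return the same Spanish entries you see in the terminal listing.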
✅ Prerequisites
🔧 Step 1: Create a Conda Environment
We’ll use Python 3.10 for compatibility.
conda create -n coqui-tts python=3.10 -y
conda activate coqui-tts
📦 Step 2: Install Coqui TTS with Full Features
Install the TTS library along with development and notebook extras for full support:
pip install "TTS[all,dev,notebooks]"
⚠️ Step 3: Fix the PyTorch Compatibility Issue
Coqui TTS model checkpoints such as Tacotron2 contain pickled custom objects, and PyTorch 2.6 and later refuse to load them by default (torch.load now defaults to weights_only=True), which raises a pickle.UnpicklingError. To fix this, downgrade PyTorch to version 2.5.1.
pip uninstall torch torchaudio -y
pip install torch==2.5.1 torchaudio==2.5.1
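To confirm the downgrade took effect, you can compare the installed version against the 2.6 cutoff mentioned above. This is only a sketch: `torch_needs_downgrade` is a hypothetical helper, and it assumes the version string starts with `major.minor` (as in `2.5.1` or `2.5.1+cu121`).

```python
def torch_needs_downgrade(version):
    """Return True if the given PyTorch version string is 2.6 or newer."""
    # Drop local build suffixes such as "+cu121", then compare (major, minor).
    release = version.split("+")[0]
    major, minor = (int(part) for part in release.split(".")[:2])
    return (major, minor) >= (2, 6)


def installed_torch_needs_downgrade():
    """Check the PyTorch version installed in the active environment."""
    import torch  # imported lazily so the helper above works without torch

    return torch_needs_downgrade(torch.__version__)
```

After Step 3, `installed_torch_needs_downgrade()` should return `False` inside the `coqui-tts` environment.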
📥 Step 4: Download and Use a Spanish TTS Model
We’ll use the pre-trained Spanish model tts_models/es/mai/tacotron2-DDC.
Run this command to synthesize Spanish audio:
tts --text "Hola, ¿cómo estás?" \
--model_name tts_models/es/mai/tacotron2-DDC \
--out_path hola.wav
✅ This will:
- Download the Spanish TTS model
- Download a compatible vocoder (MelGAN)
- Generate hola.wav with natural-sounding Spanish
To play the audio:
afplay hola.wav
🛠 Optional: Use Python Script Instead of CLI
Here’s a Python version if you prefer using code:
from TTS.api import TTS
tts = TTS(model_name="tts_models/es/mai/tacotron2-DDC")
tts.tts_to_file(text="Hola, ¿cómo estás?", file_path="output.wav")
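If you have several sentences to synthesize, a small loop around `tts_to_file` saves retyping. The sketch below is one possible way to do it: `synthesize_lines`, the `audio` folder, and the `line_000.wav` naming scheme are all assumptions for illustration, not part of Coqui TTS.

```python
from pathlib import Path


def synthesize_lines(tts, lines, out_dir="audio"):
    """Write one WAV file per non-empty line and return the file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Skip blank lines so they do not produce empty audio files.
    texts = [line.strip() for line in lines if line.strip()]

    paths = []
    for index, text in enumerate(texts):
        path = out_dir / f"line_{index:03d}.wav"
        # `tts` is any object with the tts_to_file(text=..., file_path=...)
        # method used in the script above.
        tts.tts_to_file(text=text, file_path=str(path))
        paths.append(path)
    return paths


# Example usage (downloads the model on first run):
# from TTS.api import TTS
# tts = TTS(model_name="tts_models/es/mai/tacotron2-DDC")
# synthesize_lines(tts, ["Hola, ¿cómo estás?", "Buenos días."])
```

Passing the `tts` object in as an argument means the model is loaded once and reused for every sentence.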
🚀 Bonus: Try Higher Quality with Voice Cloning (xtts_v2)
For even better quality and multilingual voice cloning (optional):
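from TTS.api import TTS

# Load the multilingual XTTS v2 model (downloaded on first use)
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
tts.tts_to_file(
    text="Hola, ¿cómo estás?",
    speaker_wav="your-voice-sample.wav",  # Reference recording of the voice to clone
    language="es",
    file_path="xtts_output.wav",
)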
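Before passing a recording as `speaker_wav`, it can help to check that the file opens cleanly and see how long it is. This sketch uses only Python’s standard `wave` module, so it assumes an uncompressed PCM WAV file; `wav_info` is a hypothetical helper, and this guide does not state any required clip length or sample rate.

```python
import wave


def wav_info(path):
    """Return (duration_seconds, sample_rate, channels) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return frames / rate, rate, wav.getnchannels()


# Example usage:
# duration, rate, channels = wav_info("your-voice-sample.wav")
# print(f"{duration:.1f} s at {rate} Hz, {channels} channel(s)")
```

If `wave.open` raises an error, the file is probably not a plain PCM WAV and may need converting first.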
✅ Final Thoughts
With this setup, you have:
- Coqui TTS running locally on a Mac (M1 Pro)
- High-quality Spanish voices
- The PyTorch compatibility issue resolved by pinning version 2.5.1
- Optionally, a starting point for voice cloning and multilingual support