This step-by-step guide helps you set up and run a Hindi female Text-to-Speech system based on Fastspeech2_HS with ESPnet2 on a GPU-enabled server such as RunPod.
✨ Features
- Female Hindi voice
- Fast inference using GPU (CUDA)
- Based on Fastspeech2_HS and ESPnet2
- Works on Ubuntu, tested with Python 3.10
📁 Step-by-Step Installation
🚀 System Preparation
apt update
apt install -y git python3-pip python3-venv ffmpeg vim less
🔧 Clone & Setup
git clone https://github.com/smtiitm/Fastspeech2_HS.git
cd Fastspeech2_HS
python3 -m venv venv
source venv/bin/activate
🥛 Python Package Installation
pip install --upgrade pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
python -c "import torch; print(torch.cuda.get_device_name(0))" # ✅ Verifies that your GPU is available and properly configured with PyTorch
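The one-liner above raises an exception when no CUDA device is visible, which can be hard to read on a fresh server. A minimal sketch of a more forgiving check follows; the helper name describe_gpu is illustrative and is not part of PyTorch or Fastspeech2_HS.

```python
import importlib.util


def describe_gpu() -> str:
    """Return a one-line summary of what PyTorch can see (hypothetical helper)."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"

    import torch  # imported only after confirming the package is present

    if not torch.cuda.is_available():
        return "PyTorch is installed, but no CUDA device is visible"
    return f"CUDA device 0: {torch.cuda.get_device_name(0)}"


print(describe_gpu())
```

Save it as a small script inside the activated venv and run it before installing the remaining packages.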
📊 Required Libraries
pip install phonemizer g2p_en unidecode soundfile flask nltk jamo sentencepiece inflect numba h5py pydub resampy pyworld
pip install typeguard==2.13.3
pip install --upgrade scipy
pip install indic-num2words
pip install indic-unified-parser
pip install git+https://github.com/espnet/espnet.git
📆 Handle Metadata Issue
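pip install pip==23.3.1 # Pins an older pip; pip 24.1 and later reject packages with non-standard metadata, which can break the fairseq==0.12.2 install below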
📊 Additional Dependencies
pip install fairseq==0.12.2
pip install kaldiio soundfile
pip install pandas
⚖️ Git LFS for Large Models
apt install -y git-lfs
git lfs install
git lfs pull
👀 Verify Model Integrity
file hindi/female/model/model.pth
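head -c 200 hindi/female/model/model.pth
These commands check whether the model file is the real serialized PyTorch checkpoint or only a Git LFS pointer.
If file reports “Zip archive” and head prints mostly binary data beginning with PK, the model was downloaded. If file reports plain text and head prints a line starting with version https://git-lfs.github.com/spec/v1, the file is still an LFS pointer, so run git lfs pull again.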
For deeper verification, try running inference and ensure audio is generated without error.
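The same check can be scripted. The sketch below assumes that an un-pulled model is a Git LFS pointer (a small text file beginning with version https://git-lfs.github.com/spec/v1) and that a real checkpoint is a zip archive, as described above. The function name classify_model_file and its return labels are illustrative.

```python
import zipfile
from pathlib import Path

# First bytes of every Git LFS pointer file.
LFS_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def classify_model_file(path: str) -> str:
    """Return 'missing', 'lfs-pointer', 'zip-archive', or 'unknown' (hypothetical helper)."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    with p.open("rb") as fh:
        head = fh.read(len(LFS_PREFIX))
    if head == LFS_PREFIX:
        return "lfs-pointer"  # git lfs pull has not replaced the pointer yet
    if zipfile.is_zipfile(str(p)):
        return "zip-archive"  # container format used by recent torch.save
    return "unknown"


print(classify_model_file("hindi/female/model/model.pth"))
```

A result of lfs-pointer means git lfs pull still needs to run; unknown may simply indicate a checkpoint saved in an older, non-zip format.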
📚 Final Touches
pip install --upgrade setuptools wheel
cd ..
git clone https://github.com/espnet/espnet.git
cd espnet
pip install -e .
🎉 Check ESPnet2
cd ../Fastspeech2_HS
python -c "from espnet2.bin.tts_inference import Text2Speech; print('ESPnet2 is working ✅')"
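If that import fails, it helps to know which package is actually absent. The sketch below looks up several modules without importing them. The list uses import names, which can differ from pip package names, and covers only packages whose import names are well known; the helper missing_modules is illustrative.

```python
import importlib.util

# Import names for some of the packages installed in the steps above.
REQUIRED_MODULES = ["torch", "espnet2", "fairseq", "kaldiio", "soundfile", "pandas", "scipy"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

Anything reported as missing can be reinstalled with the matching pip command from the earlier sections.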
🎧 Run Hindi TTS Inference
python inference.py \
--text "नमस्ते, यह एक महिला आवाज़ में हिंदी टेक्स्ट टू स्पीच का डेमो है" \
--language hindi \
--gender female \
--alpha 1 \
--output_file hindi_female_output.wav
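After the command finishes, a quick sanity check is to read the header of the generated file. The sketch below uses Python's standard wave module and assumes the output is an uncompressed PCM WAV; other encodings make wave.open raise wave.Error. The helper name wav_summary is illustrative.

```python
import os
import wave


def wav_summary(path: str) -> dict:
    """Read channel count, sample rate, and duration from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


OUTPUT = "hindi_female_output.wav"  # file name used in the command above
if os.path.exists(OUTPUT):
    print(wav_summary(OUTPUT))
else:
    print(f"{OUTPUT} not found; run the inference command first")
```

A duration of zero or a missing file suggests that inference failed before any audio was written.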
❓ FAQ
Q. Can I run this on CPU?
A. Technically yes, but it will be extremely slow. GPU is highly recommended.
Q. Where are the models stored?
A. In the hindi/female/model/ directory. Use git lfs pull to download them.
Q. Can I use it for other languages?
A. Yes, it supports many Indian languages like Bengali, Marathi, Tamil, etc.
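Switching languages should only require different --language and --gender values in the inference command shown earlier. The sketch below builds that command as an argument list and prints it without running it. The lowercase value "tamil" and the output file name are assumptions modeled on the hindi example, so check the repository for the actual directory names. The helper build_inference_command is illustrative.

```python
import shlex
import sys


def build_inference_command(text, language, gender, output_file, alpha=1):
    """Assemble the inference.py call using only the flags shown in this guide."""
    return [
        sys.executable, "inference.py",
        "--text", text,
        "--language", language,  # assumed to match a model directory name
        "--gender", gender,
        "--alpha", str(alpha),
        "--output_file", output_file,
    ]


cmd = build_inference_command("வணக்கம்", "tamil", "female", "tamil_female_output.wav")
print(shlex.join(cmd))  # dry run: shows the command instead of executing it
```

To execute the command, pass the list to subprocess.run(cmd, check=True) from inside the Fastspeech2_HS directory.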
🌎 Explore More
Visit the GitHub repo: https://github.com/smtiitm/Fastspeech2_HS
Built and tested on RunPod in a CUDA 12.1 environment.
If you found this helpful, consider sharing it with other developers or language enthusiasts! 🚀