Install and Run Hindi Female Text-to-Speech (TTS) on RunPod or Any Linux Server

This step-by-step guide helps you set up and run a Hindi female Text-to-Speech system based on Fastspeech2_HS with ESPnet2 on a GPU-enabled server such as RunPod.

✨ Features

Female Hindi voice
Fast inference using GPU (CUDA)
Based on Fastspeech2_HS and ESPnet2
Works on Ubuntu, tested with Python 3.10

📁 Step-by-Step Installation

🚀 System Preparation

apt update
apt install -y git python3-pip python3-venv ffmpeg vim less

🔧 Clone & Setup

git clone https://github.com/smtiitm/Fastspeech2_HS.git
cd Fastspeech2_HS
python3 -m venv venv
source venv/bin/activate

🥛 Python Package Installation

pip install --upgrade pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

python -c "import torch; print(torch.cuda.get_device_name(0))"  # ✅ Verifies that your GPU is available and properly configured with PyTorch
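The one-liner above raises an error when PyTorch cannot see a GPU. As a minimal sketch of a more forgiving check, the snippet below reports which device PyTorch would use and falls back to a CPU message instead of failing. The function name describe_device is hypothetical and is not part of the Fastspeech2_HS repository.

```python
def describe_device() -> str:
    """Return a short description of the device PyTorch would use."""
    try:
        import torch  # installed in the previous step
    except ImportError:
        return "PyTorch is not installed in this environment"

    if torch.cuda.is_available():
        # Index 0 is the first visible GPU, as in the one-liner above.
        return f"cuda: {torch.cuda.get_device_name(0)}"
    return "cpu (no CUDA device visible to PyTorch)"


if __name__ == "__main__":
    print(describe_device())
```

If this prints the CPU message on a GPU pod, the usual suspects are a CPU-only PyTorch wheel or a driver that does not match the CUDA 12.1 build.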

📊 Required Libraries

pip install phonemizer g2p_en unidecode soundfile flask nltk jamo sentencepiece inflect numba h5py pydub resampy pyworld
pip install typeguard==2.13.3
pip install --upgrade scipy
pip install indic-num2words
pip install indic-unified-parser
pip install git+https://github.com/espnet/espnet.git

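📆 Handle Metadata Issue

Recent pip releases are stricter about package metadata and may refuse to install some older dependencies (this most likely affects the fairseq install in the next step). Pinning pip to an earlier version works around the problem:

pip install pip==23.3.1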

📊 Additional Dependencies

pip install fairseq==0.12.2
pip install kaldiio soundfile
pip install pandas
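With this many packages installed across several commands, it can help to confirm that everything is importable before moving on. The following is a rough sketch, not part of the repository: the helper name missing_modules is hypothetical, and the list covers only packages whose import names are well known. The two indic-* packages are left out because this guide does not state their import names.

```python
import importlib.util

# Import names for packages installed in the steps above.
# indic-num2words and indic-unified-parser are omitted on purpose:
# their import names are not given in this guide.
MODULES = [
    "torch", "espnet2", "fairseq", "phonemizer", "g2p_en", "unidecode",
    "soundfile", "flask", "nltk", "jamo", "sentencepiece", "inflect",
    "numba", "h5py", "pydub", "resampy", "pyworld", "typeguard",
    "scipy", "kaldiio", "pandas",
]


def missing_modules(names):
    """Return the names that cannot be located, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```

Run it inside the activated virtual environment; anything listed as missing can be reinstalled with pip before continuing.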

⚖️ Git LFS for Large Models

apt install -y git-lfs
git lfs install
git lfs pull

👀 Verify Model Integrity

file hindi/female/model/model.pth
head -c 200 hindi/female/model/model.pth

These commands check whether the model file was actually downloaded by Git LFS or is still a small pointer file.

If the file command shows “Zip archive” and head prints mostly binary data, the real model is in place. If head instead prints a few lines of plain text beginning with "version https://git-lfs.github.com/spec/v1", the file is still a Git LFS pointer, so run git lfs pull again.

For deeper verification, run inference and confirm that audio is generated without errors.
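The same check can be scripted. The sketch below reads the first bytes of the file and distinguishes a Git LFS pointer from a zip archive. The function name classify_model_file is hypothetical; the pointer prefix is the standard first line of a Git LFS pointer file, and the zip signature matches the “Zip archive” result mentioned above.

```python
from pathlib import Path

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"
ZIP_MAGIC = b"PK\x03\x04"  # local file header signature of a zip archive


def classify_model_file(path) -> str:
    """Classify a file as 'lfs-pointer', 'zip-archive', or 'unknown'."""
    with Path(path).open("rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    if head.startswith(LFS_POINTER_PREFIX):
        return "lfs-pointer"  # git lfs pull has not fetched the real file
    if head.startswith(ZIP_MAGIC):
        return "zip-archive"  # consistent with a downloaded checkpoint
    return "unknown"


if __name__ == "__main__":
    model = Path("hindi/female/model/model.pth")
    if model.exists():
        print(model, "->", classify_model_file(model))
    else:
        print(model, "not found; run this from the Fastspeech2_HS directory")
```

A result of "unknown" does not necessarily mean the file is broken, only that it is neither a pointer nor a zip archive; running inference remains the definitive test.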

📚 Final Touches

pip install --upgrade setuptools wheel
cd ..
git clone https://github.com/espnet/espnet.git
cd espnet
pip install -e .

🎉 Check ESPnet2

cd ../Fastspeech2_HS
python -c "from espnet2.bin.tts_inference import Text2Speech; print('ESPnet2 is working ✅')"

🎧 Run Hindi TTS Inference

python inference.py \
  --text "नमस्ते, यह एक महिला आवाज़ में हिंदी टेक्स्ट टू स्पीच का डेमो है" \
  --language hindi \
  --gender female \
  --alpha 1 \
  --output_file hindi_female_output.wav
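After the command finishes, a quick sanity check on the output file confirms that real audio was written. The sketch below uses Python's standard wave module and assumes the output is an uncompressed PCM WAV file; if the file uses another encoding, wave.open raises an error, and soundfile.info (installed earlier) would be the alternative. The function name wav_summary is hypothetical.

```python
import os
import wave


def wav_summary(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
    return rate, frames / rate


if __name__ == "__main__":
    output = "hindi_female_output.wav"  # same name as --output_file above
    if os.path.exists(output):
        rate, seconds = wav_summary(output)
        print(f"{output}: {rate} Hz, {seconds:.2f} s")
    else:
        print(f"{output} not found; did inference finish without errors?")
```

A duration of zero, or a file that cannot be opened at all, suggests that inference failed even if no error was shown.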

❓ FAQ

Q. Can I run this on CPU?

A. Technically yes, but inference will be much slower than on a GPU. A GPU is highly recommended.

Q. Where are the models stored?

A. In the hindi/female/model/ directory. Use git lfs pull to download them.

Q. Can I use it for other languages?

A. Yes, it supports many Indian languages like Bengali, Marathi, Tamil, etc.
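To see which voices are actually present in a local checkout, the sketch below scans for model files. It assumes that other languages follow the same layout as hindi/female/model/model.pth, that is language/gender/model/model.pth; this layout is an assumption based on the Hindi path in this guide, and the function name available_voices is hypothetical.

```python
from pathlib import Path


def available_voices(repo_root="."):
    """List (language, gender) pairs that have a model/model.pth file.

    Assumes the layout <language>/<gender>/model/model.pth, mirroring
    hindi/female/model/model.pth from this guide.
    """
    root = Path(repo_root)
    voices = []
    for model in sorted(root.glob("*/*/model/model.pth")):
        gender_dir = model.parent.parent
        voices.append((gender_dir.parent.name, gender_dir.name))
    return voices


if __name__ == "__main__":
    for language, gender in available_voices("."):
        print(f"{language} ({gender})")
```

Run it from the Fastspeech2_HS directory. A listed voice may still be only a Git LFS pointer, so the integrity check from the earlier step applies to each model file.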

🌎 Explore More

Visit the GitHub repo: https://github.com/smtiitm/Fastspeech2_HS

Built and tested on RunPod. Works with a CUDA 12.1 environment.

If you found this helpful, consider sharing it with other developers or language enthusiasts! 🚀