{"id":1576,"date":"2025-07-11T15:31:30","date_gmt":"2025-07-11T15:31:30","guid":{"rendered":"https:\/\/zahiralam.com\/blog\/?p=1576"},"modified":"2025-07-28T06:51:51","modified_gmt":"2025-07-28T06:51:51","slug":"install-and-run-hindi-female-text-to-speech-tts-on-runpod-or-any-linux-server","status":"publish","type":"post","link":"https:\/\/zahiralam.com\/blog\/install-and-run-hindi-female-text-to-speech-tts-on-runpod-or-any-linux-server\/","title":{"rendered":"Install and Run Hindi Female Text-to-Speech (TTS) on RunPod or Any Linux Server"},"content":{"rendered":"\n<p>This step-by-step guide helps you set up and run a Hindi female Text-to-Speech system based on Fastspeech2_HS with ESPnet2 on a GPU-enabled server such as RunPod.\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\u2728 Features<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Female Hindi voice<\/li>\n\n\n\n<li>Fast inference using GPU (CUDA)<\/li>\n\n\n\n<li>Based on Fastspeech2_HS and ESPnet2<\/li>\n\n\n\n<li>Works on Ubuntu, tested with Python 3.10<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udcc1 Step-by-Step Installation<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83d\ude80 System Preparation<\/h3>\n\n\n\n<div class=\"code-block-container\">\n                        <pre class=\"wp-block-code\"><code id=\"code-1\">apt update\napt install -y git python3-pip python3-venv ffmpeg vim less<\/code><\/pre>\n                        <amp-iframe sandbox=\"allow-scripts\" width=\"94\" height=\"72\" frameborder=\"0\" \n                                    src=\"https:\/\/zahiralam.com\/blog\/wp-content\/plugins\/amp-copy-code-button\/copier.html#apt%20update%0Aapt%20install%20-y%20git%20python3-pip%20python3-venv%20ffmpeg%20vim%20less\">\n                            <button class=\"copy-button\" data-label=\"apt update\napt install -y git python3-pip python3-venv 
ffmpeg vim less\"  placeholder disabled>Copy<\/button>\n                        <\/amp-iframe>\n                    <\/div>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83d\udd27 Clone &amp; Setup<\/h3>\n\n\n\n<div class=\"code-block-container\">\n                        <pre class=\"wp-block-code\"><code id=\"code-2\">git clone https:\/\/github.com\/smtiitm\/Fastspeech2_HS.git\ncd Fastspeech2_HS\npython3 -m venv venv\nsource venv\/bin\/activate<\/code><\/pre>\n                        <amp-iframe sandbox=\"allow-scripts\" width=\"94\" height=\"72\" frameborder=\"0\" \n                                    src=\"https:\/\/zahiralam.com\/blog\/wp-content\/plugins\/amp-copy-code-button\/copier.html#git%20clone%20https%3A%2F%2Fgithub.com%2Fsmtiitm%2FFastspeech2_HS.git%0Acd%20Fastspeech2_HS%0Apython3%20-m%20venv%20venv%0Asource%20venv%2Fbin%2Factivate\">\n                            <button class=\"copy-button\" data-label=\"git clone https:\/\/github.com\/smtiitm\/Fastspeech2_HS.git\ncd Fastspeech2_HS\npython3 -m venv venv\nsource venv\/bin\/activate\"  placeholder disabled>Copy<\/button>\n                        <\/amp-iframe>\n                    <\/div>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83e\udd5b Python Package Installation<\/h3>\n\n\n\n<div class=\"code-block-container\">\n                        <pre class=\"wp-block-code\"><code id=\"code-3\">pip install --upgrade pip\npip install torch torchvision torchaudio --index-url https:\/\/download.pytorch.org\/whl\/cu121<\/code><\/pre>\n                        <amp-iframe sandbox=\"allow-scripts\" width=\"94\" height=\"72\" frameborder=\"0\" \n                                    src=\"https:\/\/zahiralam.com\/blog\/wp-content\/plugins\/amp-copy-code-button\/copier.html#pip%20install%20--upgrade%20pip%0Apip%20install%20torch%20torchvision%20torchaudio%20--index-url%20https%3A%2F%2Fdownload.pytorch.org%2Fwhl%2Fcu121\">\n                            <button class=\"copy-button\" data-label=\"pip install --upgrade pip\npip 
install torch torchvision torchaudio --index-url https:\/\/download.pytorch.org\/whl\/cu121\"  placeholder disabled>Copy<\/button>\n                        <\/amp-iframe>\n                    <\/div>\n\n\n\n<div class=\"code-block-container\">\n                        <pre class=\"wp-block-code\"><code id=\"code-4\">python -c &quot;import torch; print(torch.cuda.get_device_name(0))&quot;  # \u2705 Verifies that your GPU is available and properly configured with PyTorch<\/code><\/pre>\n                        <amp-iframe sandbox=\"allow-scripts\" width=\"94\" height=\"72\" frameborder=\"0\" \n                                    src=\"https:\/\/zahiralam.com\/blog\/wp-content\/plugins\/amp-copy-code-button\/copier.html#python%20-c%20%22import%20torch%3B%20print%28torch.cuda.get_device_name%280%29%29%22%20%20%23%20%E2%9C%85%20Verifies%20that%20your%20GPU%20is%20available%20and%20properly%20configured%20with%20PyTorch\">\n                            <button class=\"copy-button\" data-label=\"python -c &quot;import torch; print(torch.cuda.get_device_name(0))&quot;  # \u2705 Verifies that your GPU is available and properly configured with PyTorch\"  placeholder disabled>Copy<\/button>\n                        <\/amp-iframe>\n                    <\/div>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83d\udcca Required Libraries<\/h3>\n\n\n\n<div class=\"code-block-container\">\n                        <pre class=\"wp-block-code\"><code id=\"code-5\">pip install phonemizer g2p_en unidecode soundfile flask nltk jamo sentencepiece inflect numba h5py pydub resampy pyworld\npip install typeguard==2.13.3\npip install --upgrade scipy\npip install indic-num2words\npip install indic-unified-parser\npip install git+https:\/\/github.com\/espnet\/espnet.git<\/code><\/pre>\n                        <amp-iframe sandbox=\"allow-scripts\" width=\"94\" height=\"72\" frameborder=\"0\" \n                                    
src=\"https:\/\/zahiralam.com\/blog\/wp-content\/plugins\/amp-copy-code-button\/copier.html#pip%20install%20phonemizer%20g2p_en%20unidecode%20soundfile%20flask%20nltk%20jamo%20sentencepiece%20inflect%20numba%20h5py%20pydub%20resampy%20pyworld%0Apip%20install%20typeguard%3D%3D2.13.3%0Apip%20install%20--upgrade%20scipy%0Apip%20install%20indic-num2words%0Apip%20install%20indic-unified-parser%0Apip%20install%20git%2Bhttps%3A%2F%2Fgithub.com%2Fespnet%2Fespnet.git\">\n                            <button class=\"copy-button\" data-label=\"pip install phonemizer g2p_en unidecode soundfile flask nltk jamo sentencepiece inflect numba h5py pydub resampy pyworld\npip install typeguard==2.13.3\npip install --upgrade scipy\npip install indic-num2words\npip install indic-unified-parser\npip install git+https:\/\/github.com\/espnet\/espnet.git\"  placeholder disabled>Copy<\/button>\n                        <\/amp-iframe>\n                    <\/div>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83d\udcc6 Handle Metadata Issue<\/h3>\n\n\n\n<div class=\"code-block-container\">\n                        <pre class=\"wp-block-code\"><code id=\"code-6\">pip install pip==23.3.1<\/code><\/pre>\n                        <amp-iframe sandbox=\"allow-scripts\" width=\"94\" height=\"72\" frameborder=\"0\" \n                                    src=\"https:\/\/zahiralam.com\/blog\/wp-content\/plugins\/amp-copy-code-button\/copier.html#pip%20install%20pip%3D%3D23.3.1\">\n                            <button class=\"copy-button\" data-label=\"pip install pip==23.3.1\"  placeholder disabled>Copy<\/button>\n                        <\/amp-iframe>\n                    <\/div>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83d\udcca Additional Dependencies<\/h3>\n\n\n\n<div class=\"code-block-container\">\n                        <pre class=\"wp-block-code\"><code id=\"code-7\">pip install fairseq==0.12.2\npip install kaldiio soundfile\npip install pandas<\/code><\/pre>\n                        <amp-iframe 
sandbox=\"allow-scripts\" width=\"94\" height=\"72\" frameborder=\"0\" \n                                    src=\"https:\/\/zahiralam.com\/blog\/wp-content\/plugins\/amp-copy-code-button\/copier.html#pip%20install%20fairseq%3D%3D0.12.2%0Apip%20install%20kaldiio%20soundfile%0Apip%20install%20pandas\">\n                            <button class=\"copy-button\" data-label=\"pip install fairseq==0.12.2\npip install kaldiio soundfile\npip install pandas\"  placeholder disabled>Copy<\/button>\n                        <\/amp-iframe>\n                    <\/div>\n\n\n\n<h3 class=\"wp-block-heading\">\u2696\ufe0f Git LFS for Large Models<\/h3>\n\n\n\n<div class=\"code-block-container\">\n                        <pre class=\"wp-block-code\"><code id=\"code-8\">apt install git-lfs\ngit lfs install\ngit lfs pull<\/code><\/pre>\n                        <amp-iframe sandbox=\"allow-scripts\" width=\"94\" height=\"72\" frameborder=\"0\" \n                                    src=\"https:\/\/zahiralam.com\/blog\/wp-content\/plugins\/amp-copy-code-button\/copier.html#apt%20install%20git-lfs%0Agit%20lfs%20install%0Agit%20lfs%20pull\">\n                            <button class=\"copy-button\" data-label=\"apt install git-lfs\ngit lfs install\ngit lfs pull\"  placeholder disabled>Copy<\/button>\n                        <\/amp-iframe>\n                    <\/div>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83d\udc40 Verify Model Integrity<\/h3>\n\n\n\n<div class=\"code-block-container\">\n                        <pre class=\"wp-block-code\"><code id=\"code-9\">file hindi\/female\/model\/model.pth\nhead hindi\/female\/model\/model.pth<\/code><\/pre>\n                        <amp-iframe sandbox=\"allow-scripts\" width=\"94\" height=\"72\" frameborder=\"0\" \n                                    src=\"https:\/\/zahiralam.com\/blog\/wp-content\/plugins\/amp-copy-code-button\/copier.html#file%20hindi%2Ffemale%2Fmodel%2Fmodel.pth%0Ahead%20hindi%2Ffemale%2Fmodel%2Fmodel.pth\">\n                   
         <button class=\"copy-button\" data-label=\"file hindi\/female\/model\/model.pth\nhead hindi\/female\/model\/model.pth\"  placeholder disabled>Copy<\/button>\n                        <\/amp-iframe>\n                    <\/div>\n\n\n\n<p>These commands check that the model file was actually downloaded as a serialized PyTorch archive rather than left behind as a Git LFS pointer.\n\n\n\n<p>If the <code>file<\/code> command shows &#8220;Zip archive&#8221;, the model file is a real archive; in that case <code>head<\/code> prints mostly unreadable binary data, which is expected. If <code>head<\/code> instead prints a few lines of plain text beginning with <code>version https:\/\/git-lfs.github.com\/spec\/v1<\/code>, the file is still an LFS pointer, so run <code>git lfs pull<\/code> again.\n\n\n\n<p>For deeper verification, run inference and confirm that audio is generated without errors.\n\n\n\n<h3 class=\"wp-block-heading\">\ud83d\udcda Final Touches<\/h3>\n\n\n\n<div class=\"code-block-container\">\n                        <pre class=\"wp-block-code\"><code id=\"code-10\">pip install --upgrade setuptools wheel\ncd ..\ngit clone https:\/\/github.com\/espnet\/espnet.git\ncd espnet\npip install -e .<\/code><\/pre>\n                        <amp-iframe sandbox=\"allow-scripts\" width=\"94\" height=\"72\" frameborder=\"0\" \n                                    src=\"https:\/\/zahiralam.com\/blog\/wp-content\/plugins\/amp-copy-code-button\/copier.html#pip%20install%20--upgrade%20setuptools%20wheel%0Acd%20..%0Agit%20clone%20https%3A%2F%2Fgithub.com%2Fespnet%2Fespnet.git%0Acd%20espnet%0Apip%20install%20-e%20.\">\n                            <button class=\"copy-button\" data-label=\"pip install --upgrade setuptools wheel\ncd ..\ngit clone https:\/\/github.com\/espnet\/espnet.git\ncd espnet\npip install -e .\"  placeholder disabled>Copy<\/button>\n                        <\/amp-iframe>\n                    <\/div>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83c\udf89 Check ESPnet2<\/h3>\n\n\n\n<div class=\"code-block-container\">\n                        <pre class=\"wp-block-code\"><code id=\"code-11\">cd ..\/Fastspeech2_HS\npython -c &quot;from espnet2.bin.tts_inference import Text2Speech; print(&#039;ESPnet2 is working 
\u2705&#039;)&quot;<\/code><\/pre>\n                        <amp-iframe sandbox=\"allow-scripts\" width=\"94\" height=\"72\" frameborder=\"0\" \n                                    src=\"https:\/\/zahiralam.com\/blog\/wp-content\/plugins\/amp-copy-code-button\/copier.html#cd%20..%2FFastspeech2_HS%0Apython%20-c%20%22from%20espnet2.bin.tts_inference%20import%20Text2Speech%3B%20print%28%27ESPnet2%20is%20working%20%E2%9C%85%27%29%22\">\n                            <button class=\"copy-button\" data-label=\"cd ..\/Fastspeech2_HS\npython -c &quot;from espnet2.bin.tts_inference import Text2Speech; print(&#039;ESPnet2 is working \u2705&#039;)&quot;\"  placeholder disabled>Copy<\/button>\n                        <\/amp-iframe>\n                    <\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83c\udfa7 Run Hindi TTS Inference<\/h2>\n\n\n\n<div class=\"code-block-container\">\n                        <pre class=\"wp-block-code\"><code id=\"code-12\">python inference.py \\\n  --text &quot;\u0928\u092e\u0938\u094d\u0924\u0947, \u092f\u0939 \u090f\u0915 \u092e\u0939\u093f\u0932\u093e \u0906\u0935\u093e\u091c\u093c \u092e\u0947\u0902 \u0939\u093f\u0902\u0926\u0940 \u091f\u0947\u0915\u094d\u0938\u094d\u091f \u091f\u0942 \u0938\u094d\u092a\u0940\u091a \u0915\u093e \u0921\u0947\u092e\u094b \u0939\u0948&quot; \\\n  --language hindi \\\n  --gender female \\\n  --alpha 1 \\\n  --output_file hindi_female_output.wav<\/code><\/pre>\n                        <amp-iframe sandbox=\"allow-scripts\" width=\"94\" height=\"72\" frameborder=\"0\" \n                                    
src=\"https:\/\/zahiralam.com\/blog\/wp-content\/plugins\/amp-copy-code-button\/copier.html#python%20inference.py%20%5C%0A%20%20--text%20%22%E0%A4%A8%E0%A4%AE%E0%A4%B8%E0%A5%8D%E0%A4%A4%E0%A5%87%2C%20%E0%A4%AF%E0%A4%B9%20%E0%A4%8F%E0%A4%95%20%E0%A4%AE%E0%A4%B9%E0%A4%BF%E0%A4%B2%E0%A4%BE%20%E0%A4%86%E0%A4%B5%E0%A4%BE%E0%A4%9C%E0%A4%BC%20%E0%A4%AE%E0%A5%87%E0%A4%82%20%E0%A4%B9%E0%A4%BF%E0%A4%82%E0%A4%A6%E0%A5%80%20%E0%A4%9F%E0%A5%87%E0%A4%95%E0%A5%8D%E0%A4%B8%E0%A5%8D%E0%A4%9F%20%E0%A4%9F%E0%A5%82%20%E0%A4%B8%E0%A5%8D%E0%A4%AA%E0%A5%80%E0%A4%9A%20%E0%A4%95%E0%A4%BE%20%E0%A4%A1%E0%A5%87%E0%A4%AE%E0%A5%8B%20%E0%A4%B9%E0%A5%88%22%20%5C%0A%20%20--language%20hindi%20%5C%0A%20%20--gender%20female%20%5C%0A%20%20--alpha%201%20%5C%0A%20%20--output_file%20hindi_female_output.wav\">\n                            <button class=\"copy-button\" data-label=\"python inference.py \\\n  --text &quot;\u0928\u092e\u0938\u094d\u0924\u0947, \u092f\u0939 \u090f\u0915 \u092e\u0939\u093f\u0932\u093e \u0906\u0935\u093e\u091c\u093c \u092e\u0947\u0902 \u0939\u093f\u0902\u0926\u0940 \u091f\u0947\u0915\u094d\u0938\u094d\u091f \u091f\u0942 \u0938\u094d\u092a\u0940\u091a \u0915\u093e \u0921\u0947\u092e\u094b \u0939\u0948&quot; \\\n  --language hindi \\\n  --gender female \\\n  --alpha 1 \\\n  --output_file hindi_female_output.wav\"  placeholder disabled>Copy<\/button>\n                        <\/amp-iframe>\n                    <\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\u2753 FAQ<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Q. Can I run this on CPU?<\/h3>\n\n\n\n<p>A. Technically yes, but it will be extremely slow. GPU is highly recommended.\n\n\n\n<h3 class=\"wp-block-heading\">Q. Where are the models stored?<\/h3>\n\n\n\n<p>A. In the&nbsp;<code>hindi\/female\/model\/<\/code>&nbsp;directory. Use&nbsp;<code>git lfs pull<\/code>&nbsp;to download them.\n\n\n\n<h3 class=\"wp-block-heading\">Q. 
Can I use it for other languages?<\/h3>\n\n\n\n<p>A. Yes, it supports many Indian languages, including Bengali, Marathi, and Tamil.\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83c\udf0e Explore More<\/h2>\n\n\n\n<p>Visit the GitHub repo:&nbsp;<a href=\"https:\/\/github.com\/smtiitm\/Fastspeech2_HS\">https:\/\/github.com\/smtiitm\/Fastspeech2_HS<\/a>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>Built and tested on RunPod. Works smoothly in a CUDA 12.1 environment.\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>If you found this helpful, consider sharing it with other developers or language enthusiasts! \ud83d\ude80\n","protected":false},"excerpt":{"rendered":"<p>This step-by-step guide helps you set up and run a Hindi female Text-to-Speech system based on Fastspeech2_HS with ESPnet2 on a GPU-enabled server such as [&#8230;]<\/p>\n","protected":false},"author":1,"featured_media":1588,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[214],"tags":[272,274,18],"class_list":["post-1576","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-software-installation","tag-hindi-tts","tag-runpod","tag-ubuntu"],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/zahiralam.com\/blog\/wp-json\/wp\/v2\/posts\/1576","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/zahiralam.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/zahiralam.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/zahiralam.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/zahiralam.com\/blog\/wp-json\/wp\/v2\/comments?post=1576"}],"version-history":[{"count":1,"href":"https:\/\/zahiralam.com\/blog\/wp-json\/wp\/v2\/posts\/1576\/revisions"}],"predecessor-ver
sion":[{"id":1578,"href":"https:\/\/zahiralam.com\/blog\/wp-json\/wp\/v2\/posts\/1576\/revisions\/1578"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/zahiralam.com\/blog\/wp-json\/wp\/v2\/media\/1588"}],"wp:attachment":[{"href":"https:\/\/zahiralam.com\/blog\/wp-json\/wp\/v2\/media?parent=1576"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/zahiralam.com\/blog\/wp-json\/wp\/v2\/categories?post=1576"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/zahiralam.com\/blog\/wp-json\/wp\/v2\/tags?post=1576"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}