Creating talking images or videos is a fascinating application of deep learning. One such tool that makes this possible is Wav2Lip, a highly accurate lip-sync model. This article will guide you through the process of installing and using Wav2Lip on an Ubuntu system without a GPU.
Step 1: Setting Up the Environment
First, set up a Python environment using Conda. If Conda is not yet installed on your Ubuntu system, follow a step-by-step Conda installation guide before continuing.
Once Conda is installed, create a new Conda environment named ‘wav2lip’ with Python 3.6 and activate it:
conda create -n wav2lip python=3.6
conda activate wav2lip
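Every later step assumes the wav2lip environment is the active one, so it can be worth confirming which interpreter is on the PATH before continuing. The helper below is a minimal sketch; the name is_python36 is made up for illustration and is not part of Conda or Wav2Lip.

```shell
# Hypothetical helper (not part of Conda or Wav2Lip): succeeds only if the
# given version string, e.g. "Python 3.6.13", names a Python 3.6 release.
is_python36() {
    case "$1" in
        "Python 3.6" | "Python 3.6."*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage, inside the activated environment:
#   is_python36 "$(python --version 2>&1)" || echo "wav2lip env is not active" >&2
```

If the check fails, running conda activate wav2lip again in the current terminal is usually the first thing to try.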
Step 2: Installing ffmpeg
Next, install ffmpeg, a software suite to handle multimedia data:
sudo apt-get install ffmpeg
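To confirm that the install worked, you can check that ffmpeg is reachable from the shell you will later run Wav2Lip in. This is a small sketch; have_tool is an illustrative name, not an ffmpeg or apt command.

```shell
# Hypothetical helper: succeeds if the named command is available on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Usage:
#   have_tool ffmpeg && ffmpeg -version | head -n 1
#   have_tool ffmpeg || echo "ffmpeg is missing; rerun the apt-get step" >&2
```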
Step 3: Cloning the Wav2Lip Repository
Clone the Wav2Lip repository from GitHub:
git clone https://github.com/Rudrabha/Wav2Lip.git
cd Wav2Lip
Step 4: Modifying and Installing Requirements
Edit the requirements.txt file and remove opencv-contrib-python and opencv-python. Then, install OpenCV from the Conda-Forge channel:
conda install -c conda-forge opencv
After that, install the remaining packages from requirements.txt:
pip install -r requirements.txt
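The manual edit of requirements.txt in this step can also be scripted, which helps if you repeat the setup on several machines. The sketch below assumes each package sits on its own line, optionally followed by a version specifier; strip_opencv is a hypothetical helper name.

```shell
# Hypothetical helper: print a requirements file without the two OpenCV
# entries named in this step (with or without a version specifier).
# Other packages, including ones like opencv-python-headless, are kept.
strip_opencv() {
    grep -Ev '^opencv-(contrib-)?python([^-[:alnum:]]|$)' "$1"
}

# Usage, from the Wav2Lip checkout (keeps a backup of the original file):
#   cp requirements.txt requirements.txt.orig
#   strip_opencv requirements.txt.orig > requirements.txt
```

Writing to a new file from a backup, rather than editing in place, makes it easy to compare the two versions before running pip.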
Step 5: Downloading Pre-Trained Models
Download the pre-trained face detection model and save it as face_detection/detection/sfd/s3fd.pth. The download link is listed in the Wav2Lip repository README.
Additionally, download the checkpoints for the Wav2Lip models; the command in Step 6 uses wav2lip_gan.pth. The links to the models are also listed in the repository README.
After downloading the checkpoints, place them in the checkpoints folder.
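A misplaced or partially downloaded model file is an easy mistake at this point, so a quick check before running inference may save time. The sketch below looks only for the two files this guide uses; check_models is an illustrative name, and the check only tests that each file exists and is non-empty, not that its contents are valid.

```shell
# Hypothetical helper: check that the model files this guide relies on exist
# and are non-empty under the given Wav2Lip checkout (default: current dir).
check_models() {
    root="${1:-.}"
    missing=0
    for f in face_detection/detection/sfd/s3fd.pth checkpoints/wav2lip_gan.pth; do
        if [ ! -s "$root/$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage, from the Wav2Lip checkout:
#   check_models && echo "models are in place"
```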
Step 6: Generating the Talking Image
Finally, you can generate the talking image using the following command:
python inference.py --checkpoint_path checkpoints/wav2lip_gan.pth --face input/zahir2.jpeg --audio input/bazigar_part1.wav --outfile results/pad-90-100-90-0-resize720.mp4 --pads 90 100 90 0 --resize_factor 720
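Replace input/zahir2.jpeg with the path to your image file and input/bazigar_part1.wav with the path to your audio file. The four --pads values are the padding around the detected face, in the order top, bottom, left, right. Note that Wav2Lip documents --resize_factor as the factor by which the input resolution is reduced, not as a target height, so adjust or drop that flag if the result looks wrong.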
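If you plan to run the command on several image and audio pairs, a small wrapper can catch wrong paths before the model starts loading. This is a sketch under a few assumptions: run_wav2lip and the DRY_RUN variable are made up for illustration, the wrapper is run from the Wav2Lip checkout, and results/out.mp4 in the usage line is just an example output name.

```shell
# Hypothetical wrapper around the inference command; run it from the Wav2Lip
# checkout. Arguments: image, audio, output file, then any extra flags.
run_wav2lip() {
    if [ "$#" -lt 3 ]; then
        echo "usage: run_wav2lip IMAGE AUDIO OUTFILE [extra inference.py flags]" >&2
        return 2
    fi
    face="$1"; audio="$2"; outfile="$3"
    shift 3
    [ -f "$face" ]  || { echo "image not found: $face" >&2; return 1; }
    [ -f "$audio" ] || { echo "audio not found: $audio" >&2; return 1; }
    # Set DRY_RUN=1 to print the command instead of running it.
    ${DRY_RUN:+echo} python inference.py \
        --checkpoint_path checkpoints/wav2lip_gan.pth \
        --face "$face" --audio "$audio" --outfile "$outfile" "$@"
}

# Usage:
#   run_wav2lip input/zahir2.jpeg input/bazigar_part1.wav results/out.mp4 --pads 90 100 90 0
```

Passing the extra flags through unchanged keeps the wrapper thin, so anything inference.py accepts can still be supplied on the command line.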
And that’s it! You’ve now created a talking image using Wav2Lip on Ubuntu without a GPU. Enjoy bringing your images to life!
Conclusion
Using Wav2Lip, you can easily create talking images on Ubuntu without the need for a GPU. This powerful tool allows you to bring images to life by syncing lip movements to any audio file, making it an accessible solution for anyone interested in animation and deep learning.
If you’re looking to push the boundaries further and create 4K, long-duration talking videos with even more advanced capabilities, check out my guide on installing Hallo2 here. This tutorial will help you leverage the full potential of Hallo2 for creating high-quality, extended animations.