Breathing Life into Images: Creating Talking Images on Ubuntu Without a GPU Using Wav2Lip

Creating talking images or videos is a fascinating application of deep learning. Wav2Lip, a highly accurate lip-sync model, is one tool that makes this possible. This article walks you through installing and using Wav2Lip on an Ubuntu system without a GPU.

Step 1: Setting Up the Environment

First, set up a Python environment using Conda. For a detailed guide on installing Conda on Ubuntu, refer to the step-by-step instructions in the following article:

Conda Installation on Ubuntu: Simplified Step-by-Step Instructions with Activation and Deactivation

Once Conda is installed, create a new Conda environment named ‘wav2lip’ with Python 3.6 (the version listed as a prerequisite in the Wav2Lip README) and activate it:

conda create -n wav2lip python=3.6
conda activate wav2lip
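To confirm that the shell is really using the environment you just created, you can run a small check from inside it. This is a minimal sketch that assumes `conda activate` exports the active environment's name in the CONDA_DEFAULT_ENV variable; the function name describe_env is illustrative, not part of Conda or Wav2Lip.

```python
import os
import sys


def describe_env(version_info, environ):
    """Return a list of problems with the interpreter/environment.

    Assumes the environment is named 'wav2lip' and targets Python 3.6,
    matching the commands above. An empty list means everything looks right.
    """
    problems = []
    if tuple(version_info[:2]) != (3, 6):
        problems.append(
            "expected Python 3.6, found {}.{}".format(version_info[0], version_info[1])
        )
    # Assumption: `conda activate` sets CONDA_DEFAULT_ENV to the active env's name.
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != "wav2lip":
        problems.append("expected Conda env 'wav2lip', active env is {!r}".format(active))
    return problems


if __name__ == "__main__":
    issues = describe_env(sys.version_info, os.environ)
    print("OK" if not issues else "\n".join(issues))
```

Passing the version and environment in as arguments keeps the check easy to test without an actual Conda installation.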

Step 2: Installing ffmpeg

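Next, install ffmpeg, a suite of tools for processing audio and video. Wav2Lip's inference script runs it as an external command to convert non-WAV audio and to merge the generated frames with the audio track: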

sudo apt-get install ffmpeg
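Because Wav2Lip calls ffmpeg as an external program, it has to be reachable on your PATH. The following sketch, using only the standard library, reports where ffmpeg was found and prints the first line of `ffmpeg -version`; the helper names find_ffmpeg and ffmpeg_version_line are hypothetical.

```python
import shutil
import subprocess


def find_ffmpeg(which=shutil.which):
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return which("ffmpeg")


def ffmpeg_version_line(path):
    """Run `ffmpeg -version` and return the first line of its output."""
    # stdout=PIPE + universal_newlines keeps this compatible with Python 3.6.
    out = subprocess.run(
        [path, "-version"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    ).stdout
    return out.splitlines()[0]


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg not found on PATH; re-run the apt-get command above")
    else:
        print(path, "->", ffmpeg_version_line(path))
```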

Step 3: Cloning the Wav2Lip Repository

Clone the Wav2Lip repository from GitHub:

git clone https://github.com/Rudrabha/Wav2Lip.git
cd Wav2Lip

Step 4: Modifying and Installing Requirements

Edit the requirements.txt file and remove opencv-contrib-python and opencv-python. Then, install OpenCV from the Conda-Forge channel:

conda install -c conda-forge opencv

Then install the remaining packages from requirements.txt:

pip install -r requirements.txt
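If you would rather not edit requirements.txt by hand, the OpenCV removal at the start of this step can be scripted and run before the pip install. This sketch assumes each requirement line starts with a package name followed by an optional version specifier; the script name strip_opencv.py and the output file requirements-noopencv.txt are hypothetical. It prints the filtered list instead of overwriting the original file.

```python
import re
from pathlib import Path

# Packages to drop because OpenCV is installed from conda-forge instead.
DROP = {"opencv-python", "opencv-contrib-python"}

# A requirement line begins with the project name, optionally followed by a
# version specifier such as "==4.1.0.25" or ">=4.2".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name):
    """Lower-case a project name and treat runs of '-', '_' and '.' as '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def filter_requirements(lines, drop=DROP):
    """Return the lines whose package name is not in `drop`.

    Comments, blank lines and option lines such as "-r other.txt" are kept.
    """
    dropped = {normalize(d) for d in drop}
    kept = []
    for line in lines:
        match = _NAME.match(line)
        if match and normalize(match.group(1)) in dropped:
            continue
        kept.append(line)
    return kept


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.exists():
        print("\n".join(filter_requirements(req.read_text().splitlines())))
```

Save it in the Wav2Lip folder, run `python strip_opencv.py > requirements-noopencv.txt`, and pass that file to `pip install -r` in place of requirements.txt.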

Step 5: Downloading Pre-Trained Models

Download the pre-trained face detection model and save it as face_detection/detection/sfd/s3fd.pth (s3fd.pth is the file name, not a directory). The download link is listed in the Wav2Lip repository's README.

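Additionally, download the Wav2Lip model checkpoints. The README links to several of them, including wav2lip_gan.pth, which is the checkpoint used by the command in Step 6.

Place the downloaded checkpoint files in the checkpoints folder of the cloned repository.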
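A quick way to catch a misplaced download (for example, s3fd.pth ending up as a folder rather than a file) is to check the expected paths before running inference. This sketch assumes you run it from the repository root and use the wav2lip_gan.pth checkpoint from Step 6; the names REQUIRED and missing_files are illustrative.

```python
from pathlib import Path

# Files the inference command expects, relative to the Wav2Lip repository root.
# Adjust the checkpoint entry if you use a different one.
REQUIRED = [
    "inference.py",
    "face_detection/detection/sfd/s3fd.pth",
    "checkpoints/wav2lip_gan.pth",
]


def missing_files(repo_root, required=REQUIRED):
    """Return the required paths that are not regular files under repo_root.

    A directory named s3fd.pth counts as missing, since is_file() is False for it.
    """
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All model files are in place.")
```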

Step 6: Generating the Talking Image

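Finally, generate the talking image with the following command. No GPU-specific option is needed: inference.py uses the CPU automatically when PyTorch finds no CUDA device, although inference will be slower than on a GPU.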

python inference.py --checkpoint_path checkpoints/wav2lip_gan.pth --face input/zahir2.jpeg --audio input/bazigar_part1.wav --outfile results/pad-90-100-90-0-resize720.mp4 --pads 90 100 90 0 --resize_factor 720

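Replace input/zahir2.jpeg with the path to your image file, input/bazigar_part1.wav with the path to your audio file, and the --outfile value with wherever you want the video saved. The four --pads values are the padding in pixels (top, bottom, left, right) added around the detected face; the script's help text suggests adjusting them so that at least the chin is included. Note that --resize_factor is an integer divisor that lowers the input resolution (2 halves it), not a target height such as 720p. As far as I can tell, the upstream script applies it only to video frames, so for a still image it can be left out.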
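To see what a setting like `--pads 90 100 90 0` does, here is a sketch of the padding arithmetic, based on my reading of the face_detect function in the upstream inference.py: each side of the detected face box is pushed outward by its pad, in the order top, bottom, left, right, and the result is clipped to the image bounds. Treat it as an approximation; the function name padded_box and the (x1, y1, x2, y2) box layout are illustrative assumptions, not part of Wav2Lip's API.

```python
def padded_box(face_box, pads, image_size):
    """Expand a detected face box by --pads-style padding and clip it to the image.

    face_box:   (x1, y1, x2, y2) of the detected face, in pixels.
    pads:       (top, bottom, left, right), the order used by --pads.
    image_size: (width, height) of the input image.
    Returns the (x1, y1, x2, y2) region that would be cropped.
    """
    top, bottom, left, right = pads
    x1, y1, x2, y2 = face_box
    width, height = image_size
    return (
        max(0, x1 - left),         # left edge moves left, not past 0
        max(0, y1 - top),          # top edge moves up, not past 0
        min(width, x2 + right),    # right edge moves right, not past the width
        min(height, y2 + bottom),  # bottom edge moves down, not past the height
    )


# A face detected at (300, 200)-(500, 450) in a 720x720 image:
print(padded_box((300, 200, 500, 450), (90, 100, 90, 0), (720, 720)))
# → (210, 110, 500, 550)
```

With these pads the crop grows most at the bottom (100 px), which is what pulls the chin into view, while the right pad of 0 leaves that edge where the detector put it.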

And that’s it! You’ve now created a talking image using Wav2Lip on Ubuntu without a GPU. Enjoy bringing your images to life!

Conclusion

Using Wav2Lip, you can easily create talking images on Ubuntu without the need for a GPU. This powerful tool allows you to bring images to life by syncing lip movements to any audio file, making it an accessible solution for anyone interested in animation and deep learning.

If you’re looking to push the boundaries further and create 4K, long-duration talking videos with even more advanced capabilities, check out my guide on installing Hallo2 here. This tutorial will help you leverage the full potential of Hallo2 for creating high-quality, extended animations.