Hallo2 is a powerful framework for creating long-duration, high-resolution, audio-driven portrait image animations. It stands out for its ability to generate 4K video for animations lasting an hour or more, making it well suited to extended animations with striking detail. This tutorial walks you step by step through installing and running Hallo2 on Ubuntu so you can leverage its full potential for your AI animation projects.
Prerequisites
Before starting, ensure that you have:
- Operating System: Ubuntu 20.04 or Ubuntu 22.04
- CUDA: CUDA 11.8 (for GPU acceleration)
- GPU: Tested with NVIDIA A100 (other CUDA-enabled GPUs may also work)
- Python: Python 3.10 or higher
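You can quickly sanity-check these prerequisites from a terminal before proceeding (standard commands, nothing Hallo2-specific):
# Check the CUDA toolkit version (should report 11.8)
nvcc --version
# Check the NVIDIA driver and visible GPUs
nvidia-smi
# Check the system Python version (a dedicated 3.10 environment is created later)
python3 --version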
1. Clone the Hallo2 Repository
First, clone the Hallo2 repository from GitHub to obtain the necessary code, scripts, and configuration files.
# Clone the Hallo2 repository
git clone https://github.com/fudan-generative-vision/hallo2.git
# Navigate into the project directory
cd hallo2
This step is essential as it provides all the scripts you will need to run Hallo2.
2. Install Git LFS
Git LFS (Large File Storage) is required to download large model files. Install it using the following commands:
# Install Git LFS
sudo apt-get install git-lfs
# Set up Git LFS
git lfs install
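To confirm that Git LFS is ready before cloning any model repositories, you can check its version:
# Verify the Git LFS installation
git lfs version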
3. Install Conda
If you don’t have Conda installed, you can install Miniconda to create and manage the Python environment. Follow these steps:
# Download Miniconda installer
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# Run the installer
bash Miniconda3-latest-Linux-x86_64.sh
# Restart your terminal, or run: source ~/.bashrc
After installing Conda, create a Conda environment for Hallo2 with Python 3.10:
conda create -n hallo python=3.10
conda activate hallo
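To confirm the environment is active and using the expected interpreter:
# Should print Python 3.10.x from the hallo environment
python --version
which python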
4. Install Required Python Packages
With your Conda environment active, install the required Python packages. Start by installing PyTorch with CUDA 11.8 support:
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
Next, install the remaining dependencies by running:
pip install -r requirements.txt
Also, install FFmpeg, which is necessary for video processing:
sudo apt-get install ffmpeg
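Before moving on, it is worth verifying that PyTorch can actually see your GPU (a quick one-liner check):
# Verify that the installed PyTorch build has working CUDA support
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"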
5. Download Pretrained Models
Hallo2 requires pretrained models, which you can download from HuggingFace. Run the following command to clone the pretrained models (Git LFS, installed earlier, fetches the large weight files):
git clone https://huggingface.co/fudan-generative-ai/hallo2 pretrained_models
Alternatively, you can download each pretrained model from their respective repositories. Make sure to organize them in the following directory structure:
./pretrained_models/
|-- audio_separator/
|-- CodeFormer/
|-- face_analysis/
|-- facelib/
|-- hallo2/
|-- motion_module/
|-- realesrgan/
|-- sd-vae-ft-mse/
|-- stable-diffusion-v1-5/
|-- wav2vec/
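After downloading, you can confirm the layout matches the structure above:
# List the model folders and check the total size
ls -1 ./pretrained_models/
du -sh ./pretrained_models/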
6. Edit the Configuration File
The next step is to configure the paths in the YAML file. The default configuration file (configs/inference/long.yaml) provides example paths for the source image and driving audio:
source_image: ./examples/reference_images/1.jpg
driving_audio: ./examples/driving_audios/1.wav
You can replace these with your own files by updating the source_image and driving_audio fields with the appropriate paths.
Make sure the paths to the pretrained models are correct in the configuration, and check that the save_path is where you want to store the results. The default is:
save_path: ./output_long/debug/
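Before running inference, a quick grep shows the values currently set for the fields discussed above:
# Show the paths currently configured
grep -E 'source_image|driving_audio|save_path' ./configs/inference/long.yaml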
7. Running Inference
Now you are ready to run inference and generate animations based on your source image and driving audio.
Run the following command to start the animation generation process:
python scripts/inference_long.py --config ./configs/inference/long.yaml
This script will generate an animation based on the image and audio, and save the result to the location specified in save_path.
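Because long animations can take a long time to render, you may prefer to run the script in the background and capture its output to a log (a standard shell pattern, not specific to Hallo2):
# Run inference in the background and log the output
nohup python scripts/inference_long.py --config ./configs/inference/long.yaml > inference.log 2>&1 &
# Follow progress
tail -f inference.log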
8. (Optional) High-Resolution Video Upsampling
To further enhance the quality of the video, you can run the high-resolution upsampling script. This improves both the background and face areas in the video:
python scripts/video_sr.py --input_path [input_video] --output_path [output_dir] --bg_upsampler realesrgan --face_upsample -w 1 -s 4
Replace [input_video] with the path to the generated video and specify the output directory in [output_dir]. This command will enhance the video and save the result in the specified folder.
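For example, assuming the video was saved under the default save_path (the file name below is a placeholder; use the actual name produced by the inference script):
# Upsample a generated video (the input file name is an assumption)
python scripts/video_sr.py --input_path ./output_long/debug/result.mp4 --output_path ./output_sr --bg_upsampler realesrgan --face_upsample -w 1 -s 4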
9. Troubleshooting
Here are some common issues and their solutions:
- CUDA not found: Make sure CUDA 11.8 is installed correctly by running nvcc --version. If not, follow a CUDA installation guide.
- Pretrained model not found: Verify that all pretrained models are placed in the correct folders under pretrained_models.
- Slow inference: Ensure that you are using a CUDA-enabled GPU for faster inference. Running on a CPU will significantly slow down the process.
10. Hardware Considerations
To get the best performance out of Hallo2, ensure that you are using:
- GPU: A CUDA-compatible GPU is recommended for faster inference. The repository has been tested with NVIDIA A100, but other GPUs should work if they support CUDA.
- Memory: Depending on the model size and image resolution, your GPU should have at least 16GB of memory to run the models smoothly.
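You can check your GPU model and total memory with nvidia-smi's standard query flags:
# Report the GPU model and total memory
nvidia-smi --query-gpu=name,memory.total --format=csv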
11. Viewing Results
Once inference is complete, the generated animation will be saved in the directory you specified in the configuration file. You can open and view the resulting video using a media player like VLC.
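For example, from the command line (assuming VLC is installed and the default save_path was kept; the file name is a placeholder):
# List the generated videos
ls -lh ./output_long/debug/
# Open a result in VLC (replace with the actual file name)
vlc ./output_long/debug/result.mp4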
Conclusion
By following this tutorial, you should now be able to install, configure, and run Hallo2 on your Ubuntu system. With the pretrained models and scripts in place, you can generate high-quality animations using your own images and audio. For further customization, feel free to explore the configuration file and experiment with different settings.