Matrix-3D uses a panoramic representation for wide-coverage, omnidirectional, explorable 3D world generation, combining conditional video generation with panoramic 3D reconstruction.
- Large-Scale Scene Generation: Compared to existing scene generation approaches, Matrix-3D supports the generation of broader, more expansive scenes that allow for complete 360-degree free exploration.
- High Controllability: Matrix-3D supports both text and image inputs, with customizable trajectories and infinite extensibility.
- Strong Generalization Capability: Built upon self-developed 3D data and video model priors, Matrix-3D enables the generation of diverse and high-quality 3D scenes.
- Speed-Quality Balance: Two types of panoramic 3D reconstruction methods are proposed to achieve rapid and detailed 3D reconstruction respectively.
- Aug 29, 2025: We provide a Gradio demo for Matrix-3D!
- Aug 25, 2025: We provide a script for running the generation process with 19 GB of VRAM!
- Aug 12, 2025: We release the code, technical report, and project page of Matrix-3D!
Image | Panoramic Video | 3D Scene |
---|---|---|

Text | Panoramic Video | 3D Scene |
---|---|---|
A floating island with a waterfall | ||
an impressionistic winter landscape | ||
Related Project: If you want to explore Real-Time Interactive Long-Sequence World Models, please visit Matrix-Game 2.0 for details.
Currently tested on Linux with an NVIDIA GPU.
Clone the repo and create the environment:
# Clone the repository
git clone --recursive https://github.com/SkyworkAI/Matrix-3D.git
cd Matrix-3D
# Create a new conda environment
conda create -n matrix3d python=3.10
conda activate matrix3d
# Install torch and torchvision (with GPU support, we use CUDA 12.4 Version)
pip install torch==2.7.0 torchvision==0.22.0
# Run the installation script
chmod +x install.sh
./install.sh
Model Name | Checkpoint File | Download |
---|---|---|
Text2PanoImage | text2panoimage_lora.safetensors | Link |
PanoVideoGen-480p | pano_video_gen_480p.ckpt | Link |
PanoVideoGen-720p | pano_video_gen_720p.bin | Link |
PanoLRM-480p | pano_lrm_480p.pt | Link |
Generation currently requires about 40 GB of VRAM for 480p panoramic video and 60 GB for 720p panoramic video. We also provide a script that runs the generation process with 19 GB of VRAM. In addition, we will soon release a smaller checkpoint that needs only 24 GB of VRAM (e.g. an NVIDIA RTX 4090 GPU) for 720p video generation.
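As a rough aid for choosing settings, the sketch below maps available VRAM to the resolution and memory mode that should fit. The thresholds are the approximate figures quoted above; the function name `feasible_settings` is hypothetical, and the sketch assumes the 19 GB low-VRAM figure also covers 480p, which this README does not state explicitly.

```python
from typing import List, Tuple

# Approximate VRAM figures (in GB) quoted in this README.
VRAM_NORMAL_GB = {480: 40, 720: 60}
# Figure quoted for runs with --enable_vram_management.
# Assumption: it applies to 480p as well as 720p.
VRAM_LOW_MODE_GB = 19


def feasible_settings(vram_gb: float) -> List[Tuple[int, bool]]:
    """Return (resolution, needs_vram_management) pairs that fit in vram_gb."""
    options = []
    for resolution in (720, 480):
        if vram_gb >= VRAM_NORMAL_GB[resolution]:
            options.append((resolution, False))
        elif vram_gb >= VRAM_LOW_MODE_GB:
            options.append((resolution, True))
    return options


if __name__ == "__main__":
    for vram in (16, 24, 48, 80):
        print(vram, "GB ->", feasible_settings(vram))
```

An empty list means that none of the documented configurations is expected to fit on that device.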
- Checkpoint Download
python code/download_checkpoints.py
- One-command 3D World Generation
Now you can generate a 3D world by just running a single command:
./generate.sh
Or you can choose to generate a 3D world step by step.
- Step 1: Text/Image to Panorama Image
You can either generate a panorama image from a text prompt:
python code/panoramic_image_generation.py \
--mode=t2p \
--prompt="a medieval village, half-timbered houses, cobblestone streets, lush greenery, clear blue sky, detailed textures, vibrant colors, high resolution" \
--output_path="./output/example1"
Or from an input image:
python code/panoramic_image_generation.py \
--mode=i2p \
--input_image_path="./data/image1.jpg" \
--output_path="./output/example1"
The generated panorama image will be saved in the output/example1 folder.
If you want to use your own panorama image, or a panorama image generated by another method such as HunyuanWorld, organize the image and its prompt in the following structure, then proceed to Step 2.
./output/example1
├─ pano_img.jpg
└─ prompt.txt
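The layout above can be prepared by hand or with a few lines of Python. The following is a minimal sketch, not part of the repository: the helper name `prepare_scene_dir` is hypothetical, and it assumes the source panorama is already a JPEG file, since it copies the image without converting it.

```python
import shutil
from pathlib import Path


def prepare_scene_dir(pano_image: str, prompt: str, out_dir: str) -> Path:
    """Copy a panorama and its prompt into the layout expected by Step 2."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # File names follow the structure shown above.
    # Assumption: pano_image is already a JPEG; no conversion is done here.
    shutil.copyfile(pano_image, out / "pano_img.jpg")
    (out / "prompt.txt").write_text(prompt.strip(), encoding="utf-8")
    return out
```

For example, `prepare_scene_dir("my_pano.jpg", "a quiet harbor at dawn", "./output/example1")` would produce a folder that can be passed as `--inout_dir` in the next step.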
- Step 2: Generate Panoramic Video
VISIBLE_GPU_NUM=1
torchrun --nproc_per_node ${VISIBLE_GPU_NUM} code/panoramic_image_to_video.py \
--inout_dir="./output/example1" \
--resolution=720
You can set the resolution option to 480 or 720 to generate video at 960 × 480 or 1440 × 720 resolution, respectively.
The generated panoramic tour video will be saved in output/example1/pano_video.mp4. Generating a 720p video takes about an hour on an A800 GPU. You can accelerate this process with multi-GPU inference by setting VISIBLE_GPU_NUM.
Low VRAM mode: To run the video generation step on devices with low VRAM, enable VRAM management with a command-line argument:
VISIBLE_GPU_NUM=1
torchrun --nproc_per_node ${VISIBLE_GPU_NUM} code/panoramic_image_to_video.py \
--inout_dir="./output/example1" \
--resolution=720 \
--enable_vram_management # enable this to allow model to run on devices with 19G vram.
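When the video step is launched from a Python script rather than a shell, the same command can be assembled programmatically. This is a sketch under the assumption that only the flags shown above are needed; the helper name `build_video_command` is hypothetical and not part of the repository.

```python
import shlex
from typing import List


def build_video_command(inout_dir: str, resolution: int = 720,
                        num_gpus: int = 1, low_vram: bool = False) -> List[str]:
    """Assemble the torchrun argv for code/panoramic_image_to_video.py."""
    if resolution not in (480, 720):
        raise ValueError("resolution must be 480 or 720")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    cmd = [
        "torchrun", "--nproc_per_node", str(num_gpus),
        "code/panoramic_image_to_video.py",
        f"--inout_dir={inout_dir}",
        f"--resolution={resolution}",
    ]
    if low_vram:
        # Flag taken from the low VRAM example above.
        cmd.append("--enable_vram_management")
    return cmd


if __name__ == "__main__":
    cmd = build_video_command("./output/example1", resolution=720, low_vram=True)
    # Print the command; pass cmd to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Building the argument list instead of a single string avoids shell quoting problems when the output directory contains spaces.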
- Step 3: Extract 3D Scene
We provide two options: high-quality optimization-based 3D scene reconstruction and efficient feed-forward 3D scene reconstruction.
To perform optimization-based reconstruction, run:
python code/panoramic_video_to_3DScene.py \
--inout_dir="./output/example1" \
--resolution=720
Set the resolution option to the value used for panoramic video generation.
The extracted 3D scene will be saved in .ply format at output/example1/generated_3dgs_opt.ply.
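A quick way to sanity-check the exported file is to read its PLY header, which is plain text even when the payload is binary. The sketch below is not part of the repository; the function name `read_ply_header` is hypothetical, and interpreting the `vertex` count as the number of Gaussians assumes the file follows the common 3D Gaussian Splatting export convention of one Gaussian per vertex.

```python
import sys
from typing import Dict, Tuple


def read_ply_header(path: str) -> Tuple[str, Dict[str, int]]:
    """Parse the text header of a PLY file; return (format, element counts)."""
    fmt = ""
    elements: Dict[str, int] = {}
    with open(path, "rb") as f:
        if f.readline().strip() != b"ply":
            raise ValueError("not a PLY file")
        for raw in f:
            parts = raw.decode("ascii", errors="replace").split()
            if not parts:
                continue
            if parts[0] == "end_header":
                # Stop before the (possibly binary) payload.
                return fmt, elements
            if parts[0] == "format" and len(parts) >= 2:
                fmt = parts[1]
            elif parts[0] == "element" and len(parts) >= 3:
                elements[parts[1]] = int(parts[2])
    raise ValueError("PLY header has no end_header line")


if __name__ == "__main__" and len(sys.argv) > 1:
    ply_format, counts = read_ply_header(sys.argv[1])
    print("format:", ply_format)
    print("elements:", counts)
```

A missing or zero `vertex` count usually indicates that reconstruction did not finish writing the scene.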
To perform feed-forward reconstruction, run:
python code/panoramic_video_480p_to_3DScene_lrm.py \
--video_path="./data/case1/sample_video.mp4" \
--pose_path='./data/case1/sample_cam.json' \
--out_path='./output/example2'
The extracted 3D scene in .ply format and the rendered perspective videos will be saved in output/example2.
To reconstruct a 3D scene from another panorama video and its conditioning camera poses, replace video_path and pose_path accordingly.
Movement Mode | Trajectory | Panoramic Video | 3D Scene |
---|---|---|---|
S-curve Travel | |||
Forward on the Right | |||
We provide three movement modes: Straight Travel, S-curve Travel, and Forward on the Right, which can be selected with --movement_mode in code/panoramic_image_to_video.py.
You can also provide your own camera trajectory in .json format and use it for video generation.
VISIBLE_GPU_NUM=1
torchrun --nproc_per_node ${VISIBLE_GPU_NUM} code/panoramic_image_to_video.py \
--inout_dir="./output/example1" \
  --resolution=720 \
--json_path YOUR_TRAJECTORY_FILE.json
All camera matrices used in our project are world-to-camera matrices in OpenCV format. Please refer to the sample file ./data/test_cameras/test_cam_front.json, and use code/generate_example_camera.py to generate your own camera trajectory.
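To illustrate what a world-to-camera matrix in the OpenCV convention (x right, y down, z forward) looks like, the sketch below inverts a camera-to-world pose and builds a simple forward-moving trajectory. It is not the repository's generator: the function names are hypothetical, and the JSON layout printed at the end (frame index mapped to a 4x4 matrix) is an assumption that should be checked against the sample file before use.

```python
import json
from typing import List

Matrix = List[List[float]]


def world_to_camera(rotation_c2w: Matrix, position: List[float]) -> Matrix:
    """Invert a camera-to-world pose (R, t) into a 4x4 world-to-camera matrix.

    For a rigid transform the inverse is [R^T | -R^T t].
    """
    # Transpose of the 3x3 rotation.
    r_t = [[rotation_c2w[j][i] for j in range(3)] for i in range(3)]
    t = [-sum(r_t[i][k] * position[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def straight_trajectory(num_frames: int, step: float) -> List[Matrix]:
    """Camera moves along +z (forward in OpenCV convention) without rotating."""
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    return [world_to_camera(identity, [0.0, 0.0, i * step])
            for i in range(num_frames)]


if __name__ == "__main__":
    poses = straight_trajectory(num_frames=5, step=0.1)
    # Hypothetical layout: frame index -> 4x4 matrix.
    # Match the structure of the sample file before using it for generation.
    print(json.dumps({str(i): m for i, m in enumerate(poses)}, indent=2))
```

Note that moving the camera forward by d along +z gives a world-to-camera translation of -d, which is a common source of sign errors when converting between pose conventions.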
We also provide a Gradio demo for better visualization. To launch the demo, run the following command in your terminal:
python code/matrix.py --max_gpus=1
Notes on GPU Configuration:
- Single GPU (--max_gpus=1): Currently supports only the text-video-3D generation workflow. Ensure your GPU has at least 62 GB of memory to run this mode smoothly.
- Multiple GPUs (--max_gpus=N, N ≥ 2): Supports both text-video-3D and image-video-3D generation workflows. Allocate GPUs based on your hardware resources to optimize performance.
If you find this project useful, please consider citing it as follows:
@article{yang2025matrix3d,
title = {Matrix-3D: Omnidirectional Explorable 3D World Generation},
author = {Zhongqi Yang and Wenhang Ge and Yuqi Li and Jiaqi Chen and Haoyuan Li and Mengyin An and Fei Kang and Hua Xue and Baixin Xu and Yuyang Yin and Eric Li and Yang Liu and Yikai Wang and Hao-Xiang Guo and Yahui Zhou},
journal = {arXiv preprint arXiv:2508.08086},
year = {2025}
}
This project is built on top of the following projects; please consider citing them if you find them useful:
If you have any questions or would like us to implement a feature, please feel free to open an issue.