SkyworkAI/Matrix-3D


Matrix-3D: Omnidirectional Explorable 3D World Generation

📄 Project Page | Hugging Face Model | Technical Report

🌟 Introduction

Matrix-3D uses a panoramic representation for wide-coverage, omnidirectional, explorable 3D world generation, combining conditional video generation with panoramic 3D reconstruction.

  • Large-Scale Scene Generation: Compared to existing scene generation approaches, Matrix-3D generates broader, more expansive scenes that allow complete 360-degree free exploration.
  • High Controllability: Matrix-3D supports both text and image inputs, with customizable trajectories and infinite extensibility.
  • Strong Generalization Capability: Built upon self-developed 3D data and video model priors, Matrix-3D generates diverse, high-quality 3D scenes.
  • Speed-Quality Balance: Two panoramic 3D reconstruction methods are proposed, one for rapid reconstruction and one for detailed reconstruction.

๐Ÿ—ž๏ธ News

  • Aug 29, 2025: ๐ŸŽ‰ We provide a gradio demo for Matrix-3D!
  • Aug 25, 2025: ๐ŸŽ‰ We provide a script for running the generation process with 19G VRAM!
  • Aug 12, 2025: ๐ŸŽ‰ We release the code, technical report and project page of Matrix-3D!

Image-to-Scene Generation

[Example gallery: Image | Panoramic Video | 3D Scene]

Text-to-Scene Generation

[Example gallery: Text | Panoramic Video | 3D Scene]
Prompts shown: "A floating island with a waterfall"; "an impressionistic winter landscape"

Related Project: If you want to explore Real-Time Interactive Long-Sequence World Models, please visit Matrix-Game 2.0 for details.

📦 Installation

Currently tested on Linux with an NVIDIA GPU.

Clone the repo and create the environment:

# Clone the repository 
git clone --recursive https://github.com/SkyworkAI/Matrix-3D.git
cd Matrix-3D

# Create a new conda environment
conda create -n matrix3d python=3.10
conda activate matrix3d

# Install torch and torchvision (with GPU support, we use CUDA 12.4 Version)
pip install torch==2.7.0 torchvision==0.22.0

# Run the installation script
chmod +x install.sh
./install.sh

💫 Pretrained Models

| Model Name        | Description                      | Download |
|-------------------|----------------------------------|----------|
| Text2PanoImage    | text2panoimage_lora.safetensors  | Link     |
| PanoVideoGen-480p | pano_video_gen_480p.ckpt         | Link     |
| PanoVideoGen-720p | pano_video_gen_720p.bin          | Link     |
| PanoLRM-480p      | pano_lrm_480p.pt                 | Link     |

Currently, the generation process takes 40 GB of VRAM for 480p panoramic video and 60 GB of VRAM for 720p panoramic video. We also provide a script for running the generation process with 19 GB of VRAM. In addition, we will soon release a smaller checkpoint that needs only 24 GB of VRAM (e.g., an NVIDIA RTX 4090 GPU) for 720p video generation.
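
The memory figures above can be turned into a quick pre-flight check. The sketch below is illustrative only: `choose_video_settings` is a hypothetical helper that is not part of the repository, and its preference order (720p without VRAM management, then 480p, then the low-VRAM mode) is an assumption rather than a recommendation from the authors.

```python
# Thresholds in GB, taken from the paragraph above.
VRAM_720P_GB = 60
VRAM_480P_GB = 40
VRAM_LOW_MODE_GB = 19  # requires the --enable_vram_management flag


def choose_video_settings(vram_gb):
    """Suggest video-generation settings for a GPU with `vram_gb` of memory.

    Hypothetical helper; returns None when even the low-VRAM mode
    is unlikely to fit.
    """
    if vram_gb >= VRAM_720P_GB:
        return {"resolution": 720, "enable_vram_management": False}
    if vram_gb >= VRAM_480P_GB:
        return {"resolution": 480, "enable_vram_management": False}
    if vram_gb >= VRAM_LOW_MODE_GB:
        return {"resolution": 720, "enable_vram_management": True}
    return None


print(choose_video_settings(24))
# {'resolution': 720, 'enable_vram_management': True}
```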

🎮 Usage

  • 🔧 Checkpoint Download

python code/download_checkpoints.py

  • 🔥 One-command 3D World Generation

Now you can generate a 3D world by just running a single command:

./generate.sh

Or you can choose to generate a 3D world step by step.

  • ๐Ÿ–ผ๏ธ Step 1: Text/Image to Panorama Image

You can either generate a panorama image from text prompt:

python code/panoramic_image_generation.py \
    --mode=t2p \
    --prompt="a medieval village, half-timbered houses, cobblestone streets, lush greenery, clear blue sky, detailed textures, vibrant colors, high resolution" \
    --output_path="./output/example1"

Or from an input image:

python code/panoramic_image_generation.py \
    --mode=i2p \
    --input_image_path="./data/image1.jpg" \
    --output_path="./output/example1"

The generated panorama image will be saved in the output/example1 folder. If you want to use your own panorama image, or one generated by another method such as HunyuanWorld, organize the panorama image and its prompt in the following structure, then proceed to Step 2.

./output/example1
├─ pano_img.jpg
└─ prompt.txt
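
If you are bringing your own panorama, the layout above can be prepared with a few lines of Python. This is a minimal sketch under two assumptions: the source image is already a JPEG (the file is copied byte for byte, not re-encoded), and prompt.txt holds the prompt as plain UTF-8 text. `prepare_scene_dir` is a hypothetical helper, not a script shipped with the repository.

```python
import shutil
from pathlib import Path


def prepare_scene_dir(pano_path, prompt, out_dir):
    """Place a panorama and its prompt into the layout expected by Step 2."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Assumption: pano_path already points to a JPEG file.
    shutil.copyfile(pano_path, out / "pano_img.jpg")
    # Assumption: the prompt is stored as plain UTF-8 text.
    (out / "prompt.txt").write_text(prompt.strip(), encoding="utf-8")
    return out


# Example (paths are placeholders):
# prepare_scene_dir("my_pano.jpg", "a medieval village", "./output/example1")
```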
  • 📹 Step 2: Generate Panoramic Video

VISIBLE_GPU_NUM=1
torchrun --nproc_per_node ${VISIBLE_GPU_NUM} code/panoramic_image_to_video.py \
  --inout_dir="./output/example1"  \
  --resolution=720

Set the resolution option to 480 or 720 to generate video at 960 × 480 or 1440 × 720, respectively. The generated panoramic tour video will be saved to output/example1/pano_video.mp4. Generating a 720p video takes about an hour on an A800 GPU. You can speed this up with multi-GPU inference by setting VISIBLE_GPU_NUM.
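
Both output sizes have a 2:1 width-to-height ratio, which is consistent with an equirectangular panorama (360° by 180°). As a small illustration, the hypothetical helper below (not part of the repository) maps a `--resolution` value to the frame size stated above.

```python
def video_frame_size(resolution):
    """Return (width, height) for a supported --resolution value."""
    if resolution not in (480, 720):
        raise ValueError("resolution must be 480 or 720")
    # The width is twice the height for both supported resolutions.
    return 2 * resolution, resolution


print(video_frame_size(480))  # (960, 480)
print(video_frame_size(720))  # (1440, 720)
```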

Low VRAM mode: To run the video generation step on devices with low VRAM, enable VRAM management with a command-line argument:

VISIBLE_GPU_NUM=1
torchrun --nproc_per_node ${VISIBLE_GPU_NUM} code/panoramic_image_to_video.py \
  --inout_dir="./output/example1"  \
  --resolution=720 \
  --enable_vram_management # enable this to allow model to run on devices with 19G vram.
  • ๐Ÿก Step 3: Extract 3D Scene

Here we provide two options, one is high-quality optimization-based 3D scene reconstruction and another is efficient feed-forward 3D scene reconstruction.

To perform optimization-based reconstruction, run

python code/panoramic_video_to_3DScene.py \
    --inout_dir="./output/example1" \
    --resolution=720

Set the resolution option to the value used for panoramic video generation. The extracted 3D scene will be saved in .ply format as output/example1/generated_3dgs_opt.ply.

To perform feed-forward reconstruction, run

python code/panoramic_video_480p_to_3DScene_lrm.py \
--video_path="./data/case1/sample_video.mp4" \
--pose_path='./data/case1/sample_cam.json' \
--out_path='./output/example2'

The extracted 3D scene in .ply format and the rendered perspective videos will be saved in output/example2. To reconstruct a 3D scene from another panoramic video and its conditioning camera poses, replace video_path and pose_path accordingly.
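
The three steps above can also be chained from Python. The following is a minimal sketch, assuming it is run from the repository root, that the text-to-panorama path is wanted, and that the optimization-based reconstruction is used. It only uses the scripts and flags shown above; `pipeline_commands` and `run_pipeline` are hypothetical names. The same resolution is passed to Steps 2 and 3, as the reconstruction step requires.

```python
import subprocess
import sys


def pipeline_commands(prompt, out_dir, resolution=720, num_gpus=1, low_vram=False):
    """Build the Step 1-3 commands as argument lists (nothing is executed)."""
    if resolution not in (480, 720):
        raise ValueError("resolution must be 480 or 720")
    step1 = [sys.executable, "code/panoramic_image_generation.py",
             "--mode=t2p", f"--prompt={prompt}", f"--output_path={out_dir}"]
    step2 = ["torchrun", "--nproc_per_node", str(num_gpus),
             "code/panoramic_image_to_video.py",
             f"--inout_dir={out_dir}", f"--resolution={resolution}"]
    if low_vram:
        step2.append("--enable_vram_management")
    step3 = [sys.executable, "code/panoramic_video_to_3DScene.py",
             f"--inout_dir={out_dir}", f"--resolution={resolution}"]
    return [step1, step2, step3]


def run_pipeline(prompt, out_dir, **kwargs):
    """Run the steps in order, stopping at the first one that fails."""
    for cmd in pipeline_commands(prompt, out_dir, **kwargs):
        subprocess.run(cmd, check=True)


# Example (starts a long-running GPU job):
# run_pipeline("a medieval village", "./output/example1", resolution=720)
```

Passing argument lists rather than a shell string avoids quoting problems when the prompt contains commas, quotes, or spaces.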

🎬 Create Your Own

[Example gallery: Movement Mode | Trajectory | Panoramic Video | 3D Scene; modes shown: S-curve Travel, Forward on the Right]

We provide three movement modes: Straight Travel, S-curve Travel, and Forward on the Right. Select one with the --movement_mode argument of code/panoramic_image_to_video.py.

You can also provide your own camera trajectory in .json format and use it for video generation.

VISIBLE_GPU_NUM=1
torchrun --nproc_per_node ${VISIBLE_GPU_NUM} code/panoramic_image_to_video.py \
  --inout_dir="./output/example1"  \
  --resolution=720 \
  --json_path YOUR_TRAJECTORY_FILE.json

All camera matrices used in our project are world-to-camera matrices in OpenCV format. See the sample file ./data/test_cameras/test_cam_front.json, and use code/generate_example_camera.py to generate your own camera trajectory.
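
Because the pipeline expects world-to-camera matrices, a trajectory authored as camera placements (camera-to-world) has to be inverted first. The pure-Python sketch below builds a straight forward path along +z, the viewing direction in the OpenCV convention (x right, y down, z forward), and inverts each pose with the rigid-transform identity [R|t]^-1 = [R^T | -R^T t]. The helper names, frame count, and step size are hypothetical. The JSON layout is deliberately not shown, so follow the sample file for that.

```python
def invert_rigid(c2w):
    """Invert a 4x4 rigid transform given as nested lists."""
    rot = [row[:3] for row in c2w[:3]]
    trans = [row[3] for row in c2w[:3]]
    # The inverse of a rotation matrix is its transpose.
    rot_t = [[rot[j][i] for j in range(3)] for i in range(3)]
    # The new translation is -R^T t.
    new_t = [-sum(rot_t[i][k] * trans[k] for k in range(3)) for i in range(3)]
    return [rot_t[i] + [new_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def straight_forward_w2c(num_frames, step):
    """World-to-camera matrices for a camera moving forward along +z."""
    poses = []
    for i in range(num_frames):
        # Camera-to-world pose: identity rotation, position (0, 0, i * step).
        c2w = [[1.0, 0.0, 0.0, 0.0],
               [0.0, 1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, i * step],
               [0.0, 0.0, 0.0, 1.0]]
        poses.append(invert_rigid(c2w))
    return poses


poses = straight_forward_w2c(num_frames=3, step=0.5)
print(poses[2][2][3])  # -1.0
```

The translation of the last pose is negative because, seen from a camera that has moved forward by 1.0, the world origin lies 1.0 behind it.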

๐Ÿ–ฑ๏ธ Gradio Demo

We also provide a Gradio demo for better visualization. To launch the demo, run the following command in your terminal:

python code/matrix.py --max_gpus=1

Notes on GPU Configuration:

  • Single GPU (--max_gpus=1): Currently supports only the text-video-3D generation workflow. Ensure your GPU has at least 62 GB of memory to run this mode smoothly.
  • Multiple GPUs (--max_gpus=N, N≥2): Supports both the text-video-3D and image-video-3D generation workflows. Allocate GPUs based on your hardware resources to optimize performance.
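
The two notes above amount to a small lookup. The sketch below restates them in code for scripts that wrap the demo launcher; `supported_workflows` is a hypothetical helper and does not exist in code/matrix.py.

```python
def supported_workflows(max_gpus):
    """Return the demo workflows available for a given --max_gpus value."""
    if max_gpus < 1:
        raise ValueError("max_gpus must be at least 1")
    if max_gpus == 1:
        # A single GPU supports only text input (needs at least 62 GB of memory).
        return ["text-video-3D"]
    return ["text-video-3D", "image-video-3D"]


print(supported_workflows(1))  # ['text-video-3D']
print(supported_workflows(2))  # ['text-video-3D', 'image-video-3D']
```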

📚 Citation

If you find this project useful, please consider citing it as follows:

@article{yang2025matrix3d,
  title     = {Matrix-3D: Omnidirectional Explorable 3D World Generation},
  author    = {Zhongqi Yang and Wenhang Ge and Yuqi Li and Jiaqi Chen and Haoyuan Li and Mengyin An and Fei Kang and Hua Xue and Baixin Xu and Yuyang Yin and Eric Li and Yang Liu and Yikai Wang and Hao-Xiang Guo and Yahui Zhou},
  journal   = {arXiv preprint arXiv:2508.08086},
  year      = {2025}
}

๐Ÿค Acknowledgements

This project is built on top of the follows, please consider citing them if you find them useful:

📧 Contact

If you have any questions or would like us to implement a feature, please feel free to open an issue.