Matrix-3D: Omnidirectional Explorable
3D World Generation

🌟 Introduction

Matrix-3D uses a panoramic representation to generate wide-coverage, omnidirectional, explorable 3D worlds by combining conditional video generation with panoramic 3D reconstruction.

Large-Scale Scene Generation: Compared to existing scene generation approaches, Matrix-3D generates broader, more expansive scenes that allow complete 360-degree free exploration.
High Controllability: Matrix-3D supports both text and image inputs, with customizable trajectories and infinite extensibility.
Strong Generalization Capability: Built upon self-developed 3D data and video model priors, Matrix-3D generates diverse, high-quality 3D scenes.
Speed-Quality Balance: Two panoramic 3D reconstruction methods are proposed, one for rapid reconstruction and one for detailed reconstruction.

🗞️ News

Aug 29, 2025: 🎉 We provide a Gradio demo for Matrix-3D!
Aug 25, 2025: 🎉 We provide a script for running the generation process with 19 GB of VRAM!
Aug 12, 2025: 🎉 We release the code, technical report, and project page of Matrix-3D!

Image-to-Scene Generation

Image	Panoramic Video	3D Scene

Text-to-Scene Generation

Text	Panoramic Video	3D Scene
A floating island with a waterfall
An impressionistic winter landscape

Related Project: If you want to explore Real-Time Interactive Long-Sequence World Models, please visit Matrix-Game 2.0 for details.

📦 Installation

Currently tested on Linux with an NVIDIA GPU.

Clone the repo and create the environment:

# Clone the repository 
git clone --recursive https://github.com/SkyworkAI/Matrix-3D.git
cd Matrix-3D

# Create a new conda environment
conda create -n matrix3d python=3.10
conda activate matrix3d

# Install torch and torchvision with GPU support (we use CUDA 12.4)
pip install torch==2.7.0 torchvision==0.22.0

# Run the installation script
chmod +x install.sh
./install.sh

💫 Pretrained Models

Model Name	Checkpoint File	Download
Text2PanoImage	text2panoimage_lora.safetensors	Link
PanoVideoGen-480p	pano_video_gen_480p.ckpt	Link
PanoVideoGen-720p	pano_video_gen_720p.bin	Link
PanoLRM-480p	pano_lrm_480p.pt	Link

The generation process currently requires about 40 GB of VRAM for 480p panoramic video and about 60 GB for 720p panoramic video. We also provide a script for running the generation process with 19 GB of VRAM. In addition, we will soon release a smaller checkpoint that needs only 24 GB of VRAM (e.g., an NVIDIA RTX 4090 GPU) for 720p video generation.

🎮 Usage

🔧 Checkpoint Download

python code/download_checkpoints.py

🔥 One-command 3D World Generation

Now you can generate a 3D world by just running a single command:

./generate.sh

Or you can choose to generate a 3D world step by step.

🖼️ Step 1: Text/Image to Panorama Image

You can either generate a panorama image from a text prompt:

python code/panoramic_image_generation.py \
    --mode=t2p \
    --prompt="a medieval village, half-timbered houses, cobblestone streets, lush greenery, clear blue sky, detailed textures, vibrant colors, high resolution" \
    --output_path="./output/example1"

Or from an input image:

python code/panoramic_image_generation.py \
    --mode=i2p \
    --input_image_path="./data/image1.jpg" \
    --output_path="./output/example1"

The generated panorama image will be saved in the output/example1 folder. If you want to use your own panorama image, or a panorama image generated by another method such as HunyuanWorld, organize the image and its prompt in the following structure and then proceed to step 2.

./output/example1
├─ pano_img.jpg
└─ prompt.txt
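
As a minimal sketch of preparing that layout, the Python helper below copies an existing panorama into a scene folder and writes its prompt alongside it. The function name prepare_scene_dir is hypothetical and not part of the repository; only the file names pano_img.jpg and prompt.txt come from the structure above. The image is copied byte for byte, so the source file is assumed to already be a JPEG.

```python
import shutil
from pathlib import Path


def prepare_scene_dir(pano_path, prompt, out_dir):
    """Lay out a custom panorama the way step 2 expects (hypothetical helper)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Copy without re-encoding; the source is assumed to be a JPEG already.
    shutil.copyfile(pano_path, out / "pano_img.jpg")
    # Store the text prompt that describes the panorama.
    (out / "prompt.txt").write_text(prompt.strip(), encoding="utf-8")
    return out
```

For example, prepare_scene_dir("my_pano.jpg", "a medieval village", "./output/example1") would produce a folder that can be passed to step 2 as --inout_dir.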

📹 Step 2: Generate Panoramic Video

VISIBLE_GPU_NUM=1
torchrun --nproc_per_node ${VISIBLE_GPU_NUM} code/panoramic_image_to_video.py \
  --inout_dir="./output/example1"  \
  --resolution=720

Set the resolution option to 480 or 720 to generate the video at 960 × 480 or 1440 × 720 resolution, respectively. The generated panoramic tour video will be saved as output/example1/pano_video.mp4. Generating a 720p video takes about an hour on a single A800 GPU. You can accelerate this process with multi-GPU inference by setting VISIBLE_GPU_NUM to the number of GPUs to use.
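
The mapping between the resolution option and the frame size can be written down as a small lookup. This is only an illustrative sketch: PANO_SIZES and pano_frame_size are hypothetical names, and the sizes are the two stated in this README. Both keep a 2:1 aspect ratio, which is consistent with a panorama covering 360 degrees horizontally and 180 degrees vertically.

```python
# Frame sizes stated in this README for each --resolution value.
PANO_SIZES = {480: (960, 480), 720: (1440, 720)}


def pano_frame_size(resolution):
    """Return (width, height) of the panoramic video for a --resolution value."""
    if resolution not in PANO_SIZES:
        raise ValueError(
            f"resolution must be one of {sorted(PANO_SIZES)}, got {resolution}"
        )
    return PANO_SIZES[resolution]


for res in (480, 720):
    width, height = pano_frame_size(res)
    # Width is always twice the height for both supported settings.
    print(res, width, height, width / height)
```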

Low VRAM mode: To run the video generation step on devices with limited VRAM, enable VRAM management with the following command-line argument:

VISIBLE_GPU_NUM=1
torchrun --nproc_per_node ${VISIBLE_GPU_NUM} code/panoramic_image_to_video.py \
  --inout_dir="./output/example1"  \
  --resolution=720 \
  --enable_vram_management # enable this to allow model to run on devices with 19G vram.
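
As a rough sketch of how the VRAM figures quoted in this README could drive the choice of flags, the helper below returns the arguments for the video generation step given the available GPU memory. The name suggest_video_flags is hypothetical, the thresholds (40 GB for 480p, 60 GB for 720p, 19 GB with VRAM management) are taken from the text, and applying the 19 GB figure to 480p as well is an assumption that the README does not confirm.

```python
# VRAM needed without VRAM management, in GB, as quoted in this README.
REQUIRED_VRAM_GB = {480: 40, 720: 60}
# VRAM quoted for the low VRAM mode; assumed here to apply to both resolutions.
LOW_VRAM_GB = 19


def suggest_video_flags(vram_gb, resolution=720):
    """Pick flags for panoramic_image_to_video.py from available VRAM (sketch)."""
    if resolution not in REQUIRED_VRAM_GB:
        raise ValueError(f"unsupported resolution: {resolution}")
    flags = [f"--resolution={resolution}"]
    if vram_gb >= REQUIRED_VRAM_GB[resolution]:
        return flags
    if vram_gb >= LOW_VRAM_GB:
        # Not enough memory for the normal path, so enable VRAM management.
        return flags + ["--enable_vram_management"]
    raise RuntimeError(f"{vram_gb} GB is below the {LOW_VRAM_GB} GB quoted minimum")
```

With these numbers, an 80 GB GPU gets only the resolution flag, while a 24 GB GPU gets the resolution flag plus --enable_vram_management.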

🏡 Step 3: Extract 3D Scene

We provide two options: high-quality optimization-based 3D scene reconstruction and efficient feed-forward 3D scene reconstruction.

To perform optimization-based reconstruction, run

python code/panoramic_video_to_3DScene.py \
    --inout_dir="./output/example1" \
    --resolution=720

Set the resolution option to the value used for panoramic video generation. The extracted 3D scene will be saved in .ply format as output/example1/generated_3dgs_opt.ply.
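
Because steps 2 and 3 must agree on the resolution, it can help to build all three commands from one set of parameters. The following is a hedged sketch, not a replacement for generate.sh: build_pipeline_commands and run_pipeline are hypothetical names, and the scripts and flags are only those shown in the steps above (text-to-panorama, video generation, optimization-based reconstruction).

```python
import subprocess
import sys


def build_pipeline_commands(prompt, out_dir, resolution=720, num_gpus=1):
    """Return the three step commands as argument lists (hypothetical helper)."""
    py = sys.executable
    return [
        # Step 1: text prompt to panorama image.
        [py, "code/panoramic_image_generation.py", "--mode=t2p",
         f"--prompt={prompt}", f"--output_path={out_dir}"],
        # Step 2: panorama image to panoramic video.
        ["torchrun", "--nproc_per_node", str(num_gpus),
         "code/panoramic_image_to_video.py",
         f"--inout_dir={out_dir}", f"--resolution={resolution}"],
        # Step 3: optimization-based reconstruction at the same resolution.
        [py, "code/panoramic_video_to_3DScene.py",
         f"--inout_dir={out_dir}", f"--resolution={resolution}"],
    ]


def run_pipeline(commands):
    """Run each step in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling run_pipeline(build_pipeline_commands("a floating island with a waterfall", "./output/example1")) from the repository root would run the steps in sequence, assuming the environment and checkpoints are already set up.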

To perform feed-forward reconstruction, run

python code/panoramic_video_480p_to_3DScene_lrm.py \
    --video_path="./data/case1/sample_video.mp4" \
    --pose_path="./data/case1/sample_cam.json" \
    --out_path="./output/example2"

The extracted 3D scene in .ply format and the rendered perspective videos will be saved in output/example2. To reconstruct a 3D scene from another panoramic video and its conditioning camera poses, replace video_path and pose_path accordingly.

🎬 Create Your Own

Movement Mode	Trajectory	Panoramic Video	3D Scene
S-curve Travel
Forward on the Right

We provide three movement modes: Straight Travel, S-curve Travel, and Forward on the Right. Select one with the --movement_mode argument of code/panoramic_image_to_video.py.

You can also provide your own camera trajectory in .json format and use it for video generation.

VISIBLE_GPU_NUM=1
torchrun --nproc_per_node ${VISIBLE_GPU_NUM} code/panoramic_image_to_video.py \
  --inout_dir="./output/example1"  \
  --resolution=720 \
  --json_path YOUR_TRAJECTORY_FILE.json

All camera matrices used in our project are world-to-camera matrices in OpenCV format. Please refer to the sample file ./data/test_cameras/test_cam_front.json, and use code/generate_example_camera.py to generate your own camera trajectory.
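
To illustrate what a world-to-camera matrix means here, the sketch below builds poses for a camera that slides straight forward and converts each camera-to-world pose into its world-to-camera inverse. It assumes the usual OpenCV camera axes (x right, y down, z forward) and a rigid transform [R | t], whose inverse is [Rᵀ | −Rᵀt]. The function names are hypothetical, and the JSON schema expected by --json_path is not reproduced here; take it from the sample file.

```python
def invert_rigid(c2w):
    """Invert a 4x4 rigid transform given as nested lists: [R | t] -> [R^T | -R^T t]."""
    rot_t = [[c2w[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    t = [c2w[i][3] for i in range(3)]
    new_t = [-sum(rot_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [rot_t[i] + [new_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def straight_forward_w2c(num_frames, step):
    """World-to-camera matrices for a camera sliding along +z with no rotation."""
    poses = []
    for i in range(num_frames):
        # Camera-to-world pose: identity rotation, position (0, 0, i * step).
        c2w = [
            [1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, i * step],
            [0.0, 0.0, 0.0, 1.0],
        ]
        poses.append(invert_rigid(c2w))
    return poses
```

Note that the translation in the world-to-camera matrix has the opposite sign of the camera position: a camera at z = 1.0 in the world has −1.0 in the last column of its third row.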

🖱️ Gradio Demo

We also provide a Gradio demo for better visualization. To launch the demo, run the following command in your terminal:

python code/matrix.py --max_gpus=1

Notes on GPU Configuration:

Single GPU (--max_gpus=1): Currently supports only the text-video-3D generation workflow. Ensure your GPU has at least 62 GB of memory to run this mode smoothly.
Multiple GPUs (--max_gpus=N, N≥2): Supports both the text-video-3D and image-video-3D generation workflows. Allocate GPUs based on your hardware resources to optimize performance.

📚 Citation

If you find this project useful, please consider citing it as follows:

@article{yang2025matrix3d,
  title     = {Matrix-3D: Omnidirectional Explorable 3D World Generation},
  author    = {Zhongqi Yang and Wenhang Ge and Yuqi Li and Jiaqi Chen and Haoyuan Li and Mengyin An and Fei Kang and Hua Xue and Baixin Xu and Yuyang Yin and Eric Li and Yang Liu and Yikai Wang and Hao-Xiang Guo and Yahui Zhou},
  journal   = {arXiv preprint arXiv:2508.08086},
  year      = {2025}
}

🤝 Acknowledgements

This project is built on top of the following projects; please consider citing them if you find them useful:

📧 Contact

If you have any questions or would like us to implement a feature, please feel free to open an issue.
