SimWorld

This project contains the datasets and code for the paper "SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model" and is intended to support researchers in related fields. The paper presents a simulator-conditioned scene generation engine based on world models and provides a benchmark for world model evaluation.

Project content

1. Dataset

The data set provided in this project contains the following:

Name: SimWorld
Description: The dataset includes 4k real mine samples and 11.6k simulated samples. Each sample consists of a ground-truth image, a segmentation mask, 2D detection labels, and a natural language description.
File format: Ground-truth images are JPG, segmentation masks are PNG, detection labels are YOLO-format TXT files, and natural language descriptions are plain TXT files.
Structure: The dataset contains the following main files/folders:

dataset/
│
├── AutoMine/                    # real data
│   ├── target/                  # groundtruth
│   ├── source/                  # segmentation mask
│   ├── bounding_boxes/          # 2D detection label
│   └── prompts/                 # natural language description
└── PMScenes/                    # synthetic data
    ├── target/                  # groundtruth
    ├── source/                  # segmentation mask
    ├── bounding_boxes/          # 2D detection label
    └── prompts/                 # natural language description
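
The layout above suggests that each sample is spread across four sibling folders. The following is a minimal sketch of how the four files of a sample could be collected. It assumes that files belonging to the same sample share a file stem (for example target/0001.jpg and source/0001.png), which this README does not confirm; `pair_samples` is a hypothetical helper and is not part of the repository.

```python
from pathlib import Path


def pair_samples(split_dir):
    """Return one dict of file paths per complete sample under split_dir.

    Assumption (not confirmed by the README): files of the same sample share
    a stem, e.g. target/0001.jpg <-> source/0001.png <-> prompts/0001.txt.
    """
    split_dir = Path(split_dir)
    samples = []
    for target in sorted((split_dir / "target").glob("*.jpg")):
        stem = target.stem
        candidate = {
            "target": target,
            "source": split_dir / "source" / f"{stem}.png",
            "bounding_boxes": split_dir / "bounding_boxes" / f"{stem}.txt",
            "prompts": split_dir / "prompts" / f"{stem}.txt",
        }
        # Keep only samples for which all four files are present.
        if all(path.is_file() for path in candidate.values()):
            samples.append(candidate)
    return samples
```

For example, `pair_samples("dataset/AutoMine")` would return the complete samples of the real-data split, and `pair_samples("dataset/PMScenes")` those of the synthetic split.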

You can download the dataset from the following Google Drive link: https://drive.google.com/drive/folders/1JdyGvFU4KkpqHht8VVe_fsMOn-UnMj3S?usp=drive_link

2. Train

This repository contains two multi-GPU distributed training scripts: train_SimWorld.py for fine-tuning SimWorld and train_SimWorldXL.py for fine-tuning SimWorld XL.

Environment Dependencies

It is recommended to install the dependency packages listed in requirements_train.txt in a Python 3.12.2 environment.

Installation

conda create -n diffusion python=3.12.2
conda activate diffusion

pip install -r requirements_train.txt

Data Preparation

Prepare the training data as a dataset.json file in which each line is one JSON object describing a single sample (the example below is spread over several lines for readability). Each sample contains the following fields:

{
  "source": "path/to/conditional_image.png",
  "target": "path/to/target_image.png",
  "prompts": "text description",
  "labels": [
    ["class", "x", "y", "w", "h"], ["..."]
  ]
}

source: The conditional control image (e.g., segmentation map, depth map, etc.).
target: The target image.
prompts: The text prompt.
labels: Object detection bounding boxes (in YOLO format, float type, with normalized center coordinates).
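
As an illustration of the format described above, the sketch below reads YOLO label files and writes one JSON object per line. It is not the authors' preprocessing code: `read_yolo_labels` and `write_dataset_json` are hypothetical helpers, the tuple layout of `samples` is an assumption, and the class is kept as the string found in the label file because the README does not state which type the training scripts expect.

```python
import json
from pathlib import Path


def read_yolo_labels(label_path):
    """Parse a YOLO txt file into [class, x, y, w, h] rows with float coordinates."""
    rows = []
    for line in Path(label_path).read_text(encoding="utf-8").splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 5:
            raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
        cls, *coords = parts
        x, y, w, h = (float(v) for v in coords)
        # Normalized center coordinates and sizes should lie in [0, 1].
        if not all(0.0 <= v <= 1.0 for v in (x, y, w, h)):
            raise ValueError(f"coordinates not normalized: {line!r}")
        rows.append([cls, x, y, w, h])  # class type is an assumption
    return rows


def write_dataset_json(samples, out_path):
    """Write one JSON object per line.

    `samples` is an iterable of (source, target, prompt_file, label_file)
    tuples; this layout is an assumption made for the sketch.
    """
    with open(out_path, "w", encoding="utf-8") as f:
        for source, target, prompt_file, label_file in samples:
            record = {
                "source": str(source),
                "target": str(target),
                "prompts": Path(prompt_file).read_text(encoding="utf-8").strip(),
                "labels": read_yolo_labels(label_file),
            }
            f.write(json.dumps(record) + "\n")
```

Rejecting malformed or unnormalized label rows at this stage is a design choice of the sketch; it surfaces annotation problems before a long training run starts.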

Train

Before training, download the matching versions of the Stable Diffusion (SD) and ControlNet base models. Once these models are in place, you can run the training scripts.

You can download the checkpoint from the following Google Drive link: https://drive.google.com/drive/folders/1ZKDijJ98-vX6IlfAShT7JFs0it1l6LpC?usp=drive_link

Training Script 1: train_SimWorld.py

python train_SimWorld.py \
  --sd_path "/path/to/stable-diffusion-v1-5" \
  --cn_path "/path/to/control_v11p_sd15_seg" \
  --dataset_json "/path/to/dataset.json" \
  --pretrained_controlnet "/path/to/pretrained.model" \
  --cuda_devices "0,1,2,3"

Training Script 2: train_SimWorldXL.py

python train_SimWorldXL.py \
  --sd_path "/path/to/stable-diffusion-xl-base-1.0" \
  --cn_path "/path/to/controlnet-depth-sdxl-1.0" \
  --vae_path "/path/to/sdxl-vae-fp16-fix" \
  --dataset_json "/path/to/dataset.json" \
  --pretrained_controlnet "/path/to/pretrained_epoch.model" \
  --cuda_devices "0,1,2,3"

3. Inference

The inference scripts generate an image from a given conditional input image and a text prompt, using the pre-trained SimWorld models.

Installation

It is recommended to install the dependency packages listed in requirements_inference.txt in a Python 3.12.2 environment.

pip install -r requirements_inference.txt

Run Inference

Prepare the following:
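- Stable Diffusion model directory (--sd_path), containing the tokenizer, text_encoder, unet, scheduler, and vae subfolders.
- ControlNet model directory (--cn_path for inference_SimWorld.py, --controlnet_path for inference_SimWorldXL.py).
- Trained ControlNet weights checkpoint (--controlnet_ckpt).
- Conditional input image, e.g., a segmentation mask (--source_image for inference_SimWorld.py, --input_image for inference_SimWorldXL.py).
- Text prompt file in plain text (--target_prompt for inference_SimWorld.py, --prompt_file for inference_SimWorldXL.py).

Run the inference script with the appropriate arguments: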

python inference_SimWorld.py \
  --sd_path /path/to/stable-diffusion-v1-5 \
  --cn_path /path/to/control_v11p_sd15_seg \
  --controlnet_ckpt /path/to/pretrained_epoch.model \
  --source_image /path/to/conditional_image.png \
  --target_prompt /path/to/prompt.txt \
  --cuda_device 0 \
  --time_steps 50 \
  --output_image output.png

or

python inference_SimWorldXL.py \
  --sd_path /path/to/stable-diffusion-xl-base-1.0 \
  --controlnet_path /path/to/controlnet-depth-sdxl-1.0 \
  --vae_path /path/to/sdxl-vae-fp16-fix \
  --controlnet_ckpt /path/to/pretrained_epoch.model \
  --input_image /path/to/conditional_image.png \
  --prompt_file /path/to/prompt.txt \
  --cuda_device 3 \
  --output_image ./outputs.png
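
Before launching either command, it can help to confirm that the inputs listed under "Prepare the following" actually exist. The sketch below is one possible preflight check, not part of the repository: `missing_inputs` is a hypothetical helper, and the list of required subfolders is taken from the description of --sd_path above.

```python
from pathlib import Path

# Subfolders this README lists for the Stable Diffusion model directory.
REQUIRED_SD_SUBFOLDERS = ("tokenizer", "text_encoder", "unet", "scheduler", "vae")


def missing_inputs(sd_path, cn_path, controlnet_ckpt, image, prompt_file):
    """Return a list of problems found; an empty list means every input exists."""
    problems = []
    sd_dir = Path(sd_path)
    for name in REQUIRED_SD_SUBFOLDERS:
        if not (sd_dir / name).is_dir():
            problems.append(f"missing SD subfolder: {name}")
    if not Path(cn_path).is_dir():
        problems.append(f"ControlNet directory not found: {cn_path}")
    files = (
        ("ControlNet checkpoint", controlnet_ckpt),
        ("conditional image", image),
        ("prompt file", prompt_file),
    )
    for label, path in files:
        if not Path(path).is_file():
            problems.append(f"{label} not found: {path}")
    return problems
```

A caller might print the returned problems and stop if the list is non-empty, so that a mistyped path is reported before any model weights are loaded.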

4. Cite

If this project is helpful for your work, please cite the following paper:

@article{li2025simworld,
  title={Simworld: A unified benchmark for simulator-conditioned scene generation via world model},
  author={Li, Xinqing and Song, Ruiqi and Xie, Qingyu and Wu, Ye and Zeng, Nanxin and Ai, Yunfeng},
  journal={arXiv preprint arXiv:2503.13952},
  year={2025}
}
