Hi there!
I've found that rolling out the ground truth trajectories (labelled by the language annotator) from the dataset is not always evaluated as successful by `Tasks.get_task_info`. This seems quite concerning. Perhaps I've done something wrong on my end?
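For context, my understanding of the success check is that it diffs the environment info captured before and after the rollout, roughly like this (a minimal sketch from memory, not the literal repo code; the function name and argument handling here are my own):

```python
def rollout_succeeded(env, tasks, task_name, actions):
    """Check whether replaying `actions` solves `task_name`.

    Sketch only: `env` is a calvin_env environment, `tasks` the task
    oracle built from the env config, `actions` the annotated trajectory.
    """
    start_info = env.get_info()  # captured right after resetting to the episode start
    for action in actions:
        obs, _, _, info = env.step(action)
    end_info = env.get_info()
    # get_task_info returns the set of tasks whose success conditions
    # hold between the two infos
    return task_name in tasks.get_task_info(start_info, end_info)
```

So if the replayed actions are exactly the annotated ones, I would expect the annotated task to always appear in that returned set.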
To reproduce, I have forked the repo with minimal changes here: #33
The only difference is on line 47 of `calvin_models/calvin_agent/evaluation/evaluate_policy_singlestep.py`, where instead of rolling out the model I roll out the dataset actions.
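Concretely, the rollout loop changes roughly as follows (a sketch, not the literal diff; the per-frame `episode_XXXXXXX.npz` naming and the `rel_actions` key are from my reading of the dataset format, and the function name is hypothetical):

```python
import numpy as np

def replay_ground_truth(env, dataset_path, start_idx, end_idx):
    """Replay annotated dataset actions instead of calling model.step().

    Sketch under assumed naming: CALVIN stores one frame per
    episode_XXXXXXX.npz file, with the action under "rel_actions".
    """
    for idx in range(start_idx, end_idx):
        frame = np.load(f"{dataset_path}/episode_{idx:07d}.npz")
        action = frame["rel_actions"]  # ground-truth action from the annotation
        obs, _, _, current_info = env.step(action)
    return current_info
```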
The exact commands I ran from beginning to end:
```bash
# set up environment
git clone git@github.com:ezhang7423/calvin.git --recursive
cd calvin
conda create --name calvin python=3.8
conda activate calvin
pip install setuptools==57.5.0 torchmetrics==0.6.0
./install.sh

# get pretrained weights and fix the config.yaml
wget http://calvin.cs.uni-freiburg.de/model_weights/D_D_static_rgb_baseline.zip
unzip D_D_static_rgb_baseline.zip
cp ./D_D_static_rgb_baseline/.hydra/config.yaml ./tmp.yaml
mv ./tmp.yaml ./D_D_static_rgb_baseline/.hydra/config.yaml

# get data
cd dataset
./download_data.sh D
cd ../

# run the evaluation
python calvin_models/calvin_agent/evaluation/evaluate_policy_singlestep.py --dataset_path $DATA_GRAND_CENTRAL/task_D_D/ --train_folder ./D_D_static_rgb_baseline/ --checkpoint D_D_static_rgb_baseline/mcil_baseline.ckpt
```