This codebase accompanies the paper Evaluating the Goal-Directedness of Large Language Models.
Below we provide instructions for running the code.
Create the pre-defined conda environment by running:
conda env create -f llm_goals_env.yml
To update the conda environment with additional packages, first add the new packages to the llm_goals_env.yml file, then run:
conda env update --file llm_goals_env.yml --prune --name llm_goals
To activate the environment run:
conda activate llm_goals
To deactivate the environment run:
conda deactivate
Add the API keys for querying the LLMs to the .env file:
GOOGLE_API_KEY="your-api-key-string-here"
OPENAI_API_KEY="your-api-key-string-here"
ANTHROPIC_API_KEY="your-api-key-string-here"
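The keys can then be read into the process environment at runtime. A minimal sketch of loading such a file is below; the repository itself may use a library such as python-dotenv instead, so this parser is illustrative only:

```python
import os

def load_env(path=".env"):
    """Parse a simple KEY="value" .env file into os.environ.

    Illustrative sketch only -- the repository may load keys differently
    (e.g. via python-dotenv). Blank lines, comments, and lines without
    an "=" are skipped; surrounding quotes and whitespace are stripped.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip().strip('"')

if os.path.exists(".env"):
    load_env()
    api_key = os.environ.get("OPENAI_API_KEY")
```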
We provide a custom implementation of the BlocksWorld environment in the blocksworld_environment folder.
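For readers unfamiliar with BlocksWorld, the sketch below shows the kind of state and action such an environment deals with: blocks arranged in stacks, with a single move operation. This is illustrative Python only; the actual interface is defined in the blocksworld_environment folder.

```python
def move(stacks, src, dst):
    """Move the top block of stack `src` onto stack `dst`.

    `stacks` is a list of stacks, each a list of block names ordered
    bottom to top. Returns a new state and leaves the input unchanged.
    Illustrative sketch only -- not the repository's actual interface.
    """
    if not stacks[src]:
        raise ValueError("source stack is empty")
    new_stacks = [list(s) for s in stacks]  # copy so the input state is preserved
    new_stacks[dst].append(new_stacks[src].pop())
    return new_stacks

# Example with three blocks: build the tower A-B-C on the first stack.
state = [["A"], ["B"], ["C"]]
state = move(state, 1, 0)  # B onto A -> [["A", "B"], [], ["C"]]
state = move(state, 2, 0)  # C onto B -> [["A", "B", "C"], [], []]
```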
We evaluate LLM goal-directedness on four tasks:
- Information gathering
- Cognitive effort
- Plan and execute
- Combined task
Please see the tasks folder for their implementations.
To run a single task, such as information gathering, use the command below:
python3 main.py --task information_gathering --model gemini-2.0-flash --num_blocks 3
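The flags above suggest a command-line interface along the following lines. This is an assumed sketch: the actual argument names, choices, and defaults are defined in main.py, and the task identifiers listed here are inferred from the task list above.

```python
import argparse

def build_parser():
    """Sketch of the CLI implied by the example command (assumed, not the repo's exact parser)."""
    parser = argparse.ArgumentParser(description="Run a goal-directedness eval task")
    parser.add_argument("--task", required=True,
                        choices=["information_gathering", "cognitive_effort",
                                 "plan_and_execute", "combined_task"],
                        help="which of the four tasks to run (names assumed)")
    parser.add_argument("--model", default="gemini-2.0-flash",
                        help="LLM to evaluate")
    parser.add_argument("--num_blocks", type=int, default=3,
                        help="number of blocks in the BlocksWorld instance")
    return parser

args = build_parser().parse_args(
    ["--task", "information_gathering", "--model", "gemini-2.0-flash",
     "--num_blocks", "3"])
```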
To run multiple tasks, please see run_all_tasks.sh in the scripts folder. The command below launches the evals for all tasks:
./scripts/run_all_tasks.sh
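A script like this would typically loop over the task names and invoke main.py once per task. The sketch below is hypothetical, not the contents of the repository's actual scripts/run_all_tasks.sh; task names and flags are assumed from the examples above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a run-all script; the real scripts/run_all_tasks.sh
# may differ in task names, models, and flags.
set -euo pipefail

TASKS="information_gathering cognitive_effort plan_and_execute combined_task"

for task in $TASKS; do
    echo "Running task: $task"
    # python3 main.py --task "$task" --model gemini-2.0-flash --num_blocks 3
done
```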
To analyze the results, please see the analysis folder for detailed instructions.
To cite this paper, please use the bib entry below (TO ADD):