This codebase accompanies the paper Evaluating the Goal-Directedness of Large Language Models.
Below we provide instructions for running the code.
Create the pre-defined conda environment by running:
conda env create -f llm_goals_env.yml
To update the conda environment with additional packages, first add the new packages to the llm_goals_env.yml file, then run:
conda env update --file llm_goals_env.yml --prune --name llm_goals
To activate the environment run:
conda activate llm_goals
To deactivate the environment run:
conda deactivate
Add the API keys for querying the LLMs to the .env file:
GOOGLE_API_KEY="your-api-key-string-here"
OPENAI_API_KEY="your-api-key-string-here"
ANTHROPIC_API_KEY="your-api-key-string-here"
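The keys can then be read into the process environment at runtime. A minimal sketch of loading such a file is below; the repository itself may use a library such as python-dotenv instead, so this parser is illustrative only:

```python
import os

def load_env(path=".env"):
    """Parse a simple KEY="value" .env file into os.environ.

    Illustrative sketch only -- the repository may load keys differently
    (e.g. via python-dotenv). Blank lines, comments, and lines without
    an "=" are skipped; surrounding quotes and whitespace are stripped.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip().strip('"')

if os.path.exists(".env"):
    load_env()
    api_key = os.environ.get("OPENAI_API_KEY")
```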
We provide a custom implementation of the BlocksWorld environment in the blocksworld_environment folder.
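For readers unfamiliar with BlocksWorld, the sketch below shows the kind of state and action such an environment deals with: blocks arranged in stacks, with a single move operation. This is illustrative Python only; the actual interface is defined in the blocksworld_environment folder.

```python
def move(stacks, src, dst):
    """Move the top block of stack `src` onto stack `dst`.

    `stacks` is a list of stacks, each a list of block names ordered
    bottom to top. Returns a new state and leaves the input unchanged.
    Illustrative sketch only -- not the repository's actual interface.
    """
    if not stacks[src]:
        raise ValueError("source stack is empty")
    new_stacks = [list(s) for s in stacks]  # copy so the input state is preserved
    new_stacks[dst].append(new_stacks[src].pop())
    return new_stacks

# Example with three blocks: build the tower A-B-C on the first stack.
state = [["A"], ["B"], ["C"]]
state = move(state, 1, 0)  # B onto A -> [["A", "B"], [], ["C"]]
state = move(state, 2, 0)  # C onto B -> [["A", "B", "C"], [], []]
```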
We evaluate LLM goal-directedness on four tasks:
- Information gathering
- Cognitive effort
- Plan and execute
- Combined task
Please see the tasks folder for their implementations.
To run a single task, such as information gathering, use the command below:
python3 main.py --task information_gathering --model gemini-2.0-flash --num_blocks 3
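The flags above suggest a command-line interface along the following lines. This is an assumed sketch: the actual argument names, choices, and defaults are defined in main.py, and the task identifiers listed here are inferred from the task list above.

```python
import argparse

def build_parser():
    """Sketch of the CLI implied by the example command (assumed, not the repo's exact parser)."""
    parser = argparse.ArgumentParser(description="Run a goal-directedness eval task")
    parser.add_argument("--task", required=True,
                        choices=["information_gathering", "cognitive_effort",
                                 "plan_and_execute", "combined_task"],
                        help="which of the four tasks to run (names assumed)")
    parser.add_argument("--model", default="gemini-2.0-flash",
                        help="LLM to evaluate")
    parser.add_argument("--num_blocks", type=int, default=3,
                        help="number of blocks in the BlocksWorld instance")
    return parser

args = build_parser().parse_args(
    ["--task", "information_gathering", "--model", "gemini-2.0-flash",
     "--num_blocks", "3"])
```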
To run multiple tasks, please see run_all_tasks.sh in the scripts folder. The command below launches the evals for all tasks:
./scripts/run_all_tasks.sh
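A script like this would typically loop over the task names and invoke main.py once per task. The sketch below is hypothetical, not the contents of the repository's actual scripts/run_all_tasks.sh; task names and flags are assumed from the examples above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a run-all script; the real scripts/run_all_tasks.sh
# may differ in task names, models, and flags.
set -euo pipefail

TASKS="information_gathering cognitive_effort plan_and_execute combined_task"

for task in $TASKS; do
    echo "Running task: $task"
    # python3 main.py --task "$task" --model gemini-2.0-flash --num_blocks 3
done
```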
To analyze the results, please see the analysis folder for detailed instructions.
To cite this paper, please use the bib entry below (TO ADD):