This is the official code of AvalonBench for the paper AvalonBench: Evaluating LLMs Playing the Game of Avalon. Based on AgentBench, we support multi-agent play of The Resistance: Avalon, a popular board game that requires deductive reasoning, coordination and collaboration, and the skill of deception.

Note: An implementation based on AgentBench v0.2 can be found here.
Here are the results of LLMs playing against baseline bots.
We also let LLMs play against each other. Evil has an 8:2 win advantage over Good, which is similar to the stats of rookie human players! Here are also some examples of the discussions under this setting.
```bash
cd AgentBench
pip install -r requirements.txt
```
You first need to fill in your OpenAI API key in `configs/agents/single_player.yaml`. Please remember to fill in the keys for all 5 agents. Alternatively, you can set the environment variable `OPENAI_API_KEY` to your key (e.g. `export OPENAI_API_KEY=<your key>`).
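If you rely on the environment variable, a small pre-flight check can catch a missing key before a run starts. This is a hypothetical helper of ours, not part of the repo:

```python
# Hypothetical pre-flight check (not part of AvalonBench): abort early if no
# OpenAI key is visible to the process.
import os
import sys

if not os.environ.get("OPENAI_API_KEY"):
    sys.exit("Set OPENAI_API_KEY or fill in the keys in configs/agents/single_player.yaml")
```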
To ensure that the code for the engine works, run the following from the root directory:
```bash
python -m unittest discover Avalon
```
First of all, also `cd AgentBench`.
- Run the single-player setting with an LLM playing as Assassin (w/ discussion):

```bash
python eval.py \
    --task configs/tasks/avalon/dev.yaml \
    --agent configs/agents/single_player.yaml \
    --config configs/avalon_experiment/assassin_discussion.yaml
```

- Run the single-player setting with an LLM playing as Servant (w/ discussion):

```bash
python eval.py \
    --task configs/tasks/avalon/dev.yaml \
    --agent configs/agents/single_player.yaml \
    --config configs/avalon_experiment/servant_discussion.yaml
```

- Run the multi-player setting (w/ discussion):

```bash
python eval.py \
    --task configs/tasks/avalon/dev.yaml \
    --agent configs/agents/all_llm.yaml \
    --config configs/avalon_experiment/all_llm.yaml
```

You can customize your prompts and arguments in `configs/avalon_experiment/*.yaml`.
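To see which prompt and argument fields a given experiment exposes, you can dump one of these configs first. This is a throwaway inspection snippet of ours, not part of the repo, and it assumes PyYAML is available:

```python
# Throwaway inspection snippet (not part of AvalonBench); assumes PyYAML.
import yaml

with open("configs/avalon_experiment/assassin_discussion.yaml") as f:
    # Print the experiment's fields (prompts, arguments, ...) before editing.
    print(yaml.safe_load(f))
```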
You can import and use the game engine with:

```python
from engine import AvalonGameEnvironment, AvalonConfig
```

First put your game configuration into an `AvalonConfig`, then create an `AvalonGameEnvironment` based on it. For an example of how to use the game engine, see `Avalon/test_engine.py`.
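Below is a minimal sketch of that flow. The constructor argument and the accessor used here (`num_players`, `get_roles`) are assumptions for illustration, not the verified API; `Avalon/test_engine.py` shows the real call sequence (role assignment, team selection, voting).

```python
from engine import AvalonGameEnvironment, AvalonConfig

# Assumed constructor signature (a 5-player game); check Avalon/test_engine.py
# for the arguments AvalonConfig actually takes.
config = AvalonConfig(num_players=5)

# The environment is created from the config, as described above.
env = AvalonGameEnvironment(config)

# Hypothetical accessor, for illustration only: inspect the sampled role
# assignment before driving the team-selection and voting phases.
print(env.get_roles())
```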
If you find AvalonBench useful, please cite:

```bibtex
@inproceedings{
    light2023from,
    title={From Text to Tactic: Evaluating {LLM}s Playing the Game of Avalon},
    author={Jonathan Light and Min Cai and Sheng Shen and Ziniu Hu},
    booktitle={NeurIPS 2023 Foundation Models for Decision Making Workshop},
    year={2023},
    url={https://openreview.net/forum?id=ltUrSryS0K}
}
```