This is the official code of AvalonBench for the paper AvalonBench: Evaluating LLMs Playing the Game of Avalon. Based on AgentBench, we support multi-agent play of The Resistance: Avalon, a popular board game that requires deductive reasoning, coordination and collaboration, and the skill of deception.

Note: An implementation based on AgentBench v0.2 can be found here.
Here are the results of LLMs playing against baseline bots.
We also let LLMs play against each other. Evil has an 8:2 win advantage over Good, which is similar to the stats of rookie human players! Here are also some examples of the discussions under this setting.
```bash
cd AgentBench
pip install -r requirements.txt
```
You first need to fill in your OpenAI API key in `configs/agents/single_player.yaml`. Please remember to fill in the keys for all 5 agents. Alternatively, you can set the environment variable `OPENAI_API_KEY` to your key (e.g. `export OPENAI_API_KEY=<your key>`).
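If you rely on the environment variable, a small pre-flight check can catch a missing key before a run starts. This is a hypothetical helper of ours, not part of the repo:

```python
# Hypothetical pre-flight check (not part of AvalonBench): abort early if no
# OpenAI key is visible to the process.
import os
import sys

if not os.environ.get("OPENAI_API_KEY"):
    sys.exit("Set OPENAI_API_KEY or fill in the keys in configs/agents/single_player.yaml")
```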
To ensure that the code for the engine works, run the following from the root directory:
```bash
python -m unittest discover Avalon
```
First of all, also `cd AgentBench`.
- Run the single-player setting with an LLM playing as Assassin (w/ discussion):

```bash
python eval.py \
    --task configs/tasks/avalon/dev.yaml \
    --agent configs/agents/single_player.yaml \
    --config configs/avalon_experiment/assassin_discussion.yaml
```

- Run the single-player setting with an LLM playing as Servant (w/ discussion):

```bash
python eval.py \
    --task configs/tasks/avalon/dev.yaml \
    --agent configs/agents/single_player.yaml \
    --config configs/avalon_experiment/servant_discussion.yaml
```

- Run the multi-player setting (w/ discussion):

```bash
python eval.py \
    --task configs/tasks/avalon/dev.yaml \
    --agent configs/agents/all_llm.yaml \
    --config configs/avalon_experiment/all_llm.yaml
```

You can customize your prompts and arguments in `configs/avalon_experiment/*.yaml`.
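To see which prompt and argument fields a given experiment exposes, you can dump one of these configs first. This is a throwaway inspection snippet of ours, not part of the repo, and it assumes PyYAML is available:

```python
# Throwaway inspection snippet (not part of AvalonBench); assumes PyYAML.
import yaml

with open("configs/avalon_experiment/assassin_discussion.yaml") as f:
    # Print the experiment's fields (prompts, arguments, ...) before editing.
    print(yaml.safe_load(f))
```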
You can import and use the game engine with:

```python
from engine import AvalonGameEnvironment, AvalonConfig
```

First put your game configuration into an `AvalonConfig`, then create an `AvalonGameEnvironment` based on it. For an example of how to use the game engine, see `Avalon/test_engine.py`.
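Below is a minimal sketch of that flow. The constructor argument and the accessor used here (`num_players`, `get_roles`) are assumptions for illustration, not the verified API; `Avalon/test_engine.py` shows the real call sequence (role assignment, team selection, voting).

```python
from engine import AvalonGameEnvironment, AvalonConfig

# Assumed constructor signature (a 5-player game); check Avalon/test_engine.py
# for the arguments AvalonConfig actually takes.
config = AvalonConfig(num_players=5)

# The environment is created from the config, as described above.
env = AvalonGameEnvironment(config)

# Hypothetical accessor, for illustration only: inspect the sampled role
# assignment before driving the team-selection and voting phases.
print(env.get_roles())
```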
If you find AvalonBench useful, please cite:

```bibtex
@inproceedings{
    light2023from,
    title={From Text to Tactic: Evaluating {LLM}s Playing the Game of Avalon},
    author={Jonathan Light and Min Cai and Sheng Shen and Ziniu Hu},
    booktitle={NeurIPS 2023 Foundation Models for Decision Making Workshop},
    year={2023},
    url={https://openreview.net/forum?id=ltUrSryS0K}
}
```