TabArena - Tabular IID Dataset Curation Repository

This repository contains the code and metadata from the curation efforts for TabArena-v0.1. These curation efforts focused on IID tabular data.

Note: this repository is subject to change. In particular, it will be restructured to also contain, e.g., non-IID data.

Contributing Data - New Dataset or Feedback

If you want to contribute a new dataset or provide feedback on the currently included datasets, please open an issue! The issue templates will guide you further. We look forward to working with you to improve the curation of datasets or integrate new datasets into TabArena.

Repository Overview

This repository contains the following directories:

dataset_collection_scripts: Scripts to collect datasets from various sources and check for duplicates.
dataset_creation_scripts: Scripts to create datasets, their tasks, and aggregate their metadata.
dataset_insight_scripts: Scripts to obtain insights about our curated datasets collection.

Install

Assuming you have created and activated a virtual environment, you can install the dependencies using:

pip install uv
uv pip install -r requirements.txt

Currently, we are using the pyproject.toml only for the ruff configuration.
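
The install commands above assume an active virtual environment. As a minimal sketch, one way to create one is with Python's standard `venv` module. This assumes a POSIX shell and that `python3` is on your PATH. The directory name `.venv` is an arbitrary choice, not something this repository prescribes:

```shell
# Create an isolated environment in ./.venv (directory name is an assumption)
python3 -m venv .venv
# Activate it for the current shell session (POSIX shells)
. .venv/bin/activate
```

After activation, the `pip install uv` and `uv pip install -r requirements.txt` commands install into `.venv` rather than the system Python.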

