dataset is the CRS module that compiles and manages the vulnerable programs to be analyzed by the CRS.
The supported test suites are:
- NIST's Juliet
- NIST's C Test Suite
- A toy dataset
The resulting executables have the following properties:
- ELF format
- x86 architecture
The module does the following steps for each test suite that needs to be built:
- Gets the available sources into the test suite's directory.
- Preprocesses the sources to include all the required source code files and header files.
- Writes the preprocessed sources into the sources/ directory in the root of the repository.
- Creates a new entry in the vulnerables.csv of the dataset.
- Filters the source code files based on the wanted CWEs.
- Compiles the preprocessed source files with the compile and link flags collected from multiple files (module files and user-provided files).
- Writes the executables into the executables/ directory in the root of the repository.
All build operations use GCC and are performed inside a 32-bit Ubuntu 18.04 container.
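The filtering and compilation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the module's actual code: the SourceEntry record, the filter_by_cwes and gcc_command helpers, the .c extension, and the example flags are all hypothetical. Only the use of GCC, the CWE-based filtering, and the sources/ and executables/ directories come from the description above.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SourceEntry:
    """Hypothetical record describing one preprocessed source file."""

    identifier: str
    cwes: frozenset = frozenset()


def filter_by_cwes(entries, wanted_cwes):
    """Keep only the entries that carry at least one of the wanted CWEs."""
    wanted = set(wanted_cwes)
    return [entry for entry in entries if entry.cwes & wanted]


def gcc_command(entry, compile_flags, link_flags):
    """Build the GCC argument list for one entry (assumed file layout)."""
    source = (Path("sources") / f"{entry.identifier}.c").as_posix()
    output = (Path("executables") / f"{entry.identifier}.elf").as_posix()
    return ["gcc", *compile_flags, source, "-o", output, *link_flags]


# Illustrative entries; the CWE identifiers are assigned for this example only.
entries = [
    SourceEntry("toy_test_suite_0", frozenset({"CWE-121"})),
    SourceEntry("toy_test_suite_1"),
    SourceEntry("toy_test_suite_2", frozenset({"CWE-476"})),
]

# Keep only the programs affected by NULL Pointer Dereference (CWE-476).
selected = filter_by_cwes(entries, ["CWE-476"])

# Example flags only; the real flags come from module and user-provided files.
commands = [gcc_command(entry, ["-O0", "-g"], ["-lm"]) for entry in selected]
```

Returning an argument list instead of a shell string keeps the flags from different origins easy to merge and avoids quoting problems when the command is later handed to the container.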
- Make sure you have set up the repositories and Python environment according to the top-level instructions. That is:
  - Docker is installed and running properly. Check using:
    docker version
    docker ps -a
    docker run --rm hello-world
    These commands should run without errors.
  - The current repository and the commons repository are cloned (with submodules) in the same directory.
  - You are running all commands inside a Python virtual environment. Your prompt should show a (.venv) prefix.
  - You have installed Poetry in the virtual environment. If you run:
    which poetry
    you should get a path ending with .venv/bin/poetry.
- Disable the Python Keyring:
  export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring
  This works around a problem that may occur in certain situations, preventing Poetry from getting packages.
- Install the required packages with Poetry (based on pyproject.toml):
  poetry install --only main
- Build the Docker image used to build the dataset assets:
  docker build --platform linux/386 --tag ubuntu18.04_32bit_compiler -f docker/Dockerfile.ubuntu18.04_32bit_compiler .
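Some of the prerequisite checks above can be automated. The following is a rough sketch, not part of the repository: the helper names in_virtualenv, poetry_is_local, and preflight are hypothetical, and the script only checks that the tools are discoverable, not that the Docker daemon is actually running.

```python
import shutil
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter runs inside a virtual environment."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


def poetry_is_local(poetry_path):
    """Return True when Poetry resolves to the virtual environment's copy."""
    return poetry_path is not None and poetry_path.endswith(".venv/bin/poetry")


def preflight():
    """Collect readable problems; an empty list means every check passed."""
    problems = []
    if shutil.which("docker") is None:
        problems.append("docker is not on PATH")
    if not in_virtualenv():
        problems.append("not running inside a virtual environment")
    if not poetry_is_local(shutil.which("poetry")):
        problems.append("poetry does not resolve to .venv/bin/poetry")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(f"warning: {problem}")
```

The checks take their inputs as parameters where possible, so they can be exercised without changing the environment they run in.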
You can use the dataset module either standalone, as a CLI tool, or integrated into Python applications, as a Python module.
As a CLI tool, you can use either the cli.py module:
python dataset/cli.py
or the Poetry interface:
poetry run dataset
$ poetry run dataset build --testsuite TOY_TEST_SUITE
Successfully built 5 executables.
$ poetry run dataset get
The available executables are:
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ID               ┃ CWEs                        ┃ Parent Database ┃ Full Path                        ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ toy_test_suite_0 │ Stack-based Buffer Overflow │ toy_test_suite  │ executables/toy_test_suite_0.elf │
│ toy_test_suite_1 │                             │ toy_test_suite  │ executables/toy_test_suite_1.elf │
│ toy_test_suite_2 │ NULL Pointer Dereference    │ toy_test_suite  │ executables/toy_test_suite_2.elf │
│ toy_test_suite_3 │ NULL Pointer Dereference    │ toy_test_suite  │ executables/toy_test_suite_3.elf │
│ toy_test_suite_4 │                             │ toy_test_suite  │ executables/toy_test_suite_4.elf │
└──────────────────┴─────────────────────────────┴─────────────────┴──────────────────────────────────┘
$ poetry run dataset
Usage: dataset [OPTIONS] COMMAND [ARGS]...

  Builds and filters datasets of vulnerable programs

Options:
  --help  Show this message and exit.

Commands:
  build  Builds a test suite.
  show   Gets the executables in the whole dataset.
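The CLI can also be driven from a script. The sketch below wraps the Poetry entry point with subprocess; the helper names dataset_command and build_test_suite are hypothetical, and the code assumes it is run from the repository root with the virtual environment active. Only the build subcommand and its --testsuite option are taken from the usage shown in this document.

```python
import subprocess


def dataset_command(*args):
    """Return the argument list for one invocation of the Poetry entry point."""
    return ["poetry", "run", "dataset", *args]


def build_test_suite(name):
    """Run the build subcommand and return its captured standard output.

    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(
        dataset_command("build", "--testsuite", name),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling build_test_suite("TOY_TEST_SUITE") would then run the same build as the command-line session shown earlier and return whatever the tool prints.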
As a Python module:
from dataset import Dataset
available_executables = Dataset().get_available_executables()