open-crs/dataset

dataset 🗂️



Description

dataset is the CRS module that compiles and manages the vulnerable programs which will be analyzed by the CRS.

The supported test suites are:

  • NIST's Juliet
  • NIST's C Test Suite
  • A toy dataset

Limitations

  • ELF format
  • x86 architecture

How It Works

The module performs the following steps for each test suite that needs to be built:

  1. Gets the available sources into the test suite's directory.

  2. Preprocesses the sources to include all the required source code files and header files.

  3. Writes the preprocessed sources into the sources/ directory in the root of the repository.

  4. Adds a new entry to the vulnerables.csv file of the dataset.

  5. Filters the source code files based on the wanted CWEs.

  6. Compiles the preprocessed source files with compile and link flags gathered from multiple sources (module files and user-provided files).

  7. Writes the executables into the executables/ directory in the root of the repository.
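Steps 4 and 5 above can be pictured with a minimal sketch. The record fields, the CSV column names, and the helper names below are assumptions made for illustration; the module's actual layout of vulnerables.csv may differ.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class SourceEntry:
    """Hypothetical record for one preprocessed source (field names assumed)."""
    identifier: str
    cwes: list = field(default_factory=list)
    parent_database: str = ""


def write_index(entries, stream):
    """Write a header plus one CSV row per entry (assumed column layout)."""
    writer = csv.writer(stream)
    writer.writerow(["id", "cwes", "parent_database"])
    for entry in entries:
        writer.writerow([entry.identifier, ";".join(entry.cwes), entry.parent_database])


def filter_by_cwes(entries, wanted_cwes):
    """Keep only the entries tagged with at least one of the wanted CWEs."""
    wanted = set(wanted_cwes)
    return [entry for entry in entries if wanted.intersection(entry.cwes)]


entries = [
    SourceEntry("toy_test_suite_0", ["Stack-based Buffer Overflow"], "toy_test_suite"),
    SourceEntry("toy_test_suite_1", [], "toy_test_suite"),
    SourceEntry("toy_test_suite_2", ["NULL Pointer Dereference"], "toy_test_suite"),
]

buffer = io.StringIO()
write_index(entries, buffer)
selected = filter_by_cwes(entries, ["NULL Pointer Dereference"])
```

Here `selected` holds only the entry tagged with the requested CWE, and entries without any CWE are never selected.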

All build operations use GCC and are performed inside a 32-bit Ubuntu 18.04 container.
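As a rough illustration of a containerized GCC build, the sketch below only assembles a `docker run` command line; it does not execute anything. The image tag comes from the Setup section, while the `/work` mount point, the helper name, and the example flags are assumptions rather than the module's real behavior.

```python
IMAGE = "ubuntu18.04_32bit_compiler"  # image tag built in the Setup section


def gcc_in_container(repo_root, source, output, compile_flags=(), link_flags=()):
    """Build the argv for running gcc on one source file inside the container.

    The repository is mounted at /work (an assumed location), so `source`
    and `output` are paths relative to the repository root.
    """
    return [
        "docker", "run", "--rm",
        "--platform", "linux/386",
        "--volume", f"{repo_root}:/work",
        "--workdir", "/work",
        IMAGE,
        "gcc", *compile_flags, str(source), "-o", str(output), *link_flags,
    ]


# Flags from the module come first and user-provided flags last, so a
# user-provided flag can override an earlier one (assumed policy).
module_flags = ["-O0"]
user_flags = ["-g"]
command = gcc_in_container(
    "/path/to/dataset",
    "sources/toy_test_suite_0.c",
    "executables/toy_test_suite_0.elf",
    compile_flags=module_flags + user_flags,
    link_flags=["-lm"],
)
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the image exists.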

Setup

  1. Make sure you have set up the repositories and Python environment according to the top-level instructions. That is:

    • Docker is installed and is properly running. Check using:

      docker version
      docker ps -a
      docker run --rm hello-world

      These commands should run without errors.

    • The current repository and the commons repository are cloned (with submodules) in the same directory.

    • You are running all commands inside a Python virtual environment. There should be a (.venv) prefix in your prompt.

    • You have installed Poetry in the virtual environment. If you run:

      which poetry

      you should get a path ending with .venv/bin/poetry.

  2. Disable the Python Keyring:

    export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring

    This works around a problem that may occur in certain situations and prevents Poetry from fetching packages.

  3. Install the required packages with Poetry (based on pyproject.toml):

    poetry install --only main

  4. Build the Docker image used to build the dataset assets:

    docker build --platform linux/386 --tag ubuntu18.04_32bit_compiler -f docker/Dockerfile.ubuntu18.04_32bit_compiler .
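The prerequisites from step 1 can also be checked from Python. The following is a minimal sketch, not part of the module: the function names are hypothetical, and it only inspects `PATH` and the interpreter prefix, so it does not confirm that the Docker daemon is actually running.

```python
import shutil
import sys


def in_virtualenv(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """Return True when the interpreter runs inside a virtual environment."""
    return prefix != base_prefix


def find_setup_problems(which=shutil.which, venv_active=None):
    """Return a list of human-readable problems; an empty list means all good."""
    if venv_active is None:
        venv_active = in_virtualenv()
    problems = []
    if which("docker") is None:
        problems.append("docker is not on PATH")
    if not venv_active:
        problems.append("no Python virtual environment is active")
    poetry_path = which("poetry")
    if poetry_path is None:
        problems.append("poetry is not on PATH")
    elif not poetry_path.replace("\\", "/").endswith(".venv/bin/poetry"):
        problems.append(f"poetry resolves to {poetry_path}, not to the virtual environment")
    return problems


if __name__ == "__main__":
    for problem in find_setup_problems():
        print(f"Setup problem: {problem}")
```

The `which` and `venv_active` parameters exist only so the check can be exercised without touching the real environment.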

Usage

You can use the dataset module either standalone, as a CLI tool, or integrated into Python applications, as a Python module.

As a CLI Tool

As a CLI tool, you can use either the cli.py module:

python dataset/cli.py

or the Poetry interface:

poetry run dataset

Build Test Suite

$ poetry run dataset build --testsuite TOY_TEST_SUITE
✅ Successfully built 5 executables.
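If the build needs to be triggered from a script, the same command can be driven through `subprocess`. This is a sketch under the assumption that Poetry is on `PATH` and that the working directory is the repository root; the helper names are hypothetical.

```python
import subprocess


def build_command(testsuite):
    """Return the argv for building one test suite through the Poetry interface."""
    return ["poetry", "run", "dataset", "build", "--testsuite", testsuite]


def build_testsuite(testsuite, runner=subprocess.run):
    """Run the build and return its standard output.

    With the default runner, a non-zero exit status raises
    subprocess.CalledProcessError because check=True is set.
    """
    completed = runner(
        build_command(testsuite),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

Calling `build_testsuite("TOY_TEST_SUITE")` would run the command shown above and return whatever the CLI prints.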

List Executables

$ poetry run dataset get
✅ The available executables are:

┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ID               ┃ CWEs                        ┃ Parent Database ┃ Full Path                        ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ toy_test_suite_0 │ Stack-based Buffer Overflow │ toy_test_suite  │ executables/toy_test_suite_0.elf │
│ toy_test_suite_1 │                             │ toy_test_suite  │ executables/toy_test_suite_1.elf │
│ toy_test_suite_2 │ NULL Pointer Dereference    │ toy_test_suite  │ executables/toy_test_suite_2.elf │
│ toy_test_suite_3 │ NULL Pointer Dereference    │ toy_test_suite  │ executables/toy_test_suite_3.elf │
│ toy_test_suite_4 │                             │ toy_test_suite  │ executables/toy_test_suite_4.elf │
└──────────────────┴─────────────────────────────┴─────────────────┴──────────────────────────────────┘

Get Help

$ poetry run dataset
Usage: dataset [OPTIONS] COMMAND [ARGS]...

  Builds and filters datasets of vulnerable programs

Options:
  --help  Show this message and exit.

Commands:
  build  Builds a test suite.
  show   Gets the executables in the whole dataset.

As a Python Module

from dataset import Dataset

available_executables = Dataset().get_available_executables()
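The return type of `get_available_executables()` is not documented here. As a sketch only, assume each returned entry exposes the same information as the CLI table through `cwes` and `full_path` attributes (both attribute names are assumptions). Grouping executables by CWE could then look like this, using stand-in records instead of the real module:

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_cwe(executables):
    """Map each CWE name to the paths of the executables tagged with it."""
    groups = defaultdict(list)
    for executable in executables:
        # Entries without any CWE are collected under a placeholder key.
        for cwe in executable.cwes or ["(no CWE)"]:
            groups[cwe].append(executable.full_path)
    return dict(groups)


# Stand-in records mirroring rows of the CLI table; with the real module
# these would come from Dataset().get_available_executables().
sample = [
    SimpleNamespace(cwes=["Stack-based Buffer Overflow"],
                    full_path="executables/toy_test_suite_0.elf"),
    SimpleNamespace(cwes=[], full_path="executables/toy_test_suite_1.elf"),
    SimpleNamespace(cwes=["NULL Pointer Dereference"],
                    full_path="executables/toy_test_suite_2.elf"),
    SimpleNamespace(cwes=["NULL Pointer Dereference"],
                    full_path="executables/toy_test_suite_3.elf"),
]

grouped = group_by_cwe(sample)
```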
