
███████╗██╗   ██╗██████╗     ██╗    ██╗██╗███████╗
██╔════╝██║   ██║██╔══██╗    ██║    ██║██║╚══███╔╝
███████╗██║   ██║██████╔╝    ██║ █╗ ██║██║  ███╔╝ 
╚════██║██║   ██║██╔══██╗    ██║███╗██║██║ ███╔╝  
███████║╚██████╔╝██████╔╝    ╚███╔███╔╝██║███████╗
╚══════╝ ╚═════╝ ╚═════╝      ╚══╝╚══╝ ╚═╝╚══════╝

A lightweight GPT model trained to discover subdomains.

Installation

pipx install subwiz

OR

pip install subwiz

Recommended Use

Use subfinder ❤️ to find subdomains from passive sources:

subfinder -d example.com -o subdomains.txt

Seed subwiz with these subdomains:

subwiz -i subdomains.txt
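
If you prefer to drive the same workflow from Python, a minimal sketch could look like the following. Only the subfinder command shown above and subwiz.run are confirmed interfaces; the subprocess call and file handling are illustrative assumptions.

import subprocess
import subwiz

# Run subfinder (assumed to be on PATH) to collect passive subdomains.
subprocess.run(["subfinder", "-d", "example.com", "-o", "subdomains.txt"], check=True)

# Read the newline-separated subdomains and seed subwiz with them.
with open("subdomains.txt") as f:
    known_subdomains = [line.strip() for line in f if line.strip()]

new_subdomains = subwiz.run(input_domains=known_subdomains)
print(new_subdomains)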

Supported Switches

usage: cli.py [-h] -i INPUT_FILE [-o OUTPUT_FILE] [-n NUM_PREDICTIONS] [--no-resolve]
              [--force-download] [--max-recursion MAX_RECURSION] [-t TEMPERATURE]
              [-d {auto,cpu,cuda,mps}] [-q MAX_NEW_TOKENS]
              [--resolution-concurrency RESOLUTION_CONCURRENCY] [--multi-apex]

options:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input-file INPUT_FILE
                        file containing new-line-separated subdomains. (default: None)
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        output file to write new-line separated subdomains to. (default: None)
  -n NUM_PREDICTIONS, --num-predictions NUM_PREDICTIONS
                        number of subdomains to predict. (default: 500)
  --no-resolve          do not resolve the output subdomains. (default: False)
  --force-download      download model and tokenizer files, even if cached. (default: False)
  --max-recursion MAX_RECURSION
                        maximum number of times the inference process will recursively re-run
                        after finding new subdomains. (default: 5)
  -t TEMPERATURE, --temperature TEMPERATURE
                        add randomness to the model (recommended ≤ 0.3). (default: 0.0)
  -d {auto,cpu,cuda,mps}, --device {auto,cpu,cuda,mps}
                        hardware to run the transformer model on. (default: auto)
  -q MAX_NEW_TOKENS, --max-new-tokens MAX_NEW_TOKENS
                        maximum length of predicted subdomains in tokens. (default: 10)
  --resolution-concurrency RESOLUTION_CONCURRENCY
                        number of concurrent resolutions. (default: 128)
  --multi-apex          allow multiple apex domains in the input file. runs inference for each
                        apex separately. (default: False)

In Python

Use subwiz in Python with the same parameters as the command-line interface.

import subwiz

known_subdomains = ['test1.example.com', 'test2.example.com']
new_subdomains = subwiz.run(input_domains=known_subdomains)
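
Since the Python API takes the same parameters as the CLI, a fuller call might look like the sketch below. The keyword names are assumptions derived from the long-form CLI flags; check help(subwiz.run) for the exact signature.

import subwiz

known_subdomains = ['test1.example.com', 'test2.example.com']

# Keyword names below are assumed from the CLI flags and may differ from the real signature.
new_subdomains = subwiz.run(
    input_domains=known_subdomains,   # confirmed parameter (see above)
    num_predictions=250,              # assumed, mirrors -n / --num-predictions
    temperature=0.1,                  # assumed, mirrors -t / --temperature (recommended <= 0.3)
    device='auto',                    # assumed, mirrors -d / --device
)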

Model

Use the --no-resolve flag to inspect model outputs without checking whether they resolve.

Architecture

Subwiz is an ultra-lightweight transformer model based on nanoGPT ❤️:

  • 17.3M parameters.
  • Trained on 26M tokens of subdomain lists gathered from passive sources.
  • Tokenizer trained on the same subdomain lists (8192-token vocabulary).
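
As a rough sanity check on the 17.3M figure, here is a back-of-the-envelope parameter count for a nanoGPT-style model. The layer count, embedding width, and context length are hypothetical values chosen for illustration; only the 8192-token vocabulary comes from the list above.

# Hypothetical nanoGPT-style hyperparameters, for illustration only.
vocab_size = 8192                            # from the tokenizer above
n_layer, n_embd, block_size = 8, 384, 256    # assumed, not the real subwiz values

embeddings = vocab_size * n_embd + block_size * n_embd  # token + position embeddings
per_block = 12 * n_embd ** 2                             # attention (4*d^2) + MLP (8*d^2), ignoring biases and norms
total = embeddings + n_layer * per_block

print(f"~{total / 1e6:.1f}M parameters")     # ~17.4M with these made-up values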

Hugging Face

The model is hosted on Hugging Face as HadrianSecurity/subwiz and is downloaded automatically the first time you run subwiz.
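
If you want to pre-fetch the model files (e.g. for an offline machine), a minimal sketch with huggingface_hub is shown below. The exact files subwiz expects and how it loads them are not documented here, so this only mirrors the download-on-first-run behaviour.

from huggingface_hub import snapshot_download

# Download the HadrianSecurity/subwiz repository into the local Hugging Face cache.
local_dir = snapshot_download(repo_id="HadrianSecurity/subwiz")
print(local_dir)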

Inference

Typically, generative transformer models (e.g. ChatGPT) predict a single output sequence. Subwiz predicts the N most likely sequences using a beam search algorithm.

Diagram of the inference algorithm: beam search to predict the N most likely sequences using a generative transformer model.
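
For intuition, the sketch below implements a generic beam search over a next-token model. It is not the subwiz implementation; the log_probs_fn interface (returning (token, log-probability) pairs for the next token) is a placeholder assumption.

import heapq

def beam_search(log_probs_fn, start_tokens, n_beams=5, max_new_tokens=10, eos_token=0):
    """Keep the n_beams highest-scoring sequences at every step."""
    beams = [(0.0, list(start_tokens))]  # (cumulative log-probability, token sequence)
    for _ in range(max_new_tokens):
        candidates = []
        for score, seq in beams:
            if seq[-1] == eos_token:
                candidates.append((score, seq))  # finished sequences carry over unchanged
                continue
            for token, logp in log_probs_fn(seq):
                candidates.append((score + logp, seq + [token]))
        # Retain only the best n_beams candidates for the next step.
        beams = heapq.nlargest(n_beams, candidates, key=lambda c: c[0])
    return beams

# Toy next-token model: prefers token 1, occasionally ends the sequence with token 0.
def toy_log_probs(seq):
    return [(1, -0.1), (2, -1.0), (0, -2.0)]

print(beam_search(toy_log_probs, start_tokens=[3], n_beams=3, max_new_tokens=4))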