███████╗██╗ ██╗██████╗ ██╗ ██╗██╗███████╗ ██╔════╝██║ ██║██╔══██╗ ██║ ██║██║╚══███╔╝ ███████╗██║ ██║██████╔╝ ██║ █╗ ██║██║ ███╔╝ ╚════██║██║ ██║██╔══██╗ ██║███╗██║██║ ███╔╝ ███████║╚██████╔╝██████╔╝ ╚███╔███╔╝██║███████╗ ╚══════╝ ╚═════╝ ╚═════╝ ╚══╝╚══╝ ╚═╝╚══════╝
A lightweight GPT model, trained to discover subdomains.
pipx install subwiz
OR
pip install subwiz
Use subfinder ❤️ to find subdomains from passive sources:
subfinder -d example.com -o subdomains.txt
Seed subwiz with these subdomains:
subwiz -i subdomains.txt
usage: cli.py [-h] -i INPUT_FILE [-o OUTPUT_FILE] [-n NUM_PREDICTIONS] [--no-resolve]
[--force-download] [--max-recursion MAX_RECURSION] [-t TEMPERATURE]
[-d {auto,cpu,cuda,mps}] [-q MAX_NEW_TOKENS]
[--resolution-concurrency RESOLUTION_CONCURRENCY] [--multi-apex]
options:
-h, --help show this help message and exit
-i INPUT_FILE, --input-file INPUT_FILE
file containing new-line-separated subdomains. (default: None)
-o OUTPUT_FILE, --output-file OUTPUT_FILE
output file to write new-line separated subdomains to. (default: None)
-n NUM_PREDICTIONS, --num-predictions NUM_PREDICTIONS
number of subdomains to predict. (default: 500)
--no-resolve do not resolve the output subdomains. (default: False)
--force-download download model and tokenizer files, even if cached. (default: False)
--max-recursion MAX_RECURSION
maximum number of times the inference process will recursively re-run
after finding new subdomains. (default: 5)
-t TEMPERATURE, --temperature TEMPERATURE
add randomness to the model (recommended ≤ 0.3). (default: 0.0)
-d {auto,cpu,cuda,mps}, --device {auto,cpu,cuda,mps}
hardware to run the transformer model on. (default: auto)
-q MAX_NEW_TOKENS, --max-new-tokens MAX_NEW_TOKENS
maximum length of predicted subdomains in tokens. (default: 10)
--resolution-concurrency RESOLUTION_CONCURRENCY
number of concurrent resolutions. (default: 128)
--multi-apex allow multiple apex domains in the input file. runs inference for each
apex separately. (default: False)
Use subwiz in Python, with the same parameters as the command line interface.
import subwiz
known_subdomains = ['test1.example.com', 'test2.example.com']
new_subdomains = subwiz.run(input_domains=known_subdomains)
Use the --no-resolve
flag to inspect model outputs without checking if they resolve.
Subwiz is a ultra-lightweight transformer model based on nanoGPT ❤️:
- 17.3M parameters.
- Trained on 26M tokens, lists of subdomains from passive sources.
- Tokenizer trained on same lists of subdomains (8192 tokens).
The model is saved in Hugging Face as HadrianSecurity/subwiz. It is downloaded when you first run subwiz.
Typically, generative transformer models (e.g. ChatGPT) predict a single output sequence. Subwiz predicts the N most likely sequences using a beam search algorithm.
Beam search algorithm to predict the N most likely sequences using a generative transformer model.