Note
Project has been moved to its own repository from 4ngel2769/side-projects
A versatile Python script that recursively downloads specific file types from web directories with Apache-style listings. Perfect for mirroring repositories, downloading software distributions, or archiving web content.
- Recursive directory traversal with configurable depth
- Multiple file extensions support (e.g., .torrent, .exe, .iso)
- Breadth-first search algorithm for efficient traversal
- Duplicate avoidance with visited URL tracking
- Resume capability by skipping existing files
- Smart Filtering
  - Skips navigation links (../, ./, #, ?)
  - Ignores non-web links (mailto:, tel:, javascript:)
  - Validates file extensions before downloading
- Configurable delays between requests
- Cross-platform compatibility (Windows, Linux, macOS)
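The traversal features above (breadth-first search, a depth limit, visited-URL tracking, link filtering) can be sketched roughly as follows. This is a minimal illustration and not the code in rwdl.py: the `crawl` function, the `SKIP_PREFIXES` tuple, and the `list_links` callable (which stands in for fetching and parsing a listing page) are hypothetical names, and restricting targets to URLs under the base URL is an assumption.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

# Link prefixes that the feature list says are skipped.
SKIP_PREFIXES = ("../", "./", "#", "?", "mailto:", "tel:", "javascript:")


def crawl(base_url, extensions, max_depth, list_links):
    """Breadth-first walk over directory listings, returning matching file URLs.

    list_links is a hypothetical callable: given a directory URL, it returns
    the raw href strings found on that listing page.
    """
    wanted = tuple(ext.lower() for ext in extensions)
    visited = {base_url}               # duplicate avoidance
    queue = deque([(base_url, 0)])     # FIFO queue gives breadth-first order
    found = []

    while queue:
        page_url, depth = queue.popleft()
        for href in list_links(page_url):
            if href.startswith(SKIP_PREFIXES):
                continue
            target = urljoin(page_url, href)
            # Assumption: only URLs below the base URL are considered.
            if target in visited or not target.startswith(base_url):
                continue
            visited.add(target)
            if target.endswith("/"):
                # Depth 0 means "base directory only", so descend only
                # while the current depth is below the limit.
                if depth < max_depth:
                    queue.append((target, depth + 1))
            elif urlparse(target).path.lower().endswith(wanted):
                found.append(target)
    return found
```

Passing the link lister in as a parameter keeps the sketch independent of any particular HTTP or HTML library, so the traversal logic can be tried against an in-memory dictionary of listings.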
- Python 3.7 or higher
- Git
- Clone the repository:
git clone https://github.com/4ngel2769/rwdl.git
cd rwdl
- Install dependencies:
pip install -r requirements.txt
python rwdl.py --url [BASE_URL] --extension [EXTENSIONS]
# --url: base URL to start downloading from
# --depth: recursion depth (0=base only)
# --extension: file extensions to download
# --output: output directory
# --delay: optional delay between requests
python rwdl.py \
--url https://example.com/files/ \
--depth 3 \
--extension .torrent,.iso \
--output ./downloads \
--delay 0.5
Argument | Short | Required | Default | Description |
---|---|---|---|---|
--url | -u | Yes | | Base URL to start downloading from |
--extension | -e | Yes | | Comma-separated file extensions to download |
--depth | -d | No | 1 | Recursion depth (0=base only) |
--output | -o | No | ./downloads | Output base directory |
--delay | | No | 0.5 | Delay between requests in seconds |
--help | -h | No | | Show help message |
--version | -v | No | | Show version and |
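For readers who want to see how a command line like the one in the table could be declared, here is a minimal `argparse` sketch. It is an illustration under assumptions, not the parser used by rwdl.py: `build_parser` and `parse_extensions` are hypothetical names, the `int` and `float` types for `--depth` and `--delay` are guesses based on the defaults, and `--version` is left out because the version string is not given here. `--help` is provided by `argparse` automatically.

```python
import argparse


def build_parser():
    """Build a parser mirroring the argument table (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="rwdl.py")
    parser.add_argument("-u", "--url", required=True,
                        help="Base URL to start downloading from")
    parser.add_argument("-e", "--extension", required=True,
                        help="Comma-separated file extensions to download")
    parser.add_argument("-d", "--depth", type=int, default=1,
                        help="Recursion depth (0=base only)")
    parser.add_argument("-o", "--output", default="./downloads",
                        help="Output base directory")
    parser.add_argument("--delay", type=float, default=0.5,
                        help="Delay between requests in seconds")
    return parser


def parse_extensions(raw):
    """Split '.iso,.torrent' into ['.iso', '.torrent'], ignoring blanks."""
    return [ext.strip().lower() for ext in raw.split(",") if ext.strip()]
```

With this sketch, `build_parser().parse_args(["-u", "https://example.com/files/", "-e", ".iso"])` fills in the defaults from the table for the options that were not supplied.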
- Download ParrotOS torrent files (depth 1):
python rwdl.py \
--url https://deb.parrot.sh/parrot/iso/ \
--extension .torrent \
--output ./parrot_torrents \
--depth 1
- Download Windows installers (depth 2, slower):
python rwdl.py \
--url https://software.example.com/windows/ \
--depth 2 \
--extension .exe,.msi \
--delay 1.0
- Download Debian Linux ISOs (base directory only):
python rwdl.py \
--url https://cdimage.debian.org/cdimage/weekly-builds/amd64/ \
--depth 0
--extension .iso,.img
The script creates a directory structure mirroring the remote server:
If the base URL is https://example.com/files/ and the remote directories are structured like this:
https://example.com/files/
├── folder1/
│   ├── file1.ext
│   └── file2.ext
├── folder2/
│   └── nested/
│       └── file3.ext
└── base_files.ext
The output will be structured as follows:
output_dir/
├── folder1/
│   ├── file1.ext
│   └── file2.ext
├── folder2/
│   └── nested/
│       └── file3.ext
└── base_files.ext
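One way to derive the mirrored local path from a remote file URL is sketched below. This is an assumption about how such a mapping could work, not the project's actual code; `local_path_for` is a hypothetical helper, and rejecting `..` segments is an extra safety check that the original description does not mention.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def local_path_for(base_url, file_url, output_dir):
    """Map a remote file URL to a path under output_dir (hypothetical helper)."""
    base_path = urlparse(base_url).path
    file_path = urlparse(file_url).path
    if not file_path.startswith(base_path):
        raise ValueError(f"{file_url} is not under {base_url}")
    # Keep only the part of the path below the base URL, decoded per segment.
    parts = [unquote(p) for p in file_path[len(base_path):].split("/") if p]
    if ".." in parts:
        raise ValueError(f"refusing to leave the output directory: {file_url}")
    return Path(output_dir).joinpath(*parts)
```

For the layout above, `https://example.com/files/folder2/nested/file3.ext` would map to `output_dir/folder2/nested/file3.ext`. Checking `exists()` on the returned path is one simple way to skip files that are already on disk.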
- Requires Apache-style directory listings
- Doesn't handle JavaScript-rendered content
- Won't follow links to external domains
- Limited to HTTP/HTTPS protocols
- May not work with custom directory listing formats
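The protocol and domain limits listed above could be checked with a small predicate like the following. This is only a sketch of the idea; `is_followable` is a hypothetical name, and comparing the full host (including any port) is an assumption about what "external domain" means.

```python
from urllib.parse import urljoin, urlparse


def is_followable(base_url, page_url, href):
    """Return True if href resolves to an HTTP/HTTPS URL on the base URL's host."""
    target = urlparse(urljoin(page_url, href))
    base = urlparse(base_url)
    return target.scheme in ("http", "https") and target.netloc == base.netloc
```

Relative links such as `sub/` resolve against the page they appear on, so they pass the check, while `mailto:` links, `ftp://` URLs, and links to other hosts do not.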
Problem: Script fails to parse directory
- Solution: Verify the URL shows a standard Apache directory listing
Problem: Downloads are incomplete
- Solution: Increase the delay between requests with --delay 1.0
Problem: SSL certificate errors
- Solution: Add these lines at the top of the script (they disable certificate verification):
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
This project is licensed under the MIT License - see the LICENSE file for details.
Tip
Read the Code of Conduct
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (git checkout -b feature/improvement)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin feature/improvement)
- Open a pull request
Disclaimer: Use this script responsibly and respect server resources. Always comply with website terms of service and robots.txt directives.