Scraped table data from AniTrakt by Huere, providing anime mappings between MyAnimeList and Trakt.
Warning
THIS REPO IS NOT OFFICIALLY SUPPORTED BY HUERE, MAL, or TRAKT.
If you use any content from this repo in your project and find bugs or want to submit a suggestion, please open an issue.
Note
Extended Database Available
For a more comprehensive dataset with richer metadata, please use the Extended Database repo instead. The extended database includes release years, external IDs (TMDB, TVDB, IMDb), and resolves issues such as the `guessed_slug` limitations described below. Use this repository only if you need the basic mapping between MyAnimeList and Trakt IDs.
- Intelligent Filtering: Configurable ignore rules with support for AND/OR logic
- Data Overwriting: Manual overrides for specific entries via overwrite files
- Error Handling: Robust error handling with custom exception hierarchy
- Modular Architecture: Clean, maintainable code structure with separated concerns
Key Name | Type | Description |
---|---|---|
`title` | `string` | The title of the anime |
`mal_id` | `int` | MyAnimeList ID of the anime |
`trakt_id` | `int` | Trakt ID of the show/movie |
`guessed_slug` | `string \| null` | Guessed slug of the anime, see comments for additional context |
`type` | `Enum["shows", "movies"]` | Type of the anime |
`season` | `int` | Season number of the anime, only for `type == "shows"` |
Note
The final result does not contain comments; they are only shown here for additional context in this README.
To construct a link for shows, you can use the following format:
https://trakt.tv/{type}/{guessed_slug}/seasons/{season}
[
// Example of a movie "Kimi no Na wa."
{
"title": "Kimi no Na wa.",
"mal_id": 32281,
"trakt_id": 1402,
// Guessed slug won't work for movies, see additional comment
"guessed_slug": "your-name",
"type": "movies"
}
]
To construct a link for movies, you can use the following format:
https://trakt.tv/{type}/{guessed_slug}-{year, see additional comment}
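As an illustration, here is a minimal Python sketch of that link construction (the `year` for movies is not part of this dataset and must come from another source, and the helper name is hypothetical):

```python
def build_trakt_url(entry: dict, year: int | None = None) -> str | None:
    """Build a best-guess Trakt URL from a mapping entry.

    Returns None when no usable link can be built; fall back to trakt_id then.
    """
    slug = entry.get("guessed_slug")
    if slug is None:
        return None
    if entry["type"] == "shows":
        return f"https://trakt.tv/shows/{slug}/seasons/{entry['season']}"
    # Movies need the release year appended, which this database does not provide.
    if year is None:
        return None
    return f"https://trakt.tv/movies/{slug}-{year}"


# build_trakt_url({"guessed_slug": "your-name", "type": "movies"}, year=2016)
# -> "https://trakt.tv/movies/your-name-2016"
```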
The parser supports intelligent filtering through ignore rule files. These rules allow you to exclude specific items from the final dataset based on various criteria.
[
{
"source": "remote|all",
"type": "OR|AND|ANY|ALL",
"conditions": [
{
"field_name": "value_to_match"
}
],
"description": "Human-readable description of the rule"
}
]
- `remote`: Applied immediately after parsing the AniTrakt database
- `all`: Applied after overwrite processing (but overwrite items are protected)
- `OR` / `ANY`: Match if any condition is true
- `AND` / `ALL`: Match if all conditions are true
You can create conditions based on any field in the data structure:
- `title` - Exact title match
- `mal_id` - MyAnimeList ID (supports `null` for missing IDs)
- `trakt_id` - Trakt ID (supports `null` for missing IDs)
- `guessed_slug` - Generated slug
- `season` - Season number (shows only)
- `type` - Media type ("movies" or "shows")
If multiple fields exist inside one condition statement, they behave as `AND`.
[
{
"source": "all",
"type": "ANY",
"conditions": [
{ "mal_id": 0 },
{ "mal_id": null },
{ "trakt_id": 0 },
{ "trakt_id": null }
],
"description": "Ignore items with invalid IDs"
},
{
"source": "remote",
"type": "ANY",
"conditions": [
{ "mal_id": 50532 },
{ "mal_id": 986 },
{ "mal_id": 12231 },
{ "mal_id": 32051 },
{ "mal_id": 2020 },
{ "mal_id": 31704 },
{ "mal_id": 28285 }
],
"description": "Special/OVA titles found in TV show entries"
},
{
"source": "all",
"type": "AND",
"conditions": [
{ "type": "movies" },
{ "guessed_slug": null }
],
"description": "Remove movies without valid slugs"
}
]
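For intuition, the matching behavior described above can be sketched in a few lines of Python (a hypothetical helper, not the actual `FilterEngine` code):

```python
def matches_rule(item: dict, rule: dict) -> bool:
    """Return True if `item` is matched (and should be ignored) by `rule`."""
    def condition_hits(condition: dict) -> bool:
        # All fields inside a single condition must match (implicit AND).
        return all(item.get(field) == value for field, value in condition.items())

    hits = [condition_hits(condition) for condition in rule["conditions"]]
    if rule["type"].upper() in ("OR", "ANY"):
        return any(hits)
    return all(hits)  # "AND" / "ALL"


# Example: the last rule above drops movies without a valid slug.
# matches_rule(
#     {"type": "movies", "guessed_slug": None},
#     {"type": "AND", "conditions": [{"type": "movies"}, {"guessed_slug": None}]},
# )  # -> True
```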
These files contain manual additions or corrections to the scraped data. Items in overwrite files are protected from "all" source ignore rules but still subject to "remote" source filtering.
- Add missing entries not found in AniTrakt database
- Correct incorrect mappings or metadata
- Override titles with better translations
- Add custom entries for special cases
[
{
"title": "Nijiyon Animation 2",
"mal_id": 57623,
"trakt_id": 198874,
"guessed_slug": "nijiyon-animation",
"season": 2,
"type": "shows"
},
{
"title": "Ameku Takao no Suiri Karte",
"mal_id": 58600,
"trakt_id": 233930,
"guessed_slug": "ameku-m-d-doctor-detective",
"season": 1,
"type": "shows"
}
]
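Conceptually, the merge step works like a keyed replace-or-append. Here is a rough sketch, assuming `mal_id` uniquely identifies an entry (not the actual `DataManager` implementation):

```python
def apply_overwrites(scraped: list[dict], overwrites: list[dict]) -> tuple[list[dict], set[int]]:
    """Merge overwrite entries into the scraped data, keyed on mal_id.

    Also returns the set of overwritten mal_ids so that later "all"-source
    ignore rules can skip (protect) those items.
    """
    by_mal_id = {item["mal_id"]: item for item in scraped}
    protected: set[int] = set()
    for entry in overwrites:
        by_mal_id[entry["mal_id"]] = entry  # replace an existing item or add a missing one
        protected.add(entry["mal_id"])
    return list(by_mal_id.values()), protected
```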
The parser follows this sequence to ensure data integrity:
- Fetch & Parse: Scrape data from AniTrakt website
- Remote Filtering: Apply ignore rules with `"source": "remote"`
- Overwrite Processing: Merge/replace items from overwrite files
- Final Filtering: Apply ignore rules with `"source": "all"` (overwrite items are protected)
- Sorting & Output: Sort alphabetically by title (case-insensitive) and save to JSON files
graph TD
A[AniTrakt Website] --> B[HTML Parser]
B --> C[Remote Filtering]
C --> D[Overwrite Processing]
D --> E[Protected Items]
D --> F[Regular Items]
F --> G[Final Filtering]
E --> H[Merge Protected + Filtered]
G --> H
H --> I[Sort Alphabetically]
I --> J[Save to JSON]
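The final sorting and output step boils down to a case-insensitive sort on `title` followed by a UTF-8 JSON dump, roughly like this sketch (file paths per the output table below):

```python
import json

def save_sorted(items: list[dict], path: str) -> None:
    """Sort entries alphabetically by title (case-insensitive) and write UTF-8 JSON."""
    items.sort(key=lambda item: item["title"].casefold())
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(items, handle, ensure_ascii=False, indent=2)


# save_sorted(tv_items, "db/tv.json")
# save_sorted(movie_items, "db/movies.json")
```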
pip install requests beautifulsoup4
python main.py
The parser will automatically:
- Fetch the latest data from AniTrakt
- Apply all configured filters and overwrites
- Generate sorted JSON output files
- Create a timestamp file for tracking updates
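If you consume the generated files in your own project, a simple lookup by MyAnimeList ID could look like this (a sketch reading the local `db/tv.json` listed below):

```python
import json

# Load the generated TV-show mappings
with open("db/tv.json", encoding="utf-8") as handle:
    tv_mappings = json.load(handle)

# Index by MyAnimeList ID for quick lookups
by_mal_id = {entry["mal_id"]: entry for entry in tv_mappings}

entry = by_mal_id.get(57623)
if entry is not None:
    print(entry["trakt_id"], entry["guessed_slug"], entry["season"])
```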
File | Description |
---|---|
`db/movies.json` | Movie mappings (sorted alphabetically) |
`db/tv.json` | TV show mappings (sorted alphabetically) |
`updated.txt` | Last successful update timestamp (UTC) |
`movies.html` | Cached HTML from AniTrakt movies page |
`shows.html` | Cached HTML from AniTrakt shows page |
File | Purpose |
---|---|
`db/ignore_movies.json` | Ignore rules for movies |
`db/ignore_tv.json` | Ignore rules for TV shows |
`db/overwrite_movies.json` | Manual overrides for movies |
`db/overwrite_tv.json` | Manual overrides for TV shows |
The refactored codebase follows a modular architecture:
- `FileManager`: Handles all file I/O operations with UTF-8 support
- `FilterEngine`: Processes ignore rules with AND/OR logic
- `DataManager`: Manages data merging and overwriting
- `HTMLParser`: Scrapes and parses AniTrakt website data
- `AniTraktParser`: Main orchestrator coordinating all components
- `TextUtils`: Text processing utilities for slugification
- Custom Exceptions: Proper error handling hierarchy
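For intuition, slug guessing in `TextUtils` presumably follows the usual lowercase-and-hyphenate approach; the sketch below is an assumption rather than the actual implementation, but it mirrors the behavior described in the limitations section:

```python
import re

def guess_slug(title: str) -> str | None:
    """Guess a Trakt-style slug from a (presumed English) title.

    Returns None when the title contains no letters, matching the null
    guessed_slug used for purely numeric or symbolic titles.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    if not re.search(r"[a-z]", slug):
        return None
    return slug


# guess_slug("Your Name.") -> "your-name"
# guess_slug("86")         -> None
```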
For the most reliable and complete data, including accurate slugs, release years, and other metadata, it is highly recommended to use the Extended repo. The extended database programmatically fetches the correct information directly from the Trakt.tv API, resolving the limitations described below.
This repository is best suited for users who only require the basic mapping between MyAnimeList and Trakt IDs.
If you choose to use this repository, please be aware of the following limitations regarding the `guessed_slug` field:
- Based on English Titles: Slugs are generated from the presumed English title of the anime. This can lead to inaccuracies if the title on Trakt.tv differs.
- Movies Require the Year: The `guessed_slug` for movies is incomplete. Trakt.tv requires the release year to be appended to the slug (e.g., `your-name-2016`). This information is not included in this database.
- Potential for Mismatches: While generally effective for TV shows, a `guessed_slug` might not resolve correctly for shows that share similar names on Trakt.
- Non-alphabetical Titles: Titles that are purely numerical or symbolic have a `null` value for `guessed_slug` to prevent conflicts with Trakt's numeric ID system.
In cases where the `guessed_slug` is incorrect, you can always fall back to using the `trakt_id` to fetch the correct information directly from the Trakt.tv API.
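For example, with the `requests` dependency already installed you can resolve the canonical slug from the `trakt_id` (endpoint and headers per Trakt's public API v2; you need your own client ID, and the helper name here is just for illustration):

```python
import requests

def fetch_trakt_show(trakt_id: int, client_id: str) -> dict:
    """Fetch a show's canonical metadata (including its real slug) by Trakt ID."""
    response = requests.get(
        f"https://api.trakt.tv/shows/{trakt_id}",
        headers={
            "Content-Type": "application/json",
            "trakt-api-version": "2",
            "trakt-api-key": client_id,
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


# fetch_trakt_show(233930, "<your-client-id>")["ids"]["slug"]
```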
We welcome contributions! Here's how to get started:
- Fork the repository
- Create your feature branch: `git checkout -b feature/amazing-feature`
- Configure ignore rules or overwrite files as needed
- Test your changes: `python main.py`
- Commit your changes: `git commit -m 'Add amazing feature'`
- Push to the branch: `git push origin feature/amazing-feature`
- Open a Pull Request
- Ensure all new features include appropriate logging
- Test ignore rules and overwrite files thoroughly
- Update documentation for any new configuration options
- Follow the existing code style and architecture patterns