Skip to content

Support using MusicBrainz IDs instead of other metadata #1363

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 10 commits into from

Conversation

vs49688
Copy link
Contributor

@vs49688 vs49688 commented Sep 23, 2021

Converts the library to use MusicBrainz IDs for albums, artists,
albumartists, and tracks. All playlists, stars, etc. are kept.

This is an irreversible transformation; the user is given a warning
and may cancel within 10 seconds by pressing ^C.

The mediafiles table is used as the sole source-of-truth to build
an entirely new hierarchy, based off the MusicBrainz IDs.

Existing entity metadata is applied to the new entities, except in the
case where one entity may need to be split (e.g. different releases of
the same album). These split entities are left blank.

References are then updated for both play queues and playlists.

The existing entities are then deleted, and a full-rescan is triggered
to fill in any missing information.

Closes #489.

@certuna
Copy link
Contributor

certuna commented Sep 24, 2021

I think it's a good idea in principle, just some remarks:

  • this will split albums where some tracks have different or no mbz_album_id, which may catch more than a few users by surprise, in particular those who group "singletons" together into one pseudo-compilation album (e.g. "Loose Tracks"). The alternative is using Grouping or Release Type for grouping these together (and excluding them from the album list), but unfortunately those features aren't implemented yet so you'd leave those users with no effective way to manage singletons
  • a good thing is this will allow splitting artists with the same name

@vs49688
Copy link
Contributor Author

vs49688 commented Sep 24, 2021

I think it's a good idea in principle, just some remarks:

* this will split albums where some tracks have different or no `mbz_album_id`, which may catch more than a few users by surprise, in particular those who group "singletons" together into one pseudo-compilation album (e.g. "Loose Tracks"). The alternative is using Grouping or Release Type for grouping these together (and excluding them from the album list), but unfortunately those features aren't implemented yet so you'd leave those users with no effective way to manage singletons

In this case, I'm of the mind to tell the user to strip the MusicBrainz tags from those files, retaining the current behaviour. By moving them into a "Loose Tracks" album, you're effectively creating a "new" album. If this "new" album isn't on MusicBrainz, it shouldn't have MusicBrainz tags. Also, isn't this what playlists are for?
Of course, this might be a unpopular opinion, but I think it's the more "correct" one.

Another option would be to add a "navidrome_mbignore" tag, which if set would tell Navidrome to ignore any MusicBrainz tags it finds when scanning.

* a good thing is this will allow splitting artists with the same name

Indeed, I never considered this may need to be done with artists until after I'd pushed the commit.

@vs49688
Copy link
Contributor Author

vs49688 commented Sep 25, 2021

Pushed new version that should handle track/album/artist/albumartist IDs.

@vs49688 vs49688 changed the title [DRAFT] scanner: use musicbrainz release id when generating album id scanner/mapping: take MusicBrainz tags into account when generating IDs Sep 25, 2021
@certuna
Copy link
Contributor

certuna commented Sep 25, 2021

One issue will be: what does Navidrome do with artists that have the same mbid but different tags in the Artist field, such as "DJ Tiesto" and "Tiesto"?

It's not so easy, you might get some strange behaviour, consider this case:

  • DJ Tiesto - Album 1 (with mbid's)
  • Tiesto - Album 2 (with mbid's)
  • DJ Tiesto - Album 3 (without mbid's)
  • Tiesto - Album 4 (without mbid's)

Current behaviour: album 1 and 3 grouped under DJ Tiesto, album 2 and 4 grouped under Tiesto
What will the new behaviour be?

@vs49688
Copy link
Contributor Author

vs49688 commented Sep 25, 2021

Regardless of my changes, albums 3 & 4 should be under different artists (or the artists fixed).
As for 1 & 2, perhaps I should change it to only use the mbid if it exists?

@vs49688
Copy link
Contributor Author

vs49688 commented Sep 25, 2021

Actually, thinking about it some more... What do we want the behaviour to be?
Do we want Navidrome to merge artists/albums/whatever based on mbid? That'd be a pretty drastic change in behaviour, but not an unreasonable one.

All I intended was to make it not merge different releases of the same album. If the tags in the files are wrong (thus causing Navidrome to split the artists/albums), then they should be fixed.

I'm fine with either option (it doesn't affect me as I run everything through Picard and beets), @deluan what do you think?

@certuna
Copy link
Contributor

certuna commented Sep 26, 2021

The more I think of it, it's indeed a drastic change.

Navidrome currently does all organization based on official tag fields: Artist, Album Artist, Album Title. It's not perfect (no disambiguation possible between artists or albums with the same name), but at least it's pretty transparent why stuff happens.

Moving to organization based on custom Musicbrainz ID tags is good if you consistently use those, but that puts the onus on users to have not only an immaculately tagged collection with 'regular' tags, but also consistently have the mbz tags correct, because you'll create splits between stuff with and without mbz id's. And this decreases transparency to the user why this happens, and we'll end up like Plex where the scanner feels like a black box where "anything can happen".

For example:

  • currently a user can move an album to another artist just by changing the Artist tag, that's very logical and simple. But the album still retains the old artist_mbid tag, what will Navidrome do?
  • currently a user can move a track to another album just by changing the Album Title tag. Logical, simple, and that's what pretty much every music player (foobar, Apple Music, MusicBee, etc) allows you to do. Grouping by release_mbid breaks that.

@vs49688
Copy link
Contributor Author

vs49688 commented Sep 26, 2021

The more I think of it, it's indeed a drastic change.

Navidrome currently does all organization based on official tag fields: Artist, Album Artist, Album Title. It's not perfect (no disambiguation possible between artists or albums with the same name), but at least it's pretty transparent why stuff happens.

Moving to organization based on custom Musicbrainz ID tags is good if you consistently use those, but that puts the onus on users to have not only an immaculately tagged collection with 'regular' tags, but also consistently have the mbz tags correct, because you'll create splits between stuff with and without mbz id's. And this decreases transparency to the user why this happens, and we'll end up like Plex where the scanner feels like a black box where "anything can happen".

As long as the scanning behaviour is explicitly-and-well-documented, I don't really see the problem.

For example:

* currently a user can move an album to another artist just by changing the `Artist` tag, that's very logical and simple. But the album still retains the old `artist_mbid` tag, what will Navidrome do?

* currently a user can move a track to another album just by changing the `Album Title` tag. Logical, simple, and that's what pretty much every music player (foobar, Apple Music, MusicBee, etc) allows you to do. Grouping by `release_mbid` breaks that.

How about this?

Add a ScanUseMbzID option in the configuration file, unset by default.
When unset, the behaviour is unchanged, keeping it as it is now.

When set, if an album/artist has a mbid, Navidrome will use it (and only it). This provides a solution to #489, assuming everything's tagged correctly. If the mbid is NOT set, it'll fall back and create a new/duplicate artist/album. It is up to the user to tag the files correctly and trigger a rescan.

Forgive me for being pushy on this, but I need a fix for the #489 - it's affecting large part of my library.

@deluan
Copy link
Member

deluan commented Sep 26, 2021

Hey, sorry being late for the party :)

After reading the thread, these are my two cents (mostly agreeing with latest @vs49688 comment):

  1. The new behaviour (using mbids) has to be configurable, and turned off by default
  2. When the feature is enabled, if no mbid is found, fallback to the current behaviour.
  3. I think it is useful only for albums, albumartists and artists, but not for tracks.

On top of that, a migration plan is a MUST, or else users that enable this will break their playlists, stars, playcounts, etc... The migration should go over all tables with an album_id or artist_id and recalculates a new id. Foreign keys must be observed. To make things even trickier, this will have to happen only when the config option is first enabled, and cannot be disabled afterwards (or it would mess with the DB), making the implementation tricky (similar to what I had to do for PasswordEncryptionKey)

@vs49688
Copy link
Contributor Author

vs49688 commented Sep 27, 2021

How'd you prefer the migrations to be done?

  1. Duplicate entries, except with the mbid as the id, fix up references, delete old, vacuum
  2. Disable foreign keys, update in-place

@deluan
Copy link
Member

deluan commented Sep 27, 2021

How'd you prefer the migrations to be done?

  1. Duplicate entries, except with the mbid as the id, fix up references, delete old, vacuum
  2. Disable foreign keys, update in-place

Whichever is faster. I'd assume it is the later, but it is worth to test it out if you can.

@vs49688 vs49688 force-pushed the adedup branch 9 times, most recently from 5298640 to 867aaf4 Compare September 29, 2021 02:39
@vs49688
Copy link
Contributor Author

vs49688 commented Sep 29, 2021

It's mostly finished, all that needs to be done now is some cleanups, and to trigger a full rescan.

Also, where's the best place for this? I've just put it in the top-level cmd package to start with, but it should probably go elsewhere. I'd like to put it in db/migrations (specifically so I can use forceFullRescan), but can't because of circular imports.

@vs49688 vs49688 marked this pull request as ready for review October 2, 2021 15:08
@deluan deluan mentioned this pull request Oct 6, 2021
@vs49688
Copy link
Contributor Author

vs49688 commented Oct 6, 2021

Ping on this?

Specific issues I'd like feedback on:

  • It's in the root cmd package for now, where should it actually go?
  • Do the DeleteMany functions I've added to the {artist,album}Repository interfaces have security implications I'm not aware of - i.e. does it automatically create a REST route for it (or something, I haven't really looked at the code for this).

The current behaviour:

  • If "Scanner.UseMbzIDs" is enabled:
    • Converts library to use MusicBrainz IDs for albums, artists, and albumartists
    • All playlists, stars, etc. are kept.
  • This is an irreversible transformation; adequate warning is given.
  • If set accidentally, the user may cancel the migration within 10 seconds by pressing ^C.
  • Once enabled, it will throw an error if attempted to disable.
  • A full rescan is required to finish the process - this must be triggered manually.

@deluan
Copy link
Member

deluan commented Oct 7, 2021

Sorry for the delay. Will take a look tonight

Not MD5'ing them because they're UUIDs and can be
eyeball'd easily in the database.
Converts the library to use MusicBrainz IDs for albums, artists,
albumartists, and tracks. All playlists, stars, etc. are kept.

This is an irreversible transformation; the user is given a warning
and may cancel within 10 seconds by pressing ^C.

The mediafiles table is used as the sole source-of-truth to build
an entirely new hierarchy, based off the MusicBrainz IDs.

Existing entity metadata is applied to the new entities, except in the
case where one entity may need to be split (e.g. different releases of
the same album). These split entities are left blank.

References are then updated for both play queues and playlists.

The existing entities are then deleted, and a full-rescan is triggered
to fill in any missing information.
Copy link

Download the artifacts for this pull request:

@deluan
Copy link
Member

deluan commented Nov 5, 2024

Hey @vs49688, as you know, I'm adding support for this in #2709, but I'm struggling with what to do when the user has more than one copy of the same album (let's say a MP3 copy and a FLAC copy), and both have same MBIDs. How are you handling this?

Another challenge is that now that we will support multiple roles and artists for each album/track, we may get artists without MBIDs, as there are no tags for Composers, Lyricist, Remixer, Performer, etc... This could cause an artist to be duplicated (one with MBID, and another with just Name). Any ideas?

@vs49688
Copy link
Contributor Author

vs49688 commented Nov 6, 2024

Hey @vs49688, as you know, I'm adding support for this in #2709, but I'm struggling with what to do when the user has more than one copy of the same album (let's say a MP3 copy and a FLAC copy), and both have same MBIDs. How are you handling this?

I don't. This is very much a "curate/tag your library properly" problem, in my mind. I'd argue that a certain degree of discipline is required for libraries at scale. If I do have any duplicates, beets catches them on import.

Maybe if this happens, just spit a warning and keep the existing one?

Another challenge is that now that we will support multiple roles and artists for each album/track, we may get artists without MBIDs, as there are no tags for Composers, Lyricist, Remixer, Performer, etc... This could cause an artist to be duplicated (one with MBID, and another with just Name). Any ideas?

Hmm, yeah that's a tricky one...I'm not sure what to do either. Unless there's some kind of unique ID, you might have to fall back to basic name-matching.
Or... and this could be a complete non-starter because privacy/disable-external-services - you have the track MBID, why not hit MBs APIs and query the missing MBIDs?

@deluan
Copy link
Member

deluan commented Nov 6, 2024

Maybe if this happens, just spit a warning and keep the existing one?

Yeah, that's what I'm inclined to do.

Hmm, yeah that's a tricky one...I'm not sure what to do either. Unless there's some kind of unique ID, you might have to fall back to basic name-matching. Or... and this could be a complete non-starter because privacy/disable-external-services - you have the track MBID, why not hit MBs APIs and query the missing MBIDs?

The fall back would already be in place, as there are artists that don't have MBID, and alos for these other roles we need to create an ID for them. But this will duplicate some artists.

Hitting MB APIs would be a good solution (privacy issues aside), but it would slow down a lot the scanner/import process. And doing it asynchronously after the scan is very complex (as we need the IDs when importing the files.

Not sure what is the right solution here. I may finish the code with only names for now, and when it is merged, I may come back to this.

@Szeraax Szeraax added this to the Big Refactor milestone Nov 12, 2024
@deluan
Copy link
Member

deluan commented Feb 19, 2025

Hey @vs49688 did you try the BFR version (#2709) already? Hoping to get some feedback from you.

@arg274
Copy link
Contributor

arg274 commented Feb 20, 2025

@deluan A quick test, it seems to not merge artists in certain scenarios. For example, I have this this comp where track 12 belongs moon, who is credited in the native script (মুন) within the comp. I also have his solo album, where he is credited as "moon". BFR seems to create two separate artist entries for this. I did confirm that both my files have the same artist MBID. This bug does not trigger when the artist is not aliased/credited as something else.

@vs49688
Copy link
Contributor Author

vs49688 commented Feb 20, 2025

@deluan I haven't tested it yet, I'll attempt to this weekend

@deluan
Copy link
Member

deluan commented Feb 20, 2025

@deluan A quick test, it seems to not merge artists in certain scenarios. For example, I have this this comp where track 12 belongs moon, who is credited in the native script (মুন) within the comp. I also have his solo album, where he is credited as "moon". BFR seems to create two separate artist entries for this. I did confirm that both my files have the same artist MBID. This bug does not trigger when the artist is not aliased/credited as something else.

Unfortunately Picard does not add MBIDs for all contributors (composers, performers, remixers...), so if we use MBID to disambiguate artists, you could end up with duplicated artists.

I decided against adding artist disambiguation by MBID for now, but I've been thinking on some solutions that may involve using a sidecar file. I'll give it more thinking after BFR is released.

@vs49688 vs49688 marked this pull request as draft February 21, 2025 04:42
@vs49688
Copy link
Contributor Author

vs49688 commented Feb 21, 2025

I've rebased on top of 0.54.5 here: https://github.com/vs49688/navidrome/tree/v0.54.5-mbz (untested)

EDIT: Doesn't compile, stand by
EDIT2: Fixed

@vs49688
Copy link
Contributor Author

vs49688 commented Mar 16, 2025

@deluan I've finally had a chance to try this. I've run into some problems with the scanner. It's stopping way too early.

time="2025-03-16T22:33:53+10:00" level=info msg="Scanner: Starting scan" fullScan=true numLibraries=1
time="2025-03-16T22:33:53+10:00" level=debug msg="Scanner: Starting phase 1: Scan all libraries and import new/updated files"
time="2025-03-16T22:33:53+10:00" level=info msg="Scanner: Completed processing folder" audioCount=0 elapsed=55.1ms imageCount=1 library="Music Library" plsCount=0 tracksImported=0 tracksMissing=0 ​folder="Adgio Hutchings/Path of Exile [2016]"
time="2025-03-16T22:33:53+10:00" level=info msg="Scanner: Completed processing folder" audioCount=0 elapsed=57.1ms imageCount=1 library="Music Library" plsCount=0 tracksImported=0 tracksMissing=0 ​folder="Casting Crowns/Until The Whole World Hears Live"
time="2025-03-16T22:33:53+10:00" level=info msg="Scanner: Completed processing folder" audioCount=0 elapsed=59.1ms imageCount=1 library="Music Library" plsCount=0 tracksImported=0 tracksMissing=0 ​folder="Chris Nelsen/Chaos Overlords OST [album]"
time="2025-03-16T22:33:53+10:00" level=info msg="Scanner: Completed processing folder" audioCount=23 elapsed=112.9ms imageCount=1 library="Music Library" plsCount=0 tracksImported=23 tracksMissing=0 ​folder=Compilations/Quadrophenia
time="2025-03-16T22:33:53+10:00" level=info msg="Scanner: Completed processing folder" audioCount=15 elapsed=140.5ms imageCount=1 library="Music Library" plsCount=0 tracksImported=15 tracksMissing=0 ​folder="Compilations/The Twilight Saga_ Breaking Dawn, Part 1"
time="2025-03-16T22:33:53+10:00" level=info msg="Scanner: Completed processing folder" audioCount=13 elapsed=82.1ms imageCount=1 library="Music Library" plsCount=0 tracksImported=13 tracksMissing=0 ​folder="Green Day/American Idiot [482]"
time="2025-03-16T22:33:53+10:00" level=info msg="Scanner: Completed processing folder" audioCount=0 elapsed=65.7ms imageCount=0 library="Music Library" plsCount=0 tracksImported=0 tracksMissing=0 ​folder="Tadayoshi Makino, Inon Zur, 近藤嶺 & 石川チャミィ"
time="2025-03-16T22:33:53+10:00" level=debug msg="Scanner: Finished reading folders" lib="Music Library" numFolders=4114 path=/storage/SyncRoot/Music/music
time="2025-03-16T22:33:53+10:00" level=debug msg="Scanner: Finished loading all folders" numChanged=8 numFolders=4114

@deluan
Copy link
Member

deluan commented Mar 16, 2025

Are you upgrading from your 1363 DB? Maybe there are changes that were not considered in #2709 migration/upgrade process.

Did you try a full scan?

@vs49688
Copy link
Contributor Author

vs49688 commented Mar 16, 2025

Are you upgrading from your 1363 DB? Maybe there are changes that were not considered in #2709 migration/upgrade process.

Yeah, I upgraded straight from the existing one, using:

  "PID.Track": "musicbrainz_trackid|track_legacy",
  "PID.Album": "musicbrainz_albumid|album_legacy",

Did you try a full scan?

I did now. It seemed to work. Everything's still there, including playlists.

Perhaps I'm worrying about nothing, but the primary keys in the media_file table seem inconsistent:

ffea705b-4d01-33eb-9950-daa73c5f447b
ffeadfb2597728055f47fa02b10c578a
ffebd52d-3f4f-3664-aeab-ade706bc376a
ffecb0f2-fcf5-4316-b45c-a1296f304afa
ffecf7a1-c50a-3f42-988e-1ca10cd8ec19
ffedaa76-ec5a-3d89-b7cc-bc5e8e3972be
ffee6146d1dec09ea5b145ae91d7d5f3
ffeeaf748ee726209b6e8ffba3c6ce24
fff12feb-2b9c-3dca-b6f3-50b6ac1cfd89
fff14e07-2072-3891-9ef3-50330b4a20ab
fff1c003-f007-45d9-9bc4-2354ff193240
fff1c6a4-4775-44b4-b24b-509d096d3347
fff7ce9e-272d-31fa-a802-b8d5c4404f63
fffae7dd-1411-34f6-bd82-81659289b1fc
fffb259f-26d4-37ce-81c5-14a04b5ee41d
ffff13c4-000b-31cc-8366-715f1f6a73bf
ffff5d45-9c1b-3bee-bcc5-400ebb84fb57
gBER1fBFjIBIdu0KB2i0dj
ikRLIFGrK9oQtxhSPLEqzl
jb12qWRGmx5yCpoItMwCbT
kivZUEpWZOzLAe7ntMviIh
lE17dXxiXtv7KKOTpMsXtd

Most of them are right - but I've discovered some albums (custom game rips, etc. without musicbrainz ids) are still in the DB, but not in the UI

Happy to jump on chat/Discord if that's easier

@vs49688
Copy link
Contributor Author

vs49688 commented Mar 16, 2025

Closing this... It's been a long ride, and it's time for the PR to rest.

To migrate to BFR:

  • Follow standard migration instructions
    • Do NOT use custom values for PID.Track and PID.Album
  • Trigger another full rescan

@vs49688 vs49688 closed this Mar 16, 2025
@github-project-automation github-project-automation bot moved this from Done to Released in Navidrome roadmap Mar 16, 2025
@deluan
Copy link
Member

deluan commented Mar 16, 2025

Most of them are right - but I've discovered some albums (custom game rips, etc. without musicbrainz ids) are still in the DB, but not in the UI

Just a note for posterity. The album_legacy and track_legacy options cannot be used in conjunction with other PID options, so:

  "PID.Track": "musicbrainz_trackid|track_legacy",
  "PID.Album": "musicbrainz_albumid|album_legacy",

This is invalid, and Navidrome is only using the MBIDs part. After removing this config, @vs49688 was able to find the missing albums.

Copy link

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 15, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
Status: Released
Development

Successfully merging this pull request may close these issues.

Support multiple releases of the same album
7 participants