False negative on some MP3s #750

@myndzi

Description

Hi there. First off, kudos on the great work here :) It's a really helpful library that does just what I want and no more.

I was using this library as part of an exploratory side project and noticed some MP3 files that didn't get identified.

I dug into it to find where the mismatch is, and compared the output of file, ffprobe, and music-metadata; all of them recognize the file (but in different ways, by different mechanisms).

  • ffprobe had a warning log that was a clue: [mp3 @ 0x139705ac0] Skipping 10 bytes of junk at 62401.
  • file seems to be somewhat imprecise in its output (as noted in another issue), but it can still identify the content type well enough
  • music-metadata does return seemingly correct data, which confused me at first because this library is a dependency of that one! But it turns out it just checks the file extension before attempting to guess the format from the content. Inspecting the returned object in a REPL also shows about a thousand "empty tag" warnings, which suggests the file in question is definitely a bit wack.

As for possible improvements: enabling the debug logs for music-metadata shows that when it actually parses the file, it does so robustly, syncing to the frame sync markers* and reading correct data afterwards. So it's possible that, if you're only trying to identify an MP3 (not read its contents), it's enough to try syncing the stream even when the first frame doesn't show up where it's expected after the ID3 tag (in this file, at least, the ID3 tag comes first).

* It's been a while; I don't remember how MP3 sync works...
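For reference on the footnote: an MPEG audio frame header starts with 11 set bits (a `0xFF` byte followed by a byte whose top three bits are set), and an ID3v2 tag stores its own length as a 4-byte "syncsafe" integer (7 bits per byte) at offset 6. Below is a minimal TypeScript sketch of the tolerant sync described above, assuming the ID3v2 tag comes first: it computes where the first frame should start, then scans a small window for the first sync pattern. `id3v2TagLength`, `hasFrameSync`, `findFirstSync` and the `maxJunk` default are hypothetical names and values for illustration, not part of file-type or music-metadata.

```typescript
// Hypothetical sketch: find the first MPEG audio frame sync after an optional
// leading ID3v2 tag, tolerating a few bytes of junk in between.

/** Length of a leading ID3v2 tag in bytes, or 0 if the buffer doesn't start with one. */
function id3v2TagLength(buf: Uint8Array): number {
  if (buf.length < 10 || buf[0] !== 0x49 || buf[1] !== 0x44 || buf[2] !== 0x33) {
    return 0; // no "ID3" magic
  }
  // Bytes 6..9 hold a syncsafe size: 4 bytes, 7 significant bits each.
  const size =
    ((buf[6] & 0x7f) << 21) | ((buf[7] & 0x7f) << 14) | ((buf[8] & 0x7f) << 7) | (buf[9] & 0x7f);
  const hasFooter = (buf[5] & 0x10) !== 0; // ID3v2.4 footer flag adds 10 more bytes
  return 10 + size + (hasFooter ? 10 : 0);
}

/** True if the 11 frame-sync bits (all ones) start at `pos`. */
function hasFrameSync(buf: Uint8Array, pos: number): boolean {
  return pos + 1 < buf.length && buf[pos] === 0xff && (buf[pos + 1] & 0xe0) === 0xe0;
}

/**
 * Offset of the first sync candidate between the end of the ID3v2 tag and
 * `maxJunk` bytes past it (inclusive), or -1 if there is none.
 */
function findFirstSync(buf: Uint8Array, maxJunk = 64): number {
  const expected = id3v2TagLength(buf);
  const end = Math.min(buf.length - 1, expected + maxJunk + 1);
  for (let pos = expected; pos < end; pos++) {
    if (hasFrameSync(buf, pos)) return pos;
  }
  return -1;
}
```

A bare 11-bit match is weak evidence on its own: any `0xFF` byte followed by a byte of `0xE0` or more inside the junk would also match, so a real check would validate the rest of the header and ideally confirm a second frame.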

libavformat seems to use a heuristic, since it doesn't go in expecting "this is an MP3 and I'm extracting data from it" (which is what music-metadata seems to assume after its extension check). libavformat scans up to 64 KiB forward from its starting point and tries to decode each position as the start of a frame; when that succeeds, it extracts the frame length and checks whether another valid frame begins at the expected position: https://github.com/FFmpeg/FFmpeg/blob/master/libavformat/mp3dec.c#L419-L431
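The libavformat heuristic can be sketched in TypeScript roughly like this. It is a hypothetical illustration, not a port of FFmpeg's code: it only handles MPEG audio Layer III, rejects free-format bitrates, and treats two headers as "compatible" when their version, sample rate index and channel mode agree, which is my approximation of the masked header comparison FFmpeg does. `parseLayer3Header` and `probeMp3` are made-up names; the 64 KiB default window follows the description above.

```typescript
// Hypothetical sketch of a libavformat-style MP3 probe (Layer III only, not FFmpeg's code).

interface Layer3Header {
  versionBits: number; // 0 = MPEG-2.5, 2 = MPEG-2, 3 = MPEG-1
  sampleRateIndex: number; // 0..2
  channelMode: number; // 0..3
  frameLength: number; // bytes, including the 4-byte header
}

// Layer III bitrates in kbps, indexed by the 4-bit bitrate field (0 = free format).
const BITRATES_V1_L3 = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320];
const BITRATES_V2_L3 = [0, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160];
// MPEG-1 sample rates; MPEG-2 halves them and MPEG-2.5 quarters them.
const SAMPLE_RATES_V1 = [44100, 48000, 32000];

/** Decode a Layer III frame header at `pos`, or return null if it is not a plausible one. */
function parseLayer3Header(buf: Uint8Array, pos: number): Layer3Header | null {
  if (pos < 0 || pos + 4 > buf.length) return null;
  const b1 = buf[pos + 1];
  const b2 = buf[pos + 2];
  if (buf[pos] !== 0xff || (b1 & 0xe0) !== 0xe0) return null; // 11 sync bits
  const versionBits = (b1 >> 3) & 0x03; // 1 is reserved
  const layerBits = (b1 >> 1) & 0x03; // 1 means Layer III
  if (versionBits === 1 || layerBits !== 1) return null;
  const bitrateIndex = b2 >> 4;
  const sampleRateIndex = (b2 >> 2) & 0x03;
  // Reject free format (0), the invalid bitrate index (15) and the reserved sample rate (3).
  if (bitrateIndex === 0 || bitrateIndex === 15 || sampleRateIndex === 3) return null;
  const isV1 = versionBits === 3;
  const divisor = isV1 ? 1 : versionBits === 2 ? 2 : 4;
  const sampleRate = SAMPLE_RATES_V1[sampleRateIndex] / divisor;
  const kbps = (isV1 ? BITRATES_V1_L3 : BITRATES_V2_L3)[bitrateIndex];
  const padding = (b2 >> 1) & 0x01;
  // (samples per frame / 8) * 1000: 1152 samples for MPEG-1, 576 for MPEG-2/2.5.
  const coefficient = isV1 ? 144000 : 72000;
  const frameLength = Math.floor((coefficient * kbps) / sampleRate) + padding;
  return { versionBits, sampleRateIndex, channelMode: buf[pos + 3] >> 6, frameLength };
}

/**
 * Offset of the first header within `window` bytes of `start` that is followed by a
 * compatible header exactly one frame later, or -1 if there is none.
 */
function probeMp3(buf: Uint8Array, start = 0, window = 64 * 1024): number {
  const end = Math.min(buf.length, start + window);
  for (let pos = start; pos < end; pos++) {
    const first = parseLayer3Header(buf, pos);
    if (first === null) continue;
    const second = parseLayer3Header(buf, pos + first.frameLength);
    if (
      second !== null &&
      second.versionBits === first.versionBits &&
      second.sampleRateIndex === first.sampleRateIndex &&
      second.channelMode === first.channelMode
    ) {
      return pos; // (pos - start) bytes of junk were skipped
    }
  }
  return -1;
}
```

In a tolerant detector, `start` would be the end of any leading ID3v2 tag, and a non-negative result means a pair of frames was found `result - start` bytes past where the first frame was expected, which is the situation ffprobe reports as skipped junk. Requiring the second frame is what filters out stray `0xFF` bytes that happen to look like a sync.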

file is incomprehensible to me, and I can't even find where it defines the checks for actual MP3 files 🤷

I'm not sure if these approaches are within the scope of what you want for this library, since they are a bit "meaty" compared to simple magic checks. I can potentially submit a PR, but wanted to check the temperature before investing the time. What do you think?

Existing Issue Check

  • I have searched the existing issues and could not find any related to my problem.

File-Type Scope Acknowledgment

  • I understand that file-type detects binary file types and not text or other formats.
