Skip to content

Ability to detect MPEG when prefixed with some noise #646

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 23, 2025

Conversation

Borewit
Copy link
Collaborator

@Borewit Borewit commented Jul 16, 2024

Allow 4 bytes of prefixing MPEG noise.

Via options.mpegOffsetTolerance this number can be increased, which will result it a bit slow scanning times.

Resolves #628

ToDo

@Borewit Borewit self-assigned this Jul 16, 2024
@Borewit Borewit marked this pull request as draft July 16, 2024 21:20
@Borewit Borewit added the improvement Improvement of existing functionality label Aug 14, 2024
@Borewit Borewit requested review from sindresorhus and removed request for sindresorhus August 14, 2024 09:38
@sindresorhus sindresorhus force-pushed the main branch 2 times, most recently from 5b9fdd9 to 0e91f7b Compare September 12, 2024 07:06
@Borewit Borewit force-pushed the tolerate-some-mpeg-offset branch from a27985e to fc8d271 Compare May 12, 2025 16:42
@Borewit Borewit linked an issue May 13, 2025 that may be closed by this pull request
2 tasks
@Borewit Borewit force-pushed the tolerate-some-mpeg-offset branch 2 times, most recently from 4399f21 to 1441b28 Compare May 15, 2025 16:01
@Borewit Borewit force-pushed the tolerate-some-mpeg-offset branch from 1441b28 to c3e924c Compare May 18, 2025 17:17
@myndzi
Copy link

myndzi commented May 18, 2025

Since you're working on this approach, I wanted to pull a note from the issue thread for more direct consideration:

ffmpeg looks for two sync frames by reading the size from the first and seeing if the second exists (best I can tell). This might be worthwhile here, because scanning for the sync word alone across many bytes could significantly increase false positive potential.

I noticed there seems to be an extreme preference for peeking the fewest bytes possible, so I'm not sure where y'all would land on the tradeoff between reading a full mpeg frame and being less certain. Whether reading from a file or a stream, your data source is likely to have more bytes available than just a couple (whether it be a full TCP packet from a network stream or a cluster read from disk or whatever), so it may be free to pull (some amount of) further bytes than just the tolerance offset.

It feels acceptable to me as a user, since I'm (presumably) configuring the behavior?

@Borewit
Copy link
Collaborator Author

Borewit commented May 18, 2025

Thank you for your suggestion @myndzi .

We certainly prefer to have a good detection, and read a few more bytes, no problem. As support detecting from streams, we have no random access to the underlying file, hence things are designed in way to handle with that.

One of the challenging with the stream is, that we cannot have one detector reading deep into the stream, and then pass the stream to the next detector, as each detector has to read from the beginning of the stream.

@myndzi
Copy link

myndzi commented May 18, 2025

Yeah, this whole thing nerd sniped me and I've been poking at an alternative implementation that would deal with that sort of thing. If it gets anywhere I may bring it back and see if y'all are interested.

The fortunate thing for this specific case (with respect to streams) is that it's a contiguous scan to do that thing, so it shouldn't need random access.

@Borewit Borewit force-pushed the tolerate-some-mpeg-offset branch from c3e924c to 69b82a8 Compare May 19, 2025 09:04
@Borewit
Copy link
Collaborator Author

Borewit commented May 19, 2025

There is a peek function provided by strtok3, if we can do sequential peeks (starting to peak at of offset of the previous peek, rather then from position 0), we would have greater flexibility to dig deeper into the "stream", without disturbing other detectors.

@Borewit Borewit force-pushed the tolerate-some-mpeg-offset branch from 69b82a8 to fd92863 Compare May 21, 2025 17:30
@Borewit
Copy link
Collaborator Author

Borewit commented May 21, 2025

@myndzi , I propose to keep the existing solution initially with the frame detection as it used to be as the step. We can follow up with an additional PR to improve things.

@Borewit Borewit requested a review from Copilot May 21, 2025 17:34
@Borewit Borewit marked this pull request as ready for review May 21, 2025 17:36
Copilot

This comment was marked as resolved.

@Borewit Borewit force-pushed the tolerate-some-mpeg-offset branch 3 times, most recently from e889b85 to 9870d20 Compare May 21, 2025 17:42
@Borewit Borewit force-pushed the tolerate-some-mpeg-offset branch from 9870d20 to 2c3e8ba Compare May 21, 2025 17:44
@Borewit Borewit requested a review from sindresorhus May 21, 2025 17:45
@Borewit
Copy link
Collaborator Author

Borewit commented May 21, 2025

After merging this one @sindresorhus , I propose to do a major release.

@myndzi
Copy link

myndzi commented May 21, 2025

@myndzi , I propose to keep the existing solution initially with the frame detection as it used to be as the step. We can follow up with an additional PR to improve things.

Too much indirection for me to parse clearly 😂

If I am reading this correctly, you're saying "merge the solution that scans for a sync header and calls it a day", and don't add the ffmpeg "read the header and check if there's another sync frame to validate it" to this PR.

I agree. My comment wasn't really arguing for that functionality to make it into this PR; I just wanted to surface the false-positive potential for consideration since you decided to go with this approach. If y'all are happy with the tradeoff as it stands, it's still a win for me :)

Thank you again for picking this back up, too!

@Borewit
Copy link
Collaborator Author

Borewit commented May 22, 2025

@myndzi I think to check for a second MPEG frame is good suggestion, but I will consider for a follow of. As like you said, this is in my view already a step forward.

@sindresorhus sindresorhus merged commit c40840a into main May 23, 2025
6 checks passed
@sindresorhus sindresorhus deleted the tolerate-some-mpeg-offset branch May 23, 2025 11:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement of existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

False negative on some MP3s Custom detectors running after default detectors
3 participants