Ability to detect MPEG when prefixed with some noise #646

Borewit · 2024-07-16T21:19:06Z

Allow 4 bytes of prefixing MPEG noise.

Via options.mpegOffsetTolerance this number can be increased, which will result it a bit slow scanning times.

Resolves #628

ToDo

Document options.mpegOffsetTolerance
Add MPEG noise fixture
Handle corner case options.mpegOffsetTolerance exceed 256 (peeked buffer size) - 2 (MPEG preamble) [bytes]
Merge Allow file-type options to be directly passed to exported functions #752, for better consistency in implementation and documentation of passing optional options

myndzi · 2025-05-18T18:00:33Z

Since you're working on this approach, I wanted to pull a note from the issue thread for more direct consideration:

ffmpeg looks for two sync frames by reading the size from the first and seeing if the second exists (best I can tell). This might be worthwhile here, because scanning for the sync word alone across many bytes could significantly increase false positive potential.

I noticed there seems to be an extreme preference for peeking the fewest bytes possible, so I'm not sure where y'all would land on the tradeoff between reading a full mpeg frame and being less certain. Whether reading from a file or a stream, your data source is likely to have more bytes available than just a couple (whether it be a full TCP packet from a network stream or a cluster read from disk or whatever), so it may be free to pull (some amount of) further bytes than just the tolerance offset.

It feels acceptable to me as a user, since I'm (presumably) configuring the behavior?

Borewit · 2025-05-18T18:44:03Z

Thank you for your suggestion @myndzi .

We certainly prefer to have a good detection, and read a few more bytes, no problem. As support detecting from streams, we have no random access to the underlying file, hence things are designed in way to handle with that.

One of the challenging with the stream is, that we cannot have one detector reading deep into the stream, and then pass the stream to the next detector, as each detector has to read from the beginning of the stream.

myndzi · 2025-05-18T20:47:12Z

Yeah, this whole thing nerd sniped me and I've been poking at an alternative implementation that would deal with that sort of thing. If it gets anywhere I may bring it back and see if y'all are interested.

The fortunate thing for this specific case (with respect to streams) is that it's a contiguous scan to do that thing, so it shouldn't need random access.

Borewit · 2025-05-19T17:23:58Z

There is a peek function provided by strtok3, if we can do sequential peeks (starting to peak at of offset of the previous peek, rather then from position 0), we would have greater flexibility to dig deeper into the "stream", without disturbing other detectors.

Borewit · 2025-05-21T17:34:01Z

@myndzi , I propose to keep the existing solution initially with the frame detection as it used to be as the step. We can follow up with an additional PR to improve things.

Borewit · 2025-05-21T17:46:18Z

After merging this one @sindresorhus , I propose to do a major release.

myndzi · 2025-05-21T21:04:41Z

@myndzi , I propose to keep the existing solution initially with the frame detection as it used to be as the step. We can follow up with an additional PR to improve things.

Too much indirection for me to parse clearly 😂

If I am reading this correctly, you're saying "merge the solution that scans for a sync header and calls it a day", and don't add the ffmpeg "read the header and check if there's another sync frame to validate it" to this PR.

I agree. My comment wasn't really arguing for that functionality to make it into this PR; I just wanted to surface the false-positive potential for consideration since you decided to go with this approach. If y'all are happy with the tradeoff as it stands, it's still a win for me :)

Thank you again for picking this back up, too!

Borewit · 2025-05-22T08:12:10Z

@myndzi I think to check for a second MPEG frame is good suggestion, but I will consider for a follow of. As like you said, this is in my view already a step forward.

Borewit self-assigned this Jul 16, 2024

Borewit marked this pull request as draft July 16, 2024 21:20

Borewit added the improvement Improvement of existing functionality label Aug 14, 2024

Borewit requested review from sindresorhus and removed request for sindresorhus August 14, 2024 09:38

sindresorhus force-pushed the main branch 2 times, most recently from 5b9fdd9 to 0e91f7b Compare September 12, 2024 07:06

Borewit mentioned this pull request Dec 2, 2024

Readme compatibility note about makeTokenizer Borewit/tokenizer-s3#1351

Open

Borewit force-pushed the tolerate-some-mpeg-offset branch from a27985e to fc8d271 Compare May 12, 2025 16:42

This was referenced May 12, 2025

Improve support for erroneous ID3v2 headers #751

Closed

False negative on some MP3s #750

Closed

Borewit linked an issue May 13, 2025 that may be closed by this pull request

False negative on some MP3s #750

Closed

2 tasks

Borewit force-pushed the tolerate-some-mpeg-offset branch 2 times, most recently from 4399f21 to 1441b28 Compare May 15, 2025 16:01

Borewit mentioned this pull request May 15, 2025

Allow file-type options to be directly passed to exported functions #752

Merged

Borewit force-pushed the tolerate-some-mpeg-offset branch from 1441b28 to c3e924c Compare May 18, 2025 17:17

Borewit force-pushed the tolerate-some-mpeg-offset branch from c3e924c to 69b82a8 Compare May 19, 2025 09:04

Ability to detect MPEG when prefixed with some noise

c858bf5

Borewit force-pushed the tolerate-some-mpeg-offset branch from 69b82a8 to fd92863 Compare May 21, 2025 17:30

Borewit requested a review from Copilot May 21, 2025 17:34

Borewit marked this pull request as ready for review May 21, 2025 17:36

This comment was marked as resolved.

Sign in to view

Borewit force-pushed the tolerate-some-mpeg-offset branch 3 times, most recently from e889b85 to 9870d20 Compare May 21, 2025 17:42

Add MP3 fixture with MPEG data out of sync (deviating offset)

2c3e8ba

Borewit force-pushed the tolerate-some-mpeg-offset branch from 9870d20 to 2c3e8ba Compare May 21, 2025 17:44

Borewit requested a review from sindresorhus May 21, 2025 17:45

sindresorhus added 4 commits May 23, 2025 14:37

Update core.js

27248c6

Update core.d.ts

29385d0

Update readme.md

9118999

Update core.d.ts

0ba5b86

sindresorhus merged commit c40840a into main May 23, 2025
6 checks passed

sindresorhus deleted the tolerate-some-mpeg-offset branch May 23, 2025 11:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Ability to detect MPEG when prefixed with some noise #646

Ability to detect MPEG when prefixed with some noise #646

Uh oh!

Borewit commented Jul 16, 2024 •

edited

Loading

Uh oh!

myndzi commented May 18, 2025

Uh oh!

Borewit commented May 18, 2025

Uh oh!

myndzi commented May 18, 2025

Uh oh!

Borewit commented May 19, 2025

Uh oh!

Borewit commented May 21, 2025

Uh oh!

This comment was marked as resolved.

Uh oh!

Borewit commented May 21, 2025

Uh oh!

myndzi commented May 21, 2025 •

edited

Loading

Uh oh!

Borewit commented May 22, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Ability to detect MPEG when prefixed with some noise #646

Ability to detect MPEG when prefixed with some noise #646

Uh oh!

Conversation

Borewit commented Jul 16, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

myndzi commented May 18, 2025

Uh oh!

Borewit commented May 18, 2025

Uh oh!

myndzi commented May 18, 2025

Uh oh!

Borewit commented May 19, 2025

Uh oh!

Borewit commented May 21, 2025

Uh oh!

This comment was marked as resolved.

Uh oh!

Borewit commented May 21, 2025

Uh oh!

myndzi commented May 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Borewit commented May 22, 2025

Uh oh!

Uh oh!

Uh oh!

Borewit commented Jul 16, 2024 •

edited

Loading

myndzi commented May 21, 2025 •

edited

Loading