Detecting a pptx taking a very long time #688

@scottdoc

Description

Hello,

I run this library in AWS, reading from S3, to detect the types of uploaded files. I use fileTypeFromTokenizer.

For one specific pptx file, detection seemed to hang: several minutes were not enough for it to finish.

It eventually detected the type correctly, after 7 minutes.

Looking at the code, I could see it was walking through the file and reading the zip headers one by one. This was a relatively large pptx (17 MB) with more than 20 photos bundled into it. Big, but not abnormally big.

Each header was downloaded in turn and looked like this:

{
  compressedSize: 631,
  uncompressedSize: 631,
  filenameLength: 16,
  extraFieldLength: 0,
  filename: '[trash]/0000.dat'
}
{
  compressedSize: 0,
  uncompressedSize: 0,
  filenameLength: 16,
  extraFieldLength: 0,
  filename: 'docProps/app.xml'
}
{
  compressedSize: 0,
  uncompressedSize: 0,
  filenameLength: 24,
  extraFieldLength: 0,
  filename: 'ppt/fonts/font23.fntdata'
}
... more fonts 
{
  compressedSize: 0,
  uncompressedSize: 0,
  filenameLength: 20,
  extraFieldLength: 0,
  filename: 'ppt/media/image1.png'
}
... more images

It finally got to:

{
  compressedSize: 0,
  uncompressedSize: 0,
  filenameLength: 33,
  extraFieldLength: 0,
  filename: 'ppt/notesMasters/notesMaster1.xml'
}

At that point it could see that the filename ended in ".xml", and it reported back the content type.
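For readers unfamiliar with the format, the fields logged above map onto the ZIP local file header. The following is a minimal sketch of reading one such header from a byte array, not the library's actual code; the function name `parseLocalFileHeader` and the returned `hasDataDescriptor` and `headerSize` fields are hypothetical. The offsets follow the ZIP local file header layout. One possibly relevant detail: when general-purpose flag bit 3 is set, the sizes in the local header are written as zero and the real values follow the compressed data, which may be why a reader cannot simply jump over entries with `compressedSize: 0`. Whether that is what happens here is an assumption.

```javascript
// Sketch only: parse one ZIP local file header starting at `offset`.
// `parseLocalFileHeader` is a hypothetical helper, not part of file-type.
function parseLocalFileHeader(bytes, offset = 0) {
  // The fixed part of a local file header is 30 bytes long.
  if (bytes.length - offset < 30) return undefined;
  const view = new DataView(bytes.buffer, bytes.byteOffset + offset);
  // Local file header signature "PK\x03\x04", stored little-endian.
  if (view.getUint32(0, true) !== 0x04034b50) return undefined;

  const flags = view.getUint16(6, true);
  const filenameLength = view.getUint16(26, true);
  const extraFieldLength = view.getUint16(28, true);
  const nameStart = offset + 30;

  return {
    // Bit 3: sizes are zero here and stored in a data descriptor after the data.
    hasDataDescriptor: (flags & 0x0008) !== 0,
    compressedSize: view.getUint32(18, true),
    uncompressedSize: view.getUint32(22, true),
    filenameLength,
    extraFieldLength,
    // Decoded as UTF-8, which is sufficient for ASCII names like the ones logged.
    filename: new TextDecoder().decode(
      bytes.subarray(nameStart, nameStart + filenameLength)
    ),
    // Bytes to skip to reach the start of the entry's data.
    headerSize: 30 + filenameLength + extraFieldLength,
  };
}
```

With sizes known up front, the next header sits at `offset + headerSize + compressedSize`; with a data descriptor, that position is not known from the header alone.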

I know very little about detecting MIME types, so forgive me if this is completely wrong, but would a filename matching "ppt/*" be enough to declare the file type and skip downloading everything else, similar to the shortcut provided for xlsx?
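A minimal sketch of the kind of shortcut being asked about, assuming the entry filename alone is trusted. The function name `guessOoxmlFromEntryName` and the prefix table are hypothetical and are not taken from the library. A prefix match cannot tell a pptx apart from related variants that also use a "ppt/" directory, so this is an illustration of the idea rather than a complete detection rule.

```javascript
// Sketch only: map a ZIP entry's directory prefix to an Office Open XML type.
// The table and function name are hypothetical, for illustration.
const OOXML_PREFIXES = [
  ['ppt/', {
    ext: 'pptx',
    mime: 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
  }],
  ['word/', {
    ext: 'docx',
    mime: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  }],
  ['xl/', {
    ext: 'xlsx',
    mime: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
  }],
];

function guessOoxmlFromEntryName(filename) {
  for (const [prefix, type] of OOXML_PREFIXES) {
    if (filename.startsWith(prefix)) return type;
  }
  return undefined; // No decision from this entry; keep reading headers.
}

// Usage: stop at the first entry whose name settles the type.
const entries = ['[trash]/0000.dat', 'docProps/app.xml', 'ppt/fonts/font23.fntdata'];
let detected;
for (const entryName of entries) {
  detected = guessOoxmlFromEntryName(entryName);
  if (detected) break;
}
console.log(detected && detected.ext);
```

With the entry order from the log above, this would decide at the first "ppt/" entry instead of continuing through every font and image.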

I will put up a PR if I can.

Thanks.

Existing Issue Check

  • I have searched the existing issues and could not find any related to my problem.

ESM (ECMAScript Module) Requirement Acknowledgment

  • My project is an ESM project and my package.json contains the following entry: "type": "module".

File-Type Scope Acknowledgment

  • I understand that file-type detects binary file types and not text or other formats.

Resolves #688

Labels

improvement (Improvement of existing functionality)