Description
Hello,
I run this library on AWS to detect the types of files uploaded to S3, using fileTypeFromTokenizer.
For one specific pptx file, detection appeared to hang. Several minutes were not enough.
It eventually detected the type correctly after 7 minutes.
Looking at the code, I could see it was walking through the archive and reading the zip headers one by one. This was a relatively large PPT (17 MB) with more than 20 photos bundled into it. Big, but not abnormally big.
Each one was downloaded like this:
{
compressedSize: 631,
uncompressedSize: 631,
filenameLength: 16,
extraFieldLength: 0,
filename: '[trash]/0000.dat'
}
{
compressedSize: 0,
uncompressedSize: 0,
filenameLength: 16,
extraFieldLength: 0,
filename: 'docProps/app.xml'
}
{
compressedSize: 0,
uncompressedSize: 0,
filenameLength: 24,
extraFieldLength: 0,
filename: 'ppt/fonts/font23.fntdata'
}
... more fonts
{
compressedSize: 0,
uncompressedSize: 0,
filenameLength: 20,
extraFieldLength: 0,
filename: 'ppt/media/image1.png'
}
... more images
It finally got to
{
compressedSize: 0,
uncompressedSize: 0,
filenameLength: 33,
extraFieldLength: 0,
filename: 'ppt/notesMasters/notesMaster1.xml'
}
Only then was it able to see that the filename ended in ".xml" and report back the content type.
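For reference, the fields in the log above match the fixed 30-byte ZIP local file header. Below is a minimal sketch of reading them from a byte array. `parseLocalFileHeader` is a hypothetical helper written for illustration, not the library's own code. It also exposes general-purpose flag bit 3 (data descriptor): when that bit is set, the sizes in the header are written as 0 and the real sizes follow the entry data. That would be consistent with the zero sizes logged above, though I have not confirmed it for this file.

```javascript
// Hypothetical helper for illustration; not part of file-type.
// Parses the fixed part of a ZIP local file header (little-endian).
function parseLocalFileHeader(bytes) {
  if (bytes.byteLength < 30) return undefined;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

  // Local file header signature "PK\x03\x04".
  if (view.getUint32(0, true) !== 0x04034b50) return undefined;

  const flags = view.getUint16(6, true);
  const filenameLength = view.getUint16(26, true);
  const extraFieldLength = view.getUint16(28, true);
  const filename = new TextDecoder().decode(
    bytes.subarray(30, 30 + filenameLength)
  );

  return {
    // Bit 3: sizes are stored in a data descriptor after the entry data.
    hasDataDescriptor: (flags & 0x0008) !== 0,
    compressedSize: view.getUint32(18, true),
    uncompressedSize: view.getUint32(22, true),
    filenameLength,
    extraFieldLength,
    filename,
  };
}
```

If the data descriptor flag is set, a reader that cannot seek has no size to skip by, so it has to read through the entry data to find the next header. That could explain why every image had to be downloaded.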
I know very little about detecting MIME types, so forgive me if this is completely wrong, but I was wondering whether a filename starting with "ppt/" would be enough to declare the file type and skip downloading everything else, similar to the shortcut provided for xlsx?
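To make the idea concrete, here is a rough sketch of the kind of shortcut I have in mind. `guessOoxmlMimeFromEntryName` and the prefix table are hypothetical names of my own, not the library's API, and the "xl/" row is only my assumption about what an xlsx shortcut could key on. One caveat I am aware of: a "ppt/" folder also appears in related formats such as .ppsx and .potx, so the prefix alone may not be able to tell those apart from .pptx.

```javascript
// Hypothetical sketch; names and table are assumptions, not file-type's code.
const OOXML_PREFIXES = [
  ['ppt/', 'application/vnd.openxmlformats-officedocument.presentationml.presentation'],
  ['xl/', 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'],
];

// Returns a MIME type as soon as an entry name reveals the package kind,
// or undefined if this entry is not conclusive and scanning must continue.
function guessOoxmlMimeFromEntryName(filename) {
  for (const [prefix, mime] of OOXML_PREFIXES) {
    if (filename.startsWith(prefix)) {
      return mime;
    }
  }
  return undefined;
}
```

With the entries from my log, '[trash]/0000.dat' and 'docProps/app.xml' would be inconclusive, and 'ppt/fonts/font23.fntdata' would already be enough to stop, before any of the images are downloaded.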
I will put up a PR if I can.
Thanks.
Existing Issue Check
- I have searched the existing issues and could not find any related to my problem.
ESM (ECMAScript Module) Requirement Acknowledgment
- My project is an ESM project and my package.json contains the following entry: "type": "module".
File-Type Scope Acknowledgment
- I understand that file-type detects binary file types and not text or other formats.
Resolves #688