.pdf files can be detected as .ai based on content #582
Comments
@sindresorhus Curious if you have thoughts on this. I plan to put up a fix with the second approach, but I would like to get https://github.com/sindresorhus/file-type/pulls in first.
But it seems like we don't have access to the original file extension, since we only use the stream (which makes sense), so maybe this approach is no good. In my own usage, I'll work around it by handling this case in the caller. Still, I wonder if there's a better way to do this than what we have today.
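A caller-side workaround along these lines could look roughly like the sketch below. It is only an illustration: `DetectedType`, `resolveDetectedType`, and the `strict` flag are hypothetical names made up for this example and are not part of file-type. The sketch assumes the caller already has a detection result with `ext` and `mime` fields and still knows the original file name.

```typescript
// Hypothetical shape of a detection result (an extension plus a MIME type).
interface DetectedType {
  ext: string;
  mime: string;
}

// Hypothetical caller-side helper, not part of file-type.
// If the content was detected as "ai" but the original file name says
// otherwise, fall back to PDF.
//   strict = false: override only when the original extension is ".pdf"
//   strict = true:  keep "ai" only when the original extension is ".ai"
function resolveDetectedType(
  detected: DetectedType | undefined,
  originalName: string,
  strict = false,
): DetectedType | undefined {
  if (detected === undefined || detected.ext !== "ai") {
    return detected; // nothing to correct
  }
  const dot = originalName.lastIndexOf(".");
  const originalExt = dot === -1 ? "" : originalName.slice(dot + 1).toLowerCase();
  const treatAsPdf = strict ? originalExt !== "ai" : originalExt === "pdf";
  return treatAsPdf ? { ext: "pdf", mime: "application/pdf" } : detected;
}
```

The first mode corresponds to the "trust a `.pdf` extension" idea, and the `strict` mode to the "only report ai when the extension is already `.ai`" idea. Both only make sense in code that still has the file name, which the stream-based detection does not.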
None of the implemented file recognition is perfect (guaranteed to be correct). By writing 4 characters at the beginning of a text file, you can probably mimic half of the file recognition heuristics. The reliability of the heuristics varies strongly. If a recognition is likely to introduce false positives (for which there is no clear definition), it may indeed be better to improve the algorithm, preferably, or, as you suggest, fall back on its parent file type.
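To make the "4 characters" point concrete, here is a deliberately naive magic-number check. It is a sketch for illustration only and is not how file-type implements its detection; `asciiBytes` and `startsWithPdfMagic` are hypothetical helpers. It only shows that a plain text note beginning with `%PDF` passes a check that looks at nothing but the first four bytes.

```typescript
// Hypothetical helper: turn an ASCII string into its bytes.
function asciiBytes(text: string): Uint8Array {
  return Uint8Array.from(Array.from(text, (ch) => ch.charCodeAt(0)));
}

// Naive magic-number check (illustration only): does the buffer start
// with the ASCII bytes "%PDF"?
function startsWithPdfMagic(bytes: Uint8Array): boolean {
  const magic = [0x25, 0x50, 0x44, 0x46]; // "%PDF"
  return bytes.length >= magic.length && magic.every((b, i) => bytes[i] === b);
}

// A text note that merely begins with the four magic characters.
const fakePdf = asciiBytes("%PDF this is really just a text note");
const plainText = asciiBytes("just a text note");

const fakeLooksLikePdf = startsWithPdfMagic(fakePdf); // true: false positive
const plainLooksLikePdf = startsWithPdfMagic(plainText); // false
```

Any heuristic that keys on a short byte sequence has this weakness to some degree, which is why the reliability differs so much from one file type to another.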
We indeed can only determine the file-type based on the file content.
When PDF files contain images created in Photoshop or Adobe Illustrator, file-type detects them as .ai based on the byte-checking heuristic we have in place.
I'm proposing that even if the magic string is found, if the original file's extension is .pdf, file-type should consider it a PDF and not change its type based on some content inside it.
An even stricter approach that I would also support is returning the ai file type only if the file extension is already .ai. It seems more natural/compatible to default to .pdf if .ai isn't explicitly specified, since the ai detection is just a loose heuristic anyway.