pillow is not working properly #42
What version did you try this with? I recently updated some of the compression parameters to be more in line with the Kakadu ones. Could you retry with the latest version? Pillow should behave the same as OpenJPEG.
I think Kakadu is doing a better job adapting to the input images, at least with my default parameters. It's just a standard reduction, whereas I think Kakadu might do something more clever. You could experiment with other values like the
If you use the build from issue #41, you could toy around with it some, but I tried again to use a single value for
Thanks for the tip. I will experiment more with the latest version when I have a moment and let you know, although the previous reduction I observed does not make me confident there will be any interesting results.
You mean optically speaking, right? I will get back to you.
Right, I meant whether the resulting PDF looks much worse visually. Kakadu definitely seems to be better, but there are probably ways to make OpenJPEG better; I just haven't invested a lot of time in trying all the different knobs.
I tried using … If it were simply super-compressed, there would be a use case somewhere for someone, but it has about the same compression ratio as … The initially reported problem still exists, so I cannot use …
The error message does not give me a clue about the problem. I'm using different variants, but adding the space is what the documentation suggests. Do you get similar results, or is it just me? In case of the former, if pillow can be tweaked to look half decent, I would suggest adding some pillow-specific defaults. If not, I'd give pillow a label or warning message: "Bad quality, for testing purposes only."
Can you please state which version of Pillow you are using (`python3 -m pip show pillow | grep Version`)? If recent versions of Pillow do not provide reasonable JP2 quality, perhaps someone should file an issue requesting that they improve their encoder?
I think Pillow uses OpenJPEG, so that might not help. I think we can get better quality with Pillow/OpenJPEG and Grok, but I just didn't invest the time in trying to find the right flags. Maybe see what happens with multi-layer encoding, as the help options also suggest?
```
$ python3 -m pip show pillow | grep Version
Version: 8.3.2
```
Once I get #41 working I can do some comparisons.
I was interested in Grok because it sounds promising, but I couldn't get Grok to build or install on Ubuntu, so that's the one I haven't tried yet.
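Because Pillow delegates JPEG 2000 encoding to OpenJPEG, the `pip show` version alone doesn't say which OpenJPEG release is doing the work. A small sketch using Pillow's `PIL.features` module can report both; it assumes a reasonably recent Pillow that provides `features.version()`.

```python
import PIL
from PIL import features


def jpeg2000_backend_info():
    """Return (pillow_version, has_jpeg2000, openjpeg_version_or_None)."""
    # "jpg_2000" is Pillow's feature key for its OpenJPEG-backed codec
    has_j2k = features.check("jpg_2000")
    openjpeg_version = features.version("jpg_2000") if has_j2k else None
    return PIL.__version__, has_j2k, openjpeg_version


if __name__ == "__main__":
    pillow_version, has_j2k, openjpeg_version = jpeg2000_backend_info()
    print(f"Pillow {pillow_version}, JPEG 2000 support: {has_j2k}, OpenJPEG: {openjpeg_version}")
```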
Could you show me where exactly I can read about this?
I was thinking of this:
Right, so the flags for Pillow are unfortunately different. For Pillow you can do this:
You can see all the supported flags here: https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html#jpeg-2000
Thank you @MerlijnWajer, this helps.
It works! I'm not getting the error. No spaces allowed. So to address the second part of the initial issue, perhaps you can catch
Turns out pillow is just really quite bad at lower quality settings, but it cleans up at better quality settings. To me it becomes acceptable at around

```shell
recode_pdf -v --dpi 300 -J pillow \
    --fg-compression-flags 'quality_layers:[220]' \
    -I in.png --hocr-file in.hocr -o out-pillow-r220.pdf
```

So to address the first part of the initial issue, you could set these as the default fg flags if the user doesn't specify otherwise, so users won't think it's broken like I did. 😅
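To illustrate the suggestion of falling back to usable Pillow defaults when no `--fg-compression-flags` are given, and of failing with a readable message instead of a bare `ValueError`, here is a sketch. It is hypothetical and not `recode_pdf`'s actual code: it assumes each flag is one `key:value` string like `'quality_layers:[220]'`, parses values with `ast.literal_eval`, and uses the `220` ratio from this comment as the default. The names in `PILLOW_JP2_OPTIONS` are real Pillow JPEG 2000 save options, but the set is not exhaustive.

```python
import ast

# Hypothetical helper, NOT recode_pdf's actual parser. It assumes each flag
# is a single "key:value" string such as 'quality_layers:[220]'.

# Real Pillow JPEG 2000 save options (not an exhaustive list).
PILLOW_JP2_OPTIONS = {
    "quality_mode", "quality_layers", "num_resolutions", "codeblock_size",
    "precinct_size", "irreversible", "progression", "tile_size",
    "offset", "tile_offset", "cinema_mode",
}

# Hypothetical default, using the ratio found acceptable in this thread.
DEFAULT_PILLOW_FG_FLAGS = ["quality_layers:[220]"]


def parse_pillow_flag(flag):
    """Turn 'key:value' into (key, value), raising a readable ValueError."""
    key, sep, raw_value = flag.partition(":")
    if not sep:
        raise ValueError(f"Pillow flag {flag!r} must look like 'key:value'")
    if key != key.strip():
        raise ValueError(f"Pillow flag {flag!r} must not start or end with a space")
    if key not in PILLOW_JP2_OPTIONS:
        raise ValueError(f"{key!r} is not a known Pillow JPEG 2000 option")
    try:
        # '[220]' -> [220], 'True' -> True, '6' -> 6
        value = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        # Bare words such as 'rates' or 'dB' are kept as strings
        value = raw_value
    return key, value


def pillow_save_kwargs(user_flags=None):
    """Build Image.save() keyword arguments, falling back to the default flags."""
    flags = user_flags or DEFAULT_PILLOW_FG_FLAGS
    return dict(parse_pillow_flag(flag) for flag in flags)
```

The resulting dict could then be passed straight through, e.g. `image.save(fp, "JPEG2000", **pillow_save_kwargs())`.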
I wanted to do a simple PR for the help output, but I'm not sure how, so I've added a third checkbox to the initial issue instead. Right now
There is no space at the start in the examples, and with pillow the space causes the error. I think this text is outdated. Suggestion:
Right, pillow flags aren't documented there, and those should not start with a space. The thing with the space is that if you do something like … Regarding the default pillow/openjpeg flags, could you compare the file sizes? My suspicion is that the resulting PDFs will now be quite a bit larger than the kakadu ones. I tried to match file sizes rather than quality (which I agree might not have been the best idea).
Oh, now I get it! It's exclusively for double dashes. That's why
Yes, you are correct: 145 KB with kakadu vs. 210 KB with pillow. I understand the rationale for targeting the same size. It's just that pillow doesn't perform acceptably at such low quality, so without usable defaults the user will always have to figure out how to override them.
OK, that is fair enough; I guess that's sensible reasoning. It reminds me again that having some "compression profiles" might make sense, like:
And there could also be profiles for specific content, like:
Rereading this thread, I think the better solution is to use … I will test this, update the documentation, and remove the space-stripping hack.
Yeah, that works:
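For reference, Python's `argparse` (assumed here as a stand-in for `recode_pdf`'s option parsing) treats a separate argument that starts with `-` as another option, which is what the leading-space workaround avoided. The `--option=value` form sidesteps this, because argparse splits on the first `=` and takes the rest verbatim. A minimal demonstration with a hypothetical flag value `-example_flag`:

```python
import argparse


def build_parser():
    # Hypothetical stand-in for recode_pdf's CLI: only the option under discussion
    parser = argparse.ArgumentParser(prog="recode_pdf")
    parser.add_argument("--fg-compression-flags")
    return parser


def parse(argv):
    """Return the parsed flag string, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv).fg_compression_flags
    except SystemExit:
        # argparse prints a usage error to stderr and exits on bad input
        return None


if __name__ == "__main__":
    # Separate token starting with '-': taken as an (unknown) option, so
    # --fg-compression-flags gets no value and parsing fails
    print(parse(["--fg-compression-flags", "-example_flag"]))
    # Leading-space workaround: accepted, but the space must be stripped later
    print(repr(parse(["--fg-compression-flags", " -example_flag"])))
    # '=' form: everything after the first '=' is the value, verbatim
    print(parse(["--fg-compression-flags=-example_flag"]))
```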
Absolutely. It saves users a lot of time. When the h264 encoder got presets (fast, slow, etc.) and content profiles (grainy film, cartoon, etc.), it became a lot more pleasant to use. It takes a lot of effort to set up, though. So if the default makes sense, that's a good start. You may want to take time to hone presets and keep them undocumented until you are happy with the result.
I have created an issue for this feature request: #48. If I add the flags that you recommend, then I think we can close this bug, right? Maybe we should ask for help on the OpenJPEG mailing list; they might have some tips or advice.
Here are my recommended tasks from the original issue. Feel free to close this issue.
(with "documentation" I actually meant
@MerlijnWajer, is it possible to re-recode PDFs that were made with pillow, this time using kakadu? Not that it would increase quality, but the images are a lot bigger and at the same time so terrible that the only thing that saves them is the mask. I think redoing them with kakadu may halve the file size with minimal to no quality loss, since the images are already so blurry.
You could try to render them to a page (combining the MRC layers into a normal page) and then recompress them. I don't have a tool to do this exact thing, but it should not be too hard with PyMuPDF. Maybe mutool can just render the final pages to images, and then you can try to recompress them.
Thank you for pointing me in the right direction. I think I should keep the mask as generated by
Provided you still have the original input data, wouldn't it be better to use that directly to avoid additional quality loss?
Absolutely, always, 100%. But the original data is gone, and so is the analog paper. It's just that the pillow images are so inconceivably bad compared to their file size: it's around 150 KB per page. I think those … Maybe the end result is 85 KB versus 150 KB, but it adds up if you have scanned and destroyed a lot of material before realizing kakadu was not used by default when available.
Using `-J pillow` results in terrible images. It looks like the image is resampled 4 to 1. Here is the `-J pillow` foreground layer:

![pillow](https://user-images.githubusercontent.com/1702193/154853050-8eddff7d-3b0e-49ce-af4e-a9255eafb456.png)

For comparison, here is `-J kakadu`:

![kakadu](https://user-images.githubusercontent.com/1702193/154853018-250e7554-22e1-49ba-9d68-c9a6d202228f.png)

The resulting files are approximately similar in size. Is `pillow` really absurdly bad, or does it need different compression parameters? I wanted to try this out, but `recode_pdf` doesn't like the documented compression flags and throws an error:

Additional info
- Linux Mint 20.2 AKA Ubuntu 20.04.3
- Test scan to experiment with: test_1.png.zip

Suggested actionables
- … `pillow` so quality is reasonable.
- … `ValueError` when following the docs.