Convert pdf to images #346

sabraMa · 2020-03-18T13:16:32Z

Hello,
I have some questions:
1 Can I convert a multi-page PDF document to PNG images with OpenPDF?
2 Is it possible to generate PDF/A output?
3 Is there any way to compress an existing PDF?

Thank you very much for your help.

doobo · 2020-06-30T11:17:10Z

Here is the method I wrote for this. I hope it will be useful!

import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

	/**
	 * Renders a PDF to PNG images, stitching every perSize pages into one tall image.
	 * Uses Apache PDFBox 2.x; in PDFBox 3.x, PDDocument.load(byte[]) is replaced by
	 * Loader.loadPDF(byte[]). "log" is assumed to be a logger field of the enclosing class.
	 *
	 * @param in      the PDF document as a byte array
	 * @param perSize number of pages per output image; 1 is used when null or less than 1
	 * @return one PNG-encoded byte array per output image
	 */
	public static List<byte[]> pdfToImage(byte[] in, Integer perSize) {
		List<byte[]> bms = new ArrayList<>();
		// Fall back to one page per image when perSize is missing or invalid
		int pagesPerImage = (perSize == null || perSize < 1) ? 1 : perSize;
		// Render with PDFBox; try-with-resources closes the document even if rendering fails
		try (PDDocument pdDocument = PDDocument.load(in)) {
			PDFRenderer renderer = new PDFRenderer(pdDocument);
			int pageCount = pdDocument.getNumberOfPages();
			// Number of output images to produce
			int totalCount = getPages(pageCount, pagesPerImage);
			for (int m = 0; m < totalCount; m++) {
				// The last image may hold fewer pages; sizing it by the pages that remain
				// avoids a black band at the bottom of the stitched image
				int pagesInChunk = Math.min(pagesPerImage, pageCount - m * pagesPerImage);
				// Stitched result for this chunk
				BufferedImage imageResult = null;
				int width = 0;
				// Vertical offset at which the next page is drawn
				int shiftHeight = 0;
				for (int i = 0; i < pagesInChunk; i++) {
					int pageIndex = i + (m * pagesPerImage);
					// 144 is the DPI: a higher value gives a sharper but larger image and a slower conversion
					BufferedImage image = renderer.renderImageWithDPI(pageIndex, 144, ImageType.RGB);
					int imageHeight = image.getHeight();
					if (i == 0) {
						// Size the canvas from the first page; this assumes every page
						// in the chunk has the same dimensions as the first one
						width = image.getWidth();
						imageResult = new BufferedImage(width, imageHeight * pagesInChunk, BufferedImage.TYPE_INT_RGB);
					}
					// Copy this page's RGB pixels into the stitched image
					int[] singleImgRGB = image.getRGB(0, 0, width, imageHeight, null, 0, width);
					imageResult.setRGB(0, shiftHeight, width, imageHeight, singleImgRGB, 0, width);
					// Advance the offset past the page just drawn
					shiftHeight += imageHeight;
				}
				// Encode the stitched image as PNG bytes
				ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
				ImageIO.write(imageResult, "png", byteArrayOutputStream);
				bms.add(byteArrayOutputStream.toByteArray());
			}
		} catch (Exception e) {
			log.error("Failed to convert PDF to images", e);
		}
		return bms;
	}

	/*
	 * Computes how many output images are needed for counts pages
	 * at pageSize pages per image (ceiling division).
	 */
	private static int getPages(int counts, int pageSize) {
		if (counts == 0) {
			return 0;
		}
		return (counts + pageSize - 1) / pageSize;
	}
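
The stitching step in the method above can also be sketched without PDFBox, using only the JDK. This is a minimal sketch, not part of OpenPDF or PDFBox: the class name `PageStitcher` and its methods are hypothetical. It assumes the pages have already been rendered to `BufferedImage` objects (for example by `PDFRenderer`), and, unlike the method above, it tolerates pages of different sizes by using the widest page as the canvas width and padding with white.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import javax.imageio.ImageIO;

// Hypothetical helper class, for illustration only.
public class PageStitcher {

    /** Number of output images for pageCount pages at perImage pages each (ceiling division). */
    public static int chunkCount(int pageCount, int perImage) {
        if (pageCount < 0 || perImage < 1) {
            throw new IllegalArgumentException("pageCount must be >= 0 and perImage >= 1");
        }
        return (pageCount + perImage - 1) / perImage;
    }

    /** Stacks the page images top to bottom on a white canvas as wide as the widest page. */
    public static BufferedImage stitchVertically(List<BufferedImage> pages) {
        if (pages.isEmpty()) {
            throw new IllegalArgumentException("at least one page image is required");
        }
        int width = 0;
        int height = 0;
        for (BufferedImage page : pages) {
            width = Math.max(width, page.getWidth());
            height += page.getHeight();
        }
        BufferedImage result = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = result.createGraphics();
        try {
            // Fill with white so narrower pages do not leave black margins
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            int y = 0;
            for (BufferedImage page : pages) {
                g.drawImage(page, 0, y, null);
                // Advance by the height of the page just drawn
                y += page.getHeight();
            }
        } finally {
            g.dispose();
        }
        return result;
    }

    /** Encodes an image as PNG bytes. */
    public static byte[] toPng(BufferedImage image) {
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            ImageIO.write(image, "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Drawing with `Graphics2D` instead of copying `int[]` pixel arrays avoids the out-of-bounds errors that `getRGB`/`setRGB` would raise when a page is narrower or taller than the first one.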

andreasrosdal · 2020-06-30T20:22:20Z

Thanks for sharing! Can you please submit this code as a pull request to OpenPDF? Create a new Java class for it. Then we can add this as a new useful high-level function in the library.

asturio · 2021-02-04T18:52:48Z

This is a nice one for anybody wanting to contribute.

Grab the code from Convert pdf to images #346 (comment)
Create a new utility class for that method
Clean up the code a little bit (my Chinese is not that good :-) )
Write some unit tests
Create a pull request

GreenToad · 2021-03-02T17:59:53Z

This code uses Apache PDFBox, not LibrePDF.

asturio · 2021-03-20T12:25:30Z

Just answering two of the questions:

Yes, OpenPDF can generate PDF/A.
I'm not aware of any part of OpenPDF for compressing existing PDFs. There are some other nice (non-Java) tools that can manipulate PostScript and PDF. Maybe Ghostscript is a way to do so.
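
For the Ghostscript route, one option is to call the `gs` command-line tool from Java. The following is only a sketch under stated assumptions: the class `GhostscriptCompress` and its methods are hypothetical, Ghostscript must be installed separately, and the executable is assumed to be named `gs` (on Windows it is typically `gswin64c`). The flags are standard Ghostscript options for the `pdfwrite` device. How much the file shrinks depends on the document and on the chosen `-dPDFSETTINGS` preset (`/screen`, `/ebook`, `/printer`, `/prepress` or `/default`).

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only; not part of OpenPDF.
public class GhostscriptCompress {

    /** Builds the Ghostscript command line that rewrites inputPdf into outputPdf. */
    public static List<String> buildCommand(String inputPdf, String outputPdf, String preset) {
        return Arrays.asList(
                "gs",                          // assumed executable name
                "-sDEVICE=pdfwrite",           // write a new PDF
                "-dPDFSETTINGS=/" + preset,    // quality/size preset, e.g. "ebook"
                "-dNOPAUSE",                   // do not pause between pages
                "-dBATCH",                     // exit when processing is done
                "-dQUIET",                     // suppress routine output
                "-sOutputFile=" + outputPdf,
                inputPdf);
    }

    /** Runs the command and returns the process exit code (0 on success). */
    public static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }
}
```

A call such as `GhostscriptCompress.run(GhostscriptCompress.buildCommand("in.pdf", "out.pdf", "ebook"))` would then produce the rewritten file. It is worth comparing the input and output sizes afterwards, since an already optimized PDF may not get any smaller.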

mluppi · 2021-07-17T18:56:18Z

@asturio The labels good first issue and task could probably be removed from this issue. The code is for Apache PDFBox and I don't see an easy way to just add that functionality. This issue does not need continuous attention the way #145 and #152, which have the task label, do.

bhupendersinghh · 2022-01-25T08:13:02Z

Hi @zengleo did you make changes to add this functionality?

ObsisMc · 2022-03-08T12:41:03Z

Hi, is there anyone working on this issue? Can I take it?

andreasrosdal · 2022-06-15T22:21:45Z

Sure, please submit a pull request for this.
@ObsisMc

andreasrosdal added enhancement help wanted labels Jul 2, 2020

asturio added good first issue task labels Feb 4, 2021

zengleo added a commit to zengleo/OpenPDF that referenced this issue Nov 18, 2021

Add files via upload

3968bbc

fixed issue LibrePDF#589 update issue LibrePDF#346

asturio removed good first issue task labels Sep 15, 2022

andreasrosdal closed this as completed Nov 2, 2023
