Convert pdf to images #346

Closed
sabraMa opened this issue Mar 18, 2020 · 9 comments

@sabraMa

sabraMa commented Mar 18, 2020

Hello,
I have some questions:
1. Can I convert my multi-page PDF document to PNG images with OpenPDF?
2. Is it possible to generate PDF/A format?
3. Is there any way to compress an existing PDF?

Thank you very much for your help.

@doobo

doobo commented Jun 30, 2020

Here is the method I wrote for this. I hope it is useful!

	import java.awt.image.BufferedImage;
	import java.io.ByteArrayOutputStream;
	import java.util.ArrayList;
	import java.util.List;
	import javax.imageio.ImageIO;
	import org.apache.pdfbox.pdmodel.PDDocument;
	import org.apache.pdfbox.rendering.ImageType;
	import org.apache.pdfbox.rendering.PDFRenderer;

	/**
	 * Converts a PDF into PNG images, stitching every perSize consecutive
	 * pages vertically into one long image.
	 * @param in      the PDF file content
	 * @param perSize number of pages per output image (1 if null or less than 1)
	 * @return one PNG-encoded byte array per output image
	 */
	public static List<byte[]> pdfToImage(byte[] in, Integer perSize) {
		List<byte[]> bms = new ArrayList<>();
		// Number of pages stitched into each output image
		int pagesPerImage = (perSize == null || perSize < 1) ? 1 : perSize;
		// Render with Apache PDFBox (2.x API); try-with-resources closes the document even on failure
		try (PDDocument pdDocument = PDDocument.load(in)) {
			PDFRenderer renderer = new PDFRenderer(pdDocument);
			int pageCount = pdDocument.getNumberOfPages();
			// Number of output images
			int totalCount = getPages(pageCount, pagesPerImage);
			for (int m = 0; m < totalCount; m++) {
				// Width of the output image, taken from the first page of the group
				int width = 0;
				// Vertical offset at which the current page is drawn
				int shiftHeight = 0;
				// The stitched image for this group of pages
				BufferedImage imageResult = null;
				for (int i = 0; i < pagesPerImage; i++) {
					int pageIndex = i + (m * pagesPerImage);
					if (pageIndex >= pageCount) {
						break;
					}
					// 144 is the DPI: a higher value gives a sharper but larger image and a slower conversion
					BufferedImage image = renderer.renderImageWithDPI(pageIndex, 144, ImageType.RGB);
					int imageHeight = image.getHeight();
					int imageWidth = image.getWidth();
					if (i == 0) {
						// Use the width of the first page
						width = imageWidth;
						// Size the canvas for the pages actually left in this group; otherwise
						// the bottom of the last long image would be all black.
						// Assumes every page in the group has the same size as the first one.
						int pagesInGroup = Math.min(pagesPerImage, pageCount - m * pagesPerImage);
						imageResult = new BufferedImage(width, imageHeight * pagesInGroup, BufferedImage.TYPE_INT_RGB);
					} else {
						// Accumulate the vertical offset
						shiftHeight += imageHeight;
					}
					// Copy the RGB data of this page into the long image
					int[] singleImgRGB = image.getRGB(0, 0, width, imageHeight, null, 0, width);
					imageResult.setRGB(0, shiftHeight, width, imageHeight, singleImgRGB, 0, width);
				}
				// Encode the stitched image as PNG bytes
				ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
				ImageIO.write(imageResult, "png", byteArrayOutputStream);
				bms.add(byteArrayOutputStream.toByteArray());
				// Alternatively, write the image to a file
				//File outFile = new File(pdfPath.replace(".pdf", "_" + m + ".jpg"));
				//ImageIO.write(imageResult, "jpg", outFile);
			}
		} catch (Exception e) {
			// log is a logger field of the enclosing class (not shown here)
			log.error("Failed to convert PDF to images", e);
		}
		return bms;
	}

	/*
	 * Computes the number of output images for counts pages at pageSize pages per image
	 */
	private static int getPages(int counts, int pageSize) {
		if (counts == 0) {
			return 0;
		} else if (counts <= pageSize) {
			return 1;
		} else if (counts % pageSize != 0) {
			return counts / pageSize + 1;
		} else {
			return counts / pageSize;
		}
	}
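
The stitching step in the method above does not depend on any PDF library, so it can be tried on its own. Below is a minimal sketch using only the JDK. The class name PageStitcher and the method stitchVertically are hypothetical names chosen for this illustration, not part of PDFBox or OpenPDF. Unlike the method above, it sizes the canvas from the actual page images (widest page, sum of heights) and fills the background with white, so pages of different sizes should not cause an out-of-bounds error or black padding.

```java
import java.awt.image.BufferedImage;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of PDFBox or OpenPDF.
final class PageStitcher {

    /** Stacks the given page images top to bottom on a white canvas. */
    static BufferedImage stitchVertically(List<BufferedImage> pages) {
        if (pages.isEmpty()) {
            throw new IllegalArgumentException("at least one page image is required");
        }
        // Canvas size: widest page, sum of all page heights
        int width = 0;
        int height = 0;
        for (BufferedImage page : pages) {
            width = Math.max(width, page.getWidth());
            height += page.getHeight();
        }
        BufferedImage result = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);

        // Fill with white so narrower pages are not padded with black
        int[] white = new int[width * height];
        Arrays.fill(white, 0xFFFFFF);
        result.setRGB(0, 0, width, height, white, 0, width);

        // Copy each page at the current vertical offset
        int y = 0;
        for (BufferedImage page : pages) {
            int w = page.getWidth();
            int h = page.getHeight();
            int[] rgb = page.getRGB(0, 0, w, h, null, 0, w);
            result.setRGB(0, y, w, h, rgb, 0, w);
            y += h;
        }
        return result;
    }

    public static void main(String[] args) {
        // Two blank (black) stand-in "pages" of different sizes
        BufferedImage a = new BufferedImage(4, 2, BufferedImage.TYPE_INT_RGB);
        BufferedImage b = new BufferedImage(3, 5, BufferedImage.TYPE_INT_RGB);
        BufferedImage out = stitchVertically(Arrays.asList(a, b));
        System.out.println(out.getWidth() + "x" + out.getHeight()); // prints "4x7"
    }
}
```

In the method above, the images passed in would be the results of renderImageWithDPI for one group of pages.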

@andreasrosdal
Contributor

andreasrosdal commented Jun 30, 2020

Thanks for sharing! Can you please submit this code as a pull request to OpenPDF? Create a new Java class for it. Then we can add this as a new useful high-level function in the library.

@asturio
Member

asturio commented Feb 4, 2021

This is a nice one for anybody wanting to contribute.

  • Grab the code from Convert pdf to images #346 (comment)
  • Create a new utility class for that method
  • Clean up the code a little (my Chinese is not that good :-) )
  • Write some unit tests
  • Create a pull request

@GreenToad

This code uses Apache PDFBox, not LibrePDF.

@asturio
Member

asturio commented Mar 20, 2021

Just answering questions 2 and 3:

  2. Yes, OpenPDF can generate PDF/A.
  3. I'm not aware of any part of OpenPDF for compressing existing PDFs. There are some other nice (non-Java) tools which can manipulate PostScript and PDF. Maybe Ghostscript is a way to do so.
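
For anyone who wants to try the Ghostscript route from Java, here is a rough sketch that shells out to it. It assumes a Ghostscript executable named gs is on the PATH (on Windows it is typically gswin64c instead). The class GhostscriptCompress and the method buildCommand are hypothetical names for this illustration. The switches are standard Ghostscript options: the pdfwrite device rewrites the PDF, and the /ebook preset downsamples images, so how much the file shrinks depends on the document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper for illustration; Ghostscript itself is an external tool.
final class GhostscriptCompress {

    /** Builds the Ghostscript command line that rewrites inputPdf into outputPdf. */
    static List<String> buildCommand(String inputPdf, String outputPdf) {
        List<String> cmd = new ArrayList<>();
        cmd.add("gs");                    // assumption: "gs" is on the PATH
        cmd.add("-sDEVICE=pdfwrite");     // write a new PDF
        cmd.add("-dPDFSETTINGS=/ebook");  // preset that downsamples images
        cmd.add("-dNOPAUSE");             // do not pause between pages
        cmd.add("-dBATCH");               // exit when processing is done
        cmd.add("-dQUIET");               // suppress routine messages
        cmd.add("-sOutputFile=" + outputPdf);
        cmd.add(inputPdf);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("usage: GhostscriptCompress <input.pdf> <output.pdf>");
            return;
        }
        // Run Ghostscript and forward its output to this process's console
        Process process = new ProcessBuilder(buildCommand(args[0], args[1]))
                .inheritIO()
                .start();
        int exitCode = process.waitFor();
        System.out.println("gs exited with code " + exitCode);
    }
}
```

Keeping the command construction in its own method makes it easy to swap the preset or the executable name without touching the process handling.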

@mluppi
Contributor

mluppi commented Jul 17, 2021

@asturio The labels "good first issue" and "task" could probably be removed from this issue. The code is for Apache PDFBox, and I don't see an easy way to just add that functionality. This issue does not need continuous attention the way #145 and #152, which have the task label, do.

zengleo added a commit to zengleo/OpenPDF that referenced this issue Nov 18, 2021
@bhupendersinghh

Hi @zengleo, did you make changes to add this functionality?

@ObsisMc

ObsisMc commented Mar 8, 2022

Hi, is there anyone working on this issue? Can I take it?

@andreasrosdal
Contributor

Sure, please submit a pull request for this.
@ObsisMc
