High memory usage when reading a specific PDF #1268

Open
arnestockmans-itp opened this issue Feb 19, 2025 · 2 comments

arnestockmans-itp commented Feb 19, 2025

Describe the bug

When trying to convert PDF to text, I notice very high memory usage for one specific PDF. This might mean there's a memory leak in the library.

To Reproduce

Code to reproduce the issue:

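    import com.lowagie.text.pdf.PdfReader
    import com.lowagie.text.pdf.parser.PdfTextExtractor

    // `contents` (the PDF input) and `onPageParsed` are defined elsewhere by the reporter.
    val reader = PdfReader(contents)
    val extractor = PdfTextExtractor(reader)

    // Pages are 1-indexed.
    for (i in 1..reader.numberOfPages) {
        val text = extractor.getTextFromPage(i)
        onPageParsed(text)
    }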

When extraction reaches page 4 of the attached PDF, I see memory usage climb to ~17 GB. With other PDFs, it is much lower.

Expected behavior

Not using this much memory

System

  • OS: macOS, Google Cloud Run
  • OpenPDF version: 2.0.3

Your real name

Arne Stockmans

Additional context

0a715c43-5b76-4bc7-9d91-38ebae902f97_paper.pdf

Contributor

Lonzak commented Feb 19, 2025

> This might mean there's a memory leak in the library.

It might mean that the memory could be used more efficiently but not a memory leak. Or did you see increasing memory which wasn't freed afterwards?

@arnestockmans-itp
Author

> > This might mean there's a memory leak in the library.
>
> It might mean that the memory could be used more efficiently but not a memory leak. Or did you see increasing memory which wasn't freed afterwards?

Indeed, you're right, I didn't word it correctly there. The memory is freed afterwards
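
For anyone wanting to check the "leak vs. high transient usage" question on their own input, here is a minimal sketch of one way to do it, assuming a plain JVM with no profiler attached. The helpers `usedHeapBytes`, `usedHeapAfterGc`, `measureRetained` and the `HeapReport` class are hypothetical names made up for this example and are not part of OpenPDF. The workload in `main` is a stand-in allocation; the extraction loop from the report would go in its place. `System.gc()` is only a hint to the JVM, so the figures are approximate.

```kotlin
// Hypothetical helper (not an OpenPDF API): approximate used heap in bytes.
fun usedHeapBytes(): Long {
    val rt = Runtime.getRuntime()
    return rt.totalMemory() - rt.freeMemory()
}

// Ask the JVM to collect garbage, wait briefly, then sample the heap.
// System.gc() is a request, not a guarantee, so treat the result as a rough figure.
fun usedHeapAfterGc(): Long {
    System.gc()
    Thread.sleep(100)
    return usedHeapBytes()
}

// Post-GC heap usage before and after a workload.
data class HeapReport(val beforeBytes: Long, val afterBytes: Long) {
    val retainedBytes: Long get() = afterBytes - beforeBytes
}

// Runs `work` several times and compares post-GC heap usage before and after.
fun <T> measureRetained(repetitions: Int = 3, work: () -> T): HeapReport {
    val before = usedHeapAfterGc()
    repeat(repetitions) { work() }
    val after = usedHeapAfterGc()
    return HeapReport(before, after)
}

fun main() {
    val report = measureRetained {
        // Stand-in workload: allocate about 50 MB, then let it go out of scope.
        // Replace with the PdfReader / PdfTextExtractor loop from the report.
        val chunks = List(50) { ByteArray(1024 * 1024) }
        chunks.size
    }
    println("retained after GC: ${report.retainedBytes / (1024 * 1024)} MB")
}
```

If the retained figure keeps growing as `repetitions` increases, that points toward a leak. If it stays roughly flat while the process still peaks high during the run, that matches what was described above: high transient usage that is freed afterwards. This sketch does not capture the peak itself.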
