High memory usage when reading a specific PDF #1268

arnestockmans-itp · 2025-02-19T12:25:36Z

Describe the bug

When trying to convert PDF to text, I notice very high memory usage for one specific PDF. This might mean there's a memory leak in the library.

To Reproduce

Code to reproduce the issue:

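    import com.lowagie.text.pdf.PdfReader
    import com.lowagie.text.pdf.parser.PdfTextExtractor

    val reader = PdfReader(contents)
    val extractor = PdfTextExtractor(reader)

    // Extract text one page at a time; page numbers start at 1
    for (i in 1..reader.numberOfPages) {
        val text = extractor.getTextFromPage(i)
        onPageParsed(text)
    }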

When extraction reaches page 4 of the attached PDF, memory usage climbs to ~17 GB. With other PDFs it is far lower.
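
One way to see where the jump happens is to sample heap use after each page. The following is a minimal sketch, not part of OpenPDF: `usedHeapMiB` and `heapAfterEachPage` are hypothetical helpers, and the workload in `main` is only a stand-in for `extractor.getTextFromPage(page)`.

```kotlin
// Hypothetical helper (not an OpenPDF API): approximate heap currently in use, in MiB.
fun usedHeapMiB(): Long {
    val rt = Runtime.getRuntime()
    return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024)
}

// Runs `work` for each page number 1..pageCount and records used heap after each call.
fun heapAfterEachPage(pageCount: Int, work: (Int) -> Unit): List<Long> =
    (1..pageCount).map { page ->
        work(page)
        usedHeapMiB()
    }

fun main() {
    // Stand-in workload; in the report this would be extractor.getTextFromPage(page).
    val samples = heapAfterEachPage(3) { page -> ByteArray(page * 1_000_000) }
    samples.forEachIndexed { i, mib -> println("page ${i + 1}: ~$mib MiB used") }
}
```

These numbers are sampled after each call returns, so they can miss a short-lived peak inside a single page. A profiler or a heap dump would give a more reliable picture.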

Expected behavior

Memory usage should not be this high.

System

OS: macOS, Google Cloud Run
OpenPDF version: 2.0.3

Your real name

Arne Stockmans

Additional context

0a715c43-5b76-4bc7-9d91-38ebae902f97_paper.pdf

Lonzak · 2025-02-19T13:25:24Z

> This might mean there's a memory leak in the library.

It might mean that memory could be used more efficiently, but that is not a memory leak. Or did you see memory increasing that wasn't freed afterwards?

arnestockmans-itp · 2025-02-19T13:27:11Z

> > This might mean there's a memory leak in the library.
>
> It might mean that memory could be used more efficiently, but that is not a memory leak. Or did you see memory increasing that wasn't freed afterwards?

Indeed, you're right, I didn't word it correctly there. The memory is freed afterwards.
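
The distinction discussed here (a high peak versus memory that stays allocated) can be checked roughly by comparing used heap before and after the work, requesting a garbage collection each time. This is a sketch under assumptions: `retainedAfterGcMiB` is a hypothetical helper, `System.gc()` is only a hint to the JVM, and the workload in `main` stands in for the extraction loop.

```kotlin
// Hypothetical helper: how much more heap is in use after `work` than before it,
// measured after asking the JVM to collect garbage both times.
fun retainedAfterGcMiB(work: () -> Unit): Long {
    val rt = Runtime.getRuntime()
    System.gc() // a request only; the JVM may ignore it
    val before = rt.totalMemory() - rt.freeMemory()
    work()
    System.gc()
    val after = rt.totalMemory() - rt.freeMemory()
    return (after - before) / (1024 * 1024)
}

fun main() {
    // Stand-in workload: allocates about 50 MB that becomes garbage immediately.
    val retained = retainedAfterGcMiB { ByteArray(50_000_000).fill(1) }
    println("retained after GC: ~$retained MiB")
}
```

A value near zero after a large allocation points to high transient usage rather than a leak; a value that keeps growing across repeated runs would suggest memory is being retained.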

arnestockmans-itp added the bug label Feb 19, 2025
