resource tracker error #782

Open
GitHamza0206 opened this issue Jan 21, 2025 · 3 comments
Labels
bug (Something isn't working), pdf

Comments

@GitHamza0206

Bug

...
/Users/mac/.pyenv/versions/3.11.5/lib/python3.11/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Steps to reproduce

...
output_path = os.path.splitext(file_path)[0] + '.md'
EMBED_MODEL_ID = "sentence-transformers/all-MiniLM-L6-v2"

loader = DoclingLoader(
    file_path=file_path,
    export_type=ExportType.MARKDOWN,
    chunker=HybridChunker(tokenizer=EMBED_MODEL_ID),
)

docs = loader.load()

Docling version

...

Python version

...
3.11

GitHamza0206 added the bug label Jan 21, 2025
@workflowsguy

I encounter the same issue under Python 3.12 from the command line:

Steps to reproduce
docling -v /Users/guy/Playground/invoice-simple.pdf

Output

INFO:docling.document_converter:Going to convert document batch...
INFO:docling.utils.accelerator_utils:Accelerator device: 'cpu'
INFO:docling.utils.accelerator_utils:Accelerator device: 'cpu'
INFO:docling.utils.accelerator_utils:Accelerator device: 'cpu'
INFO:docling.pipeline.base_pipeline:Processing document invoice-simple.pdf
[1]    39270 segmentation fault  docling -v /Users/guy/Playground/invoice-simple.pdf
/opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

After this, Python crashes.

Docling version: 2.15.1
Docling Core version: 2.14.0
Docling IBM Models version: 3.1.2
Docling Parse version: 3.1.0
Python 3.12.8

Installed in separate virtualenv
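
The segmentation fault in the output above is printed before the resource_tracker warning, so the leaked-semaphore message may only be a side effect of the abnormal exit. One way to narrow this down is Python's standard `faulthandler` module, which dumps the Python-level stack of each thread when a fatal signal arrives. The following is a minimal sketch, not a confirmed diagnosis; the helper name `enable_crash_log` and the log file name are hypothetical choices made for illustration.

```python
import faulthandler
import os
import tempfile


def enable_crash_log(path=None):
    """Enable faulthandler so a fatal signal (e.g. SIGSEGV) dumps Python tracebacks.

    `path` and the default file name are illustrative, not docling settings.
    """
    if path is None:
        path = os.path.join(tempfile.gettempdir(), "docling_crash.log")
    # The file must stay open for as long as faulthandler is enabled.
    log_file = open(path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file, path
```

Calling `enable_crash_log()` before the conversion starts would leave a traceback in the log file if the interpreter is killed. For the command-line case, running the interpreter with `python -X faulthandler` or setting `PYTHONFAULTHANDLER=1` should have a similar effect without code changes. The dump lists Python frames only, so it shows which call entered native code, not where the native code failed.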

@PeterStaar-IBM
Contributor

@workflowsguy Can you please share an example pdf?

@workflowsguy

@PeterStaar-IBM, this is one of the PDFs I have tried.
But it is not just this one. I have tested several PDFs from various sources, and docling crashes with every one of them, each time with the error given above.
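
To document that the crash happens for every file, the command from the earlier comment could be run over a set of PDFs in a subprocess, recording how each run ended. This is a rough sketch assuming a POSIX system, where `subprocess` reports death by signal as a negative return code; the helper name `run_and_classify` is hypothetical, and the PDF paths are whatever is passed on the command line.

```python
import signal
import subprocess
import sys


def run_and_classify(cmd):
    """Run a command and describe whether it exited normally or died from a signal."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode < 0:
        # On POSIX, a negative return code -N means the child was killed by signal N.
        try:
            name = signal.Signals(-proc.returncode).name
        except ValueError:
            name = f"signal {-proc.returncode}"
        return f"killed by {name}"
    return f"exit code {proc.returncode}"


if __name__ == "__main__":
    # Usage: python this_script.py first.pdf second.pdf ...
    for pdf in sys.argv[1:]:
        print(pdf, run_and_classify(["docling", "-v", pdf]))
```

A table of file names against results such as `killed by SIGSEGV` is easier for maintainers to act on than a single pasted shell message, and it makes it clear whether any file converts successfully.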
