Threading Issue in Download Handlers #58

Open
abdurrafay-khurram opened this issue Dec 12, 2024 · 0 comments

I'm trying to get the handle_found_object callback working. Here's my code:

from typing import Any, Dict
import objaverse.xl as oxl
import os
import pandas as pd

def custom_method(local_path: str, file_identifier: str, sha256: str, metadata: Dict[str, Any]):
    print(f"Object found and downloaded successfully!")
    print(f"Local Path: {local_path}")
    print(f"File Identifier: {file_identifier}")
    print(f"SHA256: {sha256}")
    print(f"Metadata: {metadata}")
    return None

parquet_path = os.path.join(os.getcwd(), 'attribution.parquet')
df = pd.read_parquet(parquet_path)
oxl.download_objects(
    objects=df, 
    download_dir='test', 
    handle_found_object=custom_method,
    processes=1
)

If I don't pass the handle_found_object argument, everything works and the models download. However, when I pass the callback, I get the following error:

2024-12-12 17:11:40.025 | INFO     | objaverse.xl.sketchfab:download_objects:508 - Found 0 objects already downloaded
2024-12-12 17:11:40.025 | INFO     | objaverse.xl.sketchfab:download_objects:529 - Downloading 10 new objects across 1 processes
Traceback (most recent call last):
  File "<string>", line 1, in <module>
    from multiprocessing.spawn import spawn_main; spawn_main(parent_pid=27484, pipe_handle=704)
                                                  ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\khan\scoop\apps\miniconda3\current\envs\sandbox\Lib\multiprocessing\spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\khan\scoop\apps\miniconda3\current\envs\sandbox\Lib\multiprocessing\spawn.py", line 131, in _main
    prepare(preparation_data)
    ~~~~~~~^^^^^^^^^^^^^^^^^^
  File "C:\Users\khan\scoop\apps\miniconda3\current\envs\sandbox\Lib\multiprocessing\spawn.py", line 246, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
    ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\khan\scoop\apps\miniconda3\current\envs\sandbox\Lib\multiprocessing\spawn.py", line 297, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
                                  run_name="__mp_main__")
  File "<frozen runpy>", line 287, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "e:\New folder\test.py", line 18, in <module>
    oxl.download_objects(
    ~~~~~~~~~~~~~~~~~~~~^
        objects=df,
        ^^^^^^^^^^^
    ...<2 lines>...
        processes=1
        ^^^^^^^^^^^
    )
    ^
  File "C:\Users\khan\scoop\apps\miniconda3\current\envs\sandbox\Lib\site-packages\objaverse\xl\__init__.py", line 147, in download_objects   
    source_downloads = downloaders[source].download_objects(
        objects[objects["source"] == source],
    ...<5 lines>...
        **kwargs,
    )
  File "C:\Users\khan\scoop\apps\miniconda3\current\envs\sandbox\Lib\site-packages\objaverse\xl\sketchfab.py", line 550, in download_objects  
    with Pool(processes) as pool:
         ~~~~^^^^^^^^^^^
  File "C:\Users\khan\scoop\apps\miniconda3\current\envs\sandbox\Lib\multiprocessing\context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
                context=self.get_context())
  File "C:\Users\khan\scoop\apps\miniconda3\current\envs\sandbox\Lib\multiprocessing\pool.py", line 215, in __init__
    self._repopulate_pool()
    ~~~~~~~~~~~~~~~~~~~~~^^
  File "C:\Users\khan\scoop\apps\miniconda3\current\envs\sandbox\Lib\multiprocessing\pool.py", line 306, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
                                        self._processes,
                                        ^^^^^^^^^^^^^^^^
    ...<3 lines>...
                                        self._maxtasksperchild,
                                        ^^^^^^^^^^^^^^^^^^^^^^^
                                        self._wrap_exception)
                                        ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\khan\scoop\apps\miniconda3\current\envs\sandbox\Lib\multiprocessing\pool.py", line 329, in _repopulate_pool_static
    w.start()
    ~~~~~~~^^
  File "C:\Users\khan\scoop\apps\miniconda3\current\envs\sandbox\Lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
                  ~~~~~~~~~~~^^^^^^
  File "C:\Users\khan\scoop\apps\miniconda3\current\envs\sandbox\Lib\multiprocessing\context.py", line 337, in _Popen
    return Popen(process_obj)
  File "C:\Users\khan\scoop\apps\miniconda3\current\envs\sandbox\Lib\multiprocessing\popen_spawn_win32.py", line 47, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\khan\scoop\apps\miniconda3\current\envs\sandbox\Lib\multiprocessing\spawn.py", line 164, in get_preparation_data
    _check_not_importing_main()
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "C:\Users\khan\scoop\apps\miniconda3\current\envs\sandbox\Lib\multiprocessing\spawn.py", line 140, in _check_not_importing_main        
    raise RuntimeError('''
    ...<16 lines>...
    ''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

        To fix this issue, refer to the "Safe importing of main module"
        section in https://docs.python.org/3/library/multiprocessing.html