Windows fatal exception: access violation - shapes.py line 65 #209

Open
Day-Go opened this issue May 25, 2022 · 13 comments


Day-Go commented May 25, 2022

Experiencing irregular Python interpreter crashes. The crash happens 20-30% of the time the error-producing line is executed. The code uses a multiprocessing pool; I don't know if that is relevant, but thought it might be worth mentioning.

Error found using Python's built-in faulthandler module. Full traceback below:

Windows fatal exception: access violation

Current thread 0x00000a28 (most recent call first):
  File "C:\Users\OolongJunSun\AppData\Local\Programs\Python\Python310\lib\site-packages\pymunk\shapes.py", line 65 in shapefree
  File "D:\02_Projects\03_Active\Evolution\Crawl-Eat-Die-Repeat-broken\population.py", line 42 in generate_individuals
  File "D:\02_Projects\03_Active\Evolution\Crawl-Eat-Die-Repeat-broken\main.py", line 113 in <module>

Line 42 of my population.py script resets a dictionary containing 500 class instances whose attributes are made up of pymunk objects.

OS: Windows 10
PY: Python 3.10.4
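
For anyone trying to capture a similar trace, here is a minimal sketch of enabling faulthandler so that a native crash dumps the Python stack of every thread. The helper name enable_crash_tracebacks and the log file name are illustrative assumptions, not part of the project.

```python
import faulthandler

# Keep a module-level reference so the log file stays open for the
# lifetime of the process; faulthandler writes to its file descriptor.
_crash_log = None


def enable_crash_tracebacks(path="crash_traceback.log"):
    """Dump the tracebacks of all threads to `path` on a fatal error."""
    global _crash_log
    _crash_log = open(path, "w")
    faulthandler.enable(file=_crash_log, all_threads=True)
    return faulthandler.is_enabled()
```

The same handler can be switched on without code changes by starting the interpreter with `python -X faulthandler`. With a multiprocessing pool, each worker is a separate process, so a helper like this would have to run in every worker too, for example through the pool's `initializer` argument.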

Author

Day-Go commented May 26, 2022

While running the code today I got a similar error with a more detailed stack trace. It looks like the problem is related to multiprocessing.

Windows fatal exception: access violation

Thread 0x0000401c (most recent call first):
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\multiprocessing\connection.py", line 310 in _recv_bytes
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\multiprocessing\connection.py", line 255 in recv
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 576 in _handle_results
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\threading.py", line 917 in run
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\threading.py", line 980 in _bootstrap_inner
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\threading.py", line 937 in _bootstrap

Thread 0x0000506c (most recent call first):
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 528 in _handle_tasks
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\threading.py", line 917 in run
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\threading.py", line 980 in _bootstrap_inner
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\threading.py", line 937 in _bootstrap

Thread 0x00005464 (most recent call first):
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\multiprocessing\connection.py", line 816 in _exhaustive_wait
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\multiprocessing\connection.py", line 884 in wait
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 499 in _wait_for_updates
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 519 in _handle_workers
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\threading.py", line 917 in run
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\threading.py", line 980 in _bootstrap_inner
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\threading.py", line 937 in _bootstrap

Current thread 0x000054a8 (most recent call first):
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\site-packages\pymunk\shapes.py", line 65 in shapefree
  File "D:\02_Projects\03_Active\Evolution\Crawl-Eat-Die-Repeat-broken\population.py", line 42 in generate_individuals
  File "D:\02_Projects\03_Active\Evolution\Crawl-Eat-Die-Repeat-broken\main.py", line 113 in <module>

Owner

viblo commented May 27, 2022

Do you call into Pymunk from multiple threads? Pymunk is not threadsafe, so if you call its methods from multiple threads it might break.

Author

Day-Go commented May 27, 2022

Thanks for the reply and for developing this great library.

I'm not sure what you mean by call into Pymunk but I'll give you a brief step-by-step summary of what the code does.

  1. Initialize a multiprocessing pool and pass in a large dictionary containing class instances made up of Pymunk objects.
  2. Create a pymunk.Space() in each of the processes (there are 6 processes in total)
  3. Populate each space with objects from the dictionary
  4. Enter a loop which steps through the simulation with space.step(dt). Each process has its own independent space.
  5. Return integer from process

At no point do two threads interact with a single space. The processes are independent from each other.
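
The steps above can be sketched roughly as follows. This is a simplified illustration, not the project's code: the function names, the limb layout, and passing a plain limb count instead of the real dictionary of class instances are all assumptions made to keep the example short.

```python
import multiprocessing as mp

import pymunk


def run_simulation(n_limbs, steps=100, dt=1.0 / 60.0):
    """Build a Space private to this process, step it, return an int."""
    space = pymunk.Space()
    space.gravity = (0.0, -900.0)
    for i in range(n_limbs):
        # One body and one segment per limb (hypothetical layout).
        body = pymunk.Body(mass=1.0, moment=10.0)
        body.position = (i * 20.0, 100.0)
        shape = pymunk.Segment(body, (-5.0, 0.0), (5.0, 0.0), 1.0)
        space.add(body, shape)
    for _ in range(steps):
        space.step(dt)
    return len(space.bodies)


def run_generation(limb_counts, processes=6):
    """Evaluate every individual in a pool; each worker owns its Space."""
    with mp.Pool(processes=processes) as pool:
        return pool.map(run_simulation, limb_counts)
```

On Windows the call to `run_generation` has to sit under an `if __name__ == "__main__":` guard, because worker processes are spawned by re-importing the main module.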

The code is on GitHub - https://github.com/OolongJunSun/Crawl-Eat-Die-Repeat

Owner

viblo commented May 27, 2022

I tried your code and managed to reproduce the error. To check the threading theory I rewrote the pool to use a for loop instead. The error can still happen that way, though maybe less often. More investigation needed.
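
A rough sketch of that kind of experiment, switching between the pool and a plain loop with one flag. The names evaluate and evaluate_all are hypothetical, and evaluate is only a stand-in for the real per-individual simulation.

```python
import multiprocessing as mp


def evaluate(individual_id):
    """Hypothetical stand-in for simulating one individual."""
    return individual_id * 2


def evaluate_all(ids, use_pool=True, processes=6):
    """Run `evaluate` over `ids`, in a pool or serially in this process."""
    if use_pool:
        with mp.Pool(processes=processes) as pool:
            return pool.map(evaluate, ids)
    # Serial path: same work, no worker processes and no pool threads.
    return [evaluate(i) for i in ids]
```

If a crash shows up on the serial path as well, the pool itself is unlikely to be the only cause.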

Author

Day-Go commented May 27, 2022

Really appreciate you going to the effort.

I've found that reducing the number of pymunk objects makes the crashes less frequent. When I first reported the problem I had self.n_genes in population.py set to 13. Now I have it set to 9 and the crashes are much less frequent (every 20-50 generations, sometimes 200 generations without crashing).

Other than that I still can't nail down anything that causes the crash to be more or less likely to happen.

Owner

viblo commented May 27, 2022

What does n_genes control? I mean, what happens to the physics objects?

Author

Day-Go commented May 27, 2022

n_genes corresponds to the number of limbs that the randomly generated creature has. Increasing n_genes by 1 adds one pymunk.Body(), one pymunk.Segment() and between 1 and 3 constraints from pymunk.constraints: a PivotJoint is always added, and a SimpleMotor and/or a DampedRotarySpring may be added.

It's not much but since we are generating 500 class instances simultaneously it can add up to 2500 more pymunk objects to the dictionary which is causing the error (line 42 of population.py).
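
The 2500 figure follows from the per-limb counts above. A small sketch of the arithmetic, with a hypothetical helper name:

```python
def extra_objects_per_gene(n_individuals, n_constraints):
    """Pymunk objects added across the population by one extra limb."""
    # One Body + one Segment + `n_constraints` constraints per limb.
    per_limb = 1 + 1 + n_constraints
    return n_individuals * per_limb


# Worst case: PivotJoint + SimpleMotor + DampedRotarySpring on every limb.
worst_case = extra_objects_per_gene(500, n_constraints=3)
# Best case: only the PivotJoint that is always added.
best_case = extra_objects_per_gene(500, n_constraints=1)
```

So each extra gene adds between 1500 and 2500 objects for a population of 500.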

I'm hesitant to attribute the error to the size of the dictionary, since it can still occur when we process smaller dictionaries, albeit less frequently. The size of the dictionary can be controlled by changing the n_individuals variable in main.py.

Owner

viblo commented May 27, 2022

Ah, it became a bit easier to validate different theories around this error when I could trigger the error more often with a higher n_genes.
It seems like the error becomes less frequent (or goes away completely) when I manually remove the shapes and reset their bodies to None in evaluate_individual, just after the while loop, like this:

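# Detach every shape from the space and from its body before the space is dropped.
for shape in list(env.space.shapes):  # iterate over a copy while removing
    env.space.remove(shape)
    shape.body = None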

Author

Day-Go commented May 27, 2022

Fantastic, thank you so much for the help. I'll add your modification and see if the error happens anymore.

Will keep the issue open for another day but hopefully that fixes it.

Author

Day-Go commented May 27, 2022

Error still occurring unfortunately.

Author

Day-Go commented May 27, 2022

Following your example I've added some code to manually set all the pymunk.Body() and pymunk.Segment() objects in the error-causing dictionary to None.

for organism in population.cohort.values():
    for limb in organism["instance"].body.structure.values():
        limb["obj"].matter = None
        limb["obj"].shape = None

Would you recommend also setting all of the constraint objects to None?

Author

Day-Go commented May 28, 2022

Error hasn't occurred since manually resetting objects in the dictionary as mentioned above. Looks like the issue was caused by freeing a large number of pymunk shapes from memory simultaneously.

Day-Go closed this as completed May 28, 2022
Owner

viblo commented May 28, 2022

Ok, great that you found a workaround! I will still keep this issue open for a while longer so I can research it a bit more.

viblo reopened this May 28, 2022