
Changing resources can cause other WUs to restart #329

Open
Frogging101 opened this issue Feb 24, 2025 · 5 comments

Frogging101 commented Feb 24, 2025

My observation

  • Changing a nonzero CPU count to zero, or vice versa, causes GPU work units (WUs) to restart. This wastes progress (potentially a lot of it, depending on the WU).
  • Disabling a GPU causes running CPU WUs to stop and start again. This isn't much of an issue, because GROMACS can checkpoint whenever it wants to.
  • Changing a nonzero CPU count to another nonzero value doesn't appear to disrupt anything.
  • I don't have a multi-GPU system, but the behaviour in that case should be examined as well. Disruption to other WUs should be minimized.

Why this affects me

Sometimes I want to stop folding on either my CPU or my GPU, to manage heat output or because I need that resource for something else. Since v8 has no slots, the only way to do this without pausing everything is to change the resource count.

In v8, we have also lost the ability to set the "finish" state per resource. I miss this because I can no longer let my GPU finish its current WU and then go idle (i.e. I want it to pause, but would prefer that it finish first). Ideally, I would like to see a replacement for v7's per-WU pause and finish states, but that is tangential to this issue.

Edit: It was pointed out that splitting the CPU/GPU into resource groups covers the above. Thanks.

cheers

muziqaz (Contributor) commented Feb 24, 2025

V8 has Resource Groups, which are similar to Slots on v7

Frogging101 (Author) commented:

> V8 has Resource Groups, which are similar to Slots on v7

Yes, I know :)

muziqaz (Contributor) commented Feb 24, 2025

So half of the statements in your original post are not correct:
"Sometimes I want to stop folding on either my CPU or GPU": yes, you can.
"in v8, we have also lost the ability to set the "finish" state per-resource": this is incorrect.
"And in v8, since there are no slots": that is incorrect.
Here is my Web UI:
[Image: screenshot of the Web UI]
As you can see, each device has its own resource group on each of the computers.
Changing the CPU count mid-fold is not recommended.

Frogging101 (Author) commented Feb 24, 2025

Oh, I see now. Thanks for that tip. I did not realize you could use resource groups that way.

I think it may still be a bug that changing the CPU or GPU count interrupts unrelated slots.

muziqaz (Contributor) commented Feb 24, 2025

Yes, the first part of your comment is valid and is possibly a bug: changing the CPU count restarts GPU folding within the same resource group or, if there are no resource groups, on the same computer.
