a worker died or was killed while executing a task by an unexpected system error #26

Open
wushixian opened this issue Aug 19, 2021 · 6 comments

@wushixian

I am using 4 GPUs to compute MSA embeddings, but the process terminates every time. The error is raised by Ray, with the message "a worker died or was killed while executing a task by an unexpected system error". The GPU processes terminate one by one. I have tried several times and updated Ray to the latest version, but the problem persists. How can I fix this?
Thanks!

@wushixian
Author

I tried again using only the CPU to compute the embeddings, and it still terminated. The ESM documentation says there is a problem with the esm_msa1_t12_100M_UR50S model and recommends using esm_msa1b_t12_100M_UR50S instead, but I can't find where to modify the code to use esm_msa1b_t12_100M_UR50S. Could somebody tell me?
Thanks.

@delfosseaurelien
Collaborator

Hello,

I will add the esm_msa1b_t12_100M_UR50S model in a few minutes.

@delfosseaurelien
Collaborator

> I tried again using only the CPU to compute the embeddings, and it still terminated. The ESM documentation says there is a problem with the esm_msa1_t12_100M_UR50S model and recommends using esm_msa1b_t12_100M_UR50S instead, but I can't find where to modify the code to use esm_msa1b_t12_100M_UR50S. Could somebody tell me?
> Thanks.

@delfosseaurelien
Collaborator

The esm_msa1b_t12_100M_UR50S model has been added.
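
For anyone who wants to try the newer checkpoint outside this repository, here is a minimal sketch using the fair-esm package directly. It is not this project's code, which is not shown in this thread. The helper names `resolve_msa_model` and `load_msa_model` are hypothetical; the only library calls assumed are the `esm.pretrained` loader functions and `alphabet.get_batch_converter()`.

```python
# Checkpoint names as published in the fair-esm package (esm.pretrained).
OLD_MSA_MODEL = "esm_msa1_t12_100M_UR50S"
NEW_MSA_MODEL = "esm_msa1b_t12_100M_UR50S"


def resolve_msa_model(name: str) -> str:
    """Map the older MSA Transformer checkpoint name to the newer one.

    Hypothetical helper: any other name is returned unchanged.
    """
    return NEW_MSA_MODEL if name == OLD_MSA_MODEL else name


def load_msa_model(name: str = NEW_MSA_MODEL):
    """Load an MSA Transformer checkpoint through fair-esm.

    esm is imported lazily so this module can be imported without it.
    """
    import esm  # provided by the fair-esm package

    # Each checkpoint has a loader function of the same name in esm.pretrained.
    loader = getattr(esm.pretrained, resolve_msa_model(name))
    model, alphabet = loader()  # downloads the weights on first use
    model.eval()  # inference mode: disables dropout
    return model, alphabet.get_batch_converter()
```

Calling `load_msa_model("esm_msa1_t12_100M_UR50S")` would then load the `esm_msa1b` weights instead of the older ones.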

@delfosseaurelien
Collaborator

> I am using 4 GPUs to compute MSA embeddings, but the process terminates every time. The error is raised by Ray, with the message "a worker died or was killed while executing a task by an unexpected system error". The GPU processes terminate one by one. I have tried several times and updated Ray to the latest version, but the problem persists. How can I fix this?
> Thanks!

I will check this; it seems there is an issue with Ray.
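
While this is being investigated, one way to narrow it down is to submit the MSAs to Ray in small chunks, one task at a time with retries disabled, so that a crash can be tied to specific inputs. The sketch below is a debugging aid under assumptions, not a diagnosis: the root cause is not established in this thread, and `chunk`, `embed_with_ray`, and the `embed_fn` callable are hypothetical names standing in for whatever function computes one embedding.

```python
from typing import Any, Callable, List, Sequence


def chunk(items: Sequence[str], size: int) -> List[List[str]]:
    """Split items into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def embed_with_ray(
    msa_paths: Sequence[str],
    embed_fn: Callable[[str], Any],  # hypothetical: embeds one MSA file
    chunk_size: int = 1,
    num_gpus: float = 1,
) -> List[Any]:
    """Run embed_fn over msa_paths one Ray task at a time, logging crashes."""
    import ray  # imported lazily; only needed when this function is called

    ray.init(ignore_reinit_error=True)

    # max_retries=0: a crashed task fails immediately instead of being rerun.
    @ray.remote(max_retries=0)
    def task(paths: List[str]) -> List[Any]:
        return [embed_fn(p) for p in paths]

    results: List[Any] = []
    for paths in chunk(msa_paths, chunk_size):
        try:
            # Waiting on each task before submitting the next isolates failures.
            ref = task.options(num_gpus=num_gpus).remote(paths)
            results.extend(ray.get(ref))
        except ray.exceptions.WorkerCrashedError:
            print(f"worker crashed while processing: {paths}")
    return results
```

If the crashes always involve the largest MSAs, memory pressure on the worker would be a plausible suspect; if they occur on arbitrary inputs, the problem more likely lies elsewhere.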

@wushixian
Author

Thank you very much!

@wushixian reopened this Aug 29, 2021