post install task is broken with Slurm + hybrid engine #51

Open
KADichev opened this issue Dec 18, 2024 · 1 comment
KADichev (Collaborator) commented Dec 18, 2024

After running

srun -p TaiShanV110 -c 64 -t 08:00:00 --pty /bin/bash
<load Spack environment>
make
make install

the post-install tasks in make install are broken for the hybrid engine, as follows:
[image]

KADichev changed the title from "post install task is broken for y" to "post install task is broken with Slurm + hybrid engine" on Dec 18, 2024
anyzelman (Member) commented

On the hybrid post-install checks specifically: the Slurm srun command should request at least 3 tasks. (My last couple of Slurm allocations when I was testing LPF even used 10 tasks; I think one or two functional tests require that.)
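
For illustration, here is a minimal sketch of what requesting at least 3 tasks could look like. The helper name build_srun_cmd is hypothetical and only prints the command rather than running it. The -n option is srun's task count; the partition, -c, -t and --pty values are copied from the original command. Note that -c is per task, so -c 64 with 3 tasks asks for 192 cores and may need lowering on a given cluster.

```shell
# Hypothetical helper: prints an srun command that requests enough tasks
# for the hybrid post-install checks. It does not run srun itself.
build_srun_cmd() {
  ntasks="${1:-3}"   # default to the suggested minimum of 3 tasks
  if [ "$ntasks" -lt 3 ]; then
    echo "the hybrid post-install checks need at least 3 tasks" >&2
    return 1
  fi
  # -n sets the number of tasks; the other options are taken from the report.
  echo "srun -p TaiShanV110 -n $ntasks -c 64 -t 08:00:00 --pty /bin/bash"
}

# Print the command for 10 tasks, the count mentioned for some functional tests.
build_srun_cmd 10
```

Calling build_srun_cmd with no argument prints the 3-task variant; whether an interactive --pty shell is the right way to hold a multi-task allocation on this cluster is left open here.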

Generally: did you export HYDRA_BOOTSTRAP=ssh if you are using MPICH, or apply the other workaround if you are using OpenMPI? (The Slurm deadlock issue on our cluster is not resolved, right?) I'm hesitant to acknowledge these as LPF bugs because our cluster's Slurm has been misbehaving. It's hard or impossible to isolate where the error comes from at present, while no errors occur on the other systems we tested on.
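
A rough sketch of the MPICH side of that check, to run before make install. The helper name mpi_flavor is hypothetical, and it assumes that mpirun --version mentions "HYDRA" for MPICH and "Open MPI" for OpenMPI. The OpenMPI workaround is not spelled out in this thread, so the sketch only prints a reminder for that case.

```shell
# Hypothetical helper: classify an MPI installation from its version text.
mpi_flavor() {
  case "$1" in
    *"Open MPI"*)    echo openmpi ;;
    *HYDRA*|*MPICH*) echo mpich ;;
    *)               echo unknown ;;
  esac
}

# Ask the mpirun on PATH for its version; tolerate mpirun being absent.
version_text="$(mpirun --version 2>/dev/null || true)"

case "$(mpi_flavor "$version_text")" in
  mpich)
    # Make Hydra launch processes over ssh instead of through Slurm.
    export HYDRA_BOOTSTRAP=ssh
    echo "MPICH detected: HYDRA_BOOTSTRAP=$HYDRA_BOOTSTRAP"
    ;;
  openmpi)
    echo "Open MPI detected: apply the OpenMPI workaround manually"
    ;;
  *)
    echo "could not determine the MPI flavor; nothing exported"
    ;;
esac
```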
