Hello, I am trying to run POT3D as `mpiexec --bind-to core -np 72 --allow-run-as-root ./pot3d_cpu`,
but the CPU version of the code fails with the following error:
WARNING: Open MPI tried to bind a process but failed. This is a
warning only; your job will continue, though performance may
be degraded.
Local host: d2d9328e364d
Application name: ./pot3d_cpu
Error message: failed to bind memory
Location: ../../../../../orte/mca/rtc/hwloc/rtc_hwloc.c:447
--------------------------------------------------------------------------
[1740172275.508283] [d2d9328e364d:80 :0] mm_iface.c:821 UCX ERROR mm_iface failed to allocate receive FIFO
[d2d9328e364d:00080] ../../../../../ompi/mca/pml/ucx/pml_ucx.c:309 Error: Failed to create UCP worker
[d2d9328e364d:00080] [[27290,1],15] selected pml ob1, but peer [[27290,1],0] on d2d9328e364d selected pml ucx
--------------------------------------------------------------------------
MPI_INIT has failed because at least one MPI process is unreachable
from another. This *usually* means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used. Your MPI job will now abort.
You may wish to try to narrow down the problem;
* Check the output of ompi_info to see which BTL/MTL plugins are
available.
* Run your application with MPI_THREAD_SINGLE.
* Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
if using MTL-based communications) to see exactly which
communication plugins were considered and/or discarded.
--------------------------------------------------------------------------
[d2d9328e364d:00080] *** An error occurred in MPI_Init_thread
[d2d9328e364d:00080] *** reported by process [1788477441,15]
[d2d9328e364d:00080] *** on a NULL communicator
[d2d9328e364d:00080] *** Unknown error
[d2d9328e364d:00080] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[d2d9328e364d:00080] *** and potentially your MPI job)
[d2d9328e364d:00041] 71 more processes have sent help message help-orte-odls-default.txt / memory not bound
[d2d9328e364d:00041] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
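The "failed to bind memory" warning and the UCX "failed to allocate receive FIFO" error often point at a container with too little shared memory (the hostname `d2d9328e364d` looks like a Docker container ID, and `--allow-run-as-root` also suggests a container). A few diagnostic steps worth trying, sketched under that assumption; the `--shm-size` and `--mca pml ob1` suggestions are generic Open MPI/Docker remedies, not a fix documented by the POT3D project:

```shell
# 1. Confirm the machine really has 72 cores to match -np 72:
nproc

# 2. Check the shared-memory mount UCX uses for its receive FIFOs;
#    Docker defaults /dev/shm to 64 MB, which is easily exhausted by 72 ranks:
df -h /dev/shm

# 3. If /dev/shm is small, relaunch the container with a larger segment, e.g.:
#    docker run --shm-size=8g ...

# 4. As a workaround, force every rank onto the ob1 PML so the ranks agree
#    on one transport instead of mixing ob1 and ucx:
#    mpiexec --mca pml ob1 --bind-to core -np 72 --allow-run-as-root ./pot3d_cpu
```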
Thanks Ron. Yes, for this particular system I have 72 cores. I think it was the grid size causing the crash; after reducing the grid size it is working fine now.
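Since the grid size turned out to be the culprit, a back-of-the-envelope per-rank memory check before launching can catch an oversized run early. The grid dimensions and array count below are illustrative assumptions, not values taken from the actual pot3d.dat:

```shell
# Rough per-rank memory estimate for a domain-decomposed 3D grid solver.
# nr, nt, np = grid dimensions (assumed); arrays = number of full-grid
# double-precision arrays held in memory (assumed); ranks = MPI processes.
nr=600; nt=600; np=1200; arrays=10; ranks=72
echo "$nr $nt $np $arrays $ranks" |
  awk '{printf "%.2f GiB per rank\n", $1*$2*$3*$4*8/$5/2^30}'
```

Compare the result against free memory per core on the node; if it comes out near or above that limit, shrink the grid (or use more nodes) before launching.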