Expected behavior
I expected to be able to reproduce the checkpointing example in the documentation while running all Apptainer commands as a non-privileged user.
Actual behavior
After executing the apptainer checkpoint instance server command, the web server running in the instance crashes. Logs from the ~/.apptainer/instances/logs/{host_name}/{username}/server.err file:
127.0.0.1 - - [02/Sep/2024 10:28:27] "POST / HTTP/1.1" 200 -
127.0.0.1 - - [02/Sep/2024 10:28:32] "GET / HTTP/1.1" 200 -
[2024-09-02T10:28:39.795, 41000, 41003, ERROR] at fileconnlist.cpp:428 in prepareShmList; REASON='JASSERT(fd != -1) failed'
(strerror((*__errno_location ()))) = Read-only file system
area.name = /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache
python3.10: Terminating...
Backtrace:
1 jassert_internal::JAssert::~JAssert() in /.singularity.d/libs/libdmtcp.so 0x7f2515e572f1
2 dmtcp::FileConnList::prepareShmList() in /.singularity.d/libs/libdmtcp_ipc.so 0x7f25162c52de
3 dmtcp_FileConnList_EventHook(eDmtcpEvent, _DmtcpEventData_t*) in /.singularity.d/libs/libdmtcp_ipc.so 0x7f25162c68f7
4 dmtcp::PluginManager::eventHook(eDmtcpEvent, _DmtcpEventData_t*) in /.singularity.d/libs/libdmtcp.so 0x7f2515e26e57
5 dmtcp::DmtcpWorker::preCheckpoint() in /.singularity.d/libs/libdmtcp.so 0x7f2515e1dff4
6 in /.singularity.d/libs/libdmtcp.so 0x7f2515e2eab4
7 in /.singularity.d/libs/libdmtcp.so 0x7f2515e30c66
8 in /lib/x86_64-linux-gnu/libpthread.so.0 0x7f2515852fa3
9 clone in /lib/x86_64-linux-gnu/libc.so.6 0x7f25155f506f
Subsequent calls to apptainer checkpoint instance server produce the following output:
INFO: Using checkpoint "example-checkpoint"
Error, computation not in running state. Either a checkpoint is
currently happening or there are no connected processes.
When the example is run as the "root" user, this error doesn't occur and I'm able to reproduce the example, but the restart step doesn't work reliably (similar to the issue described here).
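The error log suggests that the checkpoint step fails when opening /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache because the file sits on a read-only filesystem. As a quick diagnostic, the following minimal sketch checks whether a given path is on a read-only mount. The helper name is_read_only_mount is hypothetical, and the idea that the read-only mount is the root cause is an assumption based only on the strerror line above. It would need to be run inside the instance to be meaningful.

```python
import os


def is_read_only_mount(path: str) -> bool:
    """Return True if the filesystem containing `path` is mounted read-only."""
    # f_flag holds the mount flags; ST_RDONLY is set for read-only mounts.
    flags = os.statvfs(path).f_flag
    return bool(flags & os.ST_RDONLY)


if __name__ == "__main__":
    # Path taken from the area.name line of the error log.
    target = "/usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache"
    if os.path.exists(target):
        print(target, "on read-only mount:", is_read_only_mount(target))
    else:
        print(target, "does not exist on this system")
```

If this reports a read-only mount for the non-privileged user but not for root, that would be consistent with the difference in behavior described above, although it would not prove it.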
Steps to reproduce this behavior
Follow the instructions in the documentation, running the commands as a non-root user.
DMTCP was installed from source, from the 3.0.0 tag in the GitHub repo.
Version of Apptainer
What OS/distro are you running
How did you install Apptainer