Segfaults in 32-bit linux while running master python test suite, maybe allocation related #6447

Closed
AllSeeingEyeTolledEweSew opened this issue Sep 2, 2021 · 18 comments

Comments

@AllSeeingEyeTolledEweSew
Contributor

AllSeeingEyeTolledEweSew commented Sep 2, 2021

I'm trying to port #6188 to master. I'm getting segfaults running the python test suite, only on 32-bit linux.

#0  listen_socket_t (this=0xf69f25a0 <std::string::_Rep::_S_empty_rep_storage>) at ../../include/libtorrent/aux_/session_impl.hpp:155
#1  construct<libtorrent::aux::listen_socket_t> (__p=0xf69f25a0 <std::string::_Rep::_S_empty_rep_storage>, this=<optimized out>)
    at ../../include/libtorrent/aux_/session_impl.hpp:155
#2  construct<libtorrent::aux::listen_socket_t> (__p=0xf69f25a0 <std::string::_Rep::_S_empty_rep_storage>, __a=...)
    at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/alloc_traits.h:512
#3  _Sp_counted_ptr_inplace<> (__a=..., this=0xf5d022c8) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/shared_ptr_base.h:551
#4  __shared_count<libtorrent::aux::listen_socket_t, std::allocator<libtorrent::aux::listen_socket_t> > (__a=..., 
    __p=@0x8468be0: 0xf7065558 <vtable for libtorrent::aux::session_impl+8>, this=0x8468be4) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/shared_ptr_base.h:682
#5  __shared_ptr<std::allocator<libtorrent::aux::listen_socket_t> > (__tag=..., this=0x8468be0) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/shared_ptr_base.h:1371
#6  shared_ptr<std::allocator<libtorrent::aux::listen_socket_t> > (__tag=..., this=0x8468be0) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/shared_ptr.h:408
#7  allocate_shared<libtorrent::aux::listen_socket_t, std::allocator<libtorrent::aux::listen_socket_t> > (__a=...)
    at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/shared_ptr.h:860
#8  make_shared<libtorrent::aux::listen_socket_t> () at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/shared_ptr.h:876
#9  libtorrent::aux::session_impl::setup_listener(libtorrent::aux::listen_endpoint_t const&, boost::system::error_code&) () at ../../src/session_impl.cpp:1540
#10 0xf6c73022 in libtorrent::aux::session_impl::reopen_listen_sockets(bool) () at ../../src/session_impl.cpp:2083
#11 0xf6c7596c in libtorrent::aux::session_impl::init() () at ../../src/session_impl.cpp:710
#12 0xf6c865ff in libtorrent::aux::session_impl::wrap<void (libtorrent::aux::session_impl::*)()> (this=0x8468be0, 
    f=(void (libtorrent::aux::session_impl::*)(libtorrent::aux::session_impl * const)) 0xf6c75660 <libtorrent::aux::session_impl::init()>) at ../../src/session_impl.cpp:534
#13 0xf6c5a885 in operator() (__closure=<synthetic pointer>) at ../../src/session_impl.cpp:667
#14 invoke<libtorrent::aux::session_impl::start_session()::<lambda()>, libtorrent::aux::session_impl::start_session()::<lambda()> > (context=<synthetic pointer>, 
    function=<synthetic pointer>) at /boost_1_76_0/boost/asio/detail/handler_invoke_helpers.hpp:51
#15 boost::asio::detail::executor_op<libtorrent::aux::session_impl::start_session()::{lambda()#1}, std::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, std::allocator<void>*, boost::system::error_code const&, unsigned int) () at /boost_1_76_0/boost/asio/detail/executor_op.hpp:70
#16 0xf6bde53d in complete (bytes_transferred=<optimized out>, ec=..., owner=0x842dc38, this=0x8453c28) at /boost_1_76_0/boost/asio/detail/scheduler_operation.hpp:40
#17 do_run_one (ec=..., this_thread=..., lock=..., this=0x842dc38) at /boost_1_76_0/boost/asio/detail/impl/scheduler.ipp:486
#18 boost::asio::detail::scheduler::run (this=0x842dc38, ec=...) at /boost_1_76_0/boost/asio/detail/impl/scheduler.ipp:204
#19 0xf6c18d6a in run (this=<optimized out>) at /boost_1_76_0/boost/asio/impl/io_context.ipp:63
#20 operator() (__closure=0x84560b4) at ../../src/session.cpp:297
#21 __invoke_impl<void, libtorrent::session::start(libtorrent::session_handle::session_flags_t, libtorrent::session_params&&, boost::asio::io_context*)::<lambda()> > (
    __f=<unknown type in /venv/lib/python3.6/site-packages/python_libtorrent-2.0.4-py3.6-linux-i686.egg/libtorrent/__init__.cpython-36m-i386-linux-gnu.so, CU 0x19a2e83, DIE 0x1a3d23c>) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/invoke.h:60
#22 __invoke<libtorrent::session::start(libtorrent::session_handle::session_flags_t, libtorrent::session_params&&, boost::asio::io_context*)::<lambda()> > (
    __fn=<unknown type in /venv/lib/python3.6/site-packages/python_libtorrent-2.0.4-py3.6-linux-i686.egg/libtorrent/__init__.cpython-36m-i386-linux-gnu.so, CU 0x19a2e83, DIE 0x1a3d218>) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/invoke.h:95
#23 _M_invoke<0> (this=0x84560b4) at /opt/rh/devtoolset-10/root/usr/include/c++/10/thread:264
#24 operator() (this=0x84560b4) at /opt/rh/devtoolset-10/root/usr/include/c++/10/thread:271
#25 std::thread::_State_impl<std::thread::_Invoker<std::tuple<libtorrent::session::start(libtorrent::flags::bitfield_flag<unsigned char, libtorrent::session_flags_tag, void>, libtorrent::session_params&&, boost::asio::io_context*)::{lambda()#1}> > >::_M_run() () at /opt/rh/devtoolset-10/root/usr/include/c++/10/thread:215
#26 0xf6ed5cbd in execute_native_thread_routine ()
   from /venv/lib/python3.6/site-packages/python_libtorrent-2.0.4-py3.6-linux-i686.egg/libtorrent/__init__.cpython-36m-i386-linux-gnu.so
#27 0xf7fb8bbc in start_thread () from /lib/libpthread.so.0

Repro steps:

  • docker run -v /path/to/libtorrent:/lt -it quay.io/pypa/manylinux2014_i686
  • in docker:
$ curl -O https://boostorg.jfrog.io/artifactory/main/release/1.76.0/source/boost_1_76_0.tar.gz
$ tar xvzpf boost_1_76_0.tar.gz
$ cd /boost_1_76_0
$ ./bootstrap.sh
$ ./b2 headers
$ export BOOST_ROOT=/boost_1_76_0
$ export BOOST_BUILD_PATH=/boost_1_76_0/tools/build
$ export PATH="$BOOST_ROOT:$PATH"
$ yum install -y glibc-static
$ /opt/python/cp36-cp36m/bin/python -m venv /venv
$ source /venv/bin/activate
$ cd /lt
$ git checkout master
$ python setup.py build_ext --b2-args=debug-symbols=on install  # installs the module without stripping
$ cd bindings/python
$ python -X dev -m unittest tests/*.py

Notes:

  • I've run many builds on macOS, Windows, and 64-bit Linux, and haven't seen any segfaults there
  • I've seen a few different stack traces. The above is the most common one I've seen. All the ones I've seen happen in constructors, so I assume it's an allocation problem and they're all related
  • I've also seen this stack trace in RC_2_0. To reproduce there, you need to:
  • I haven't tested RC_1_2 against the master suite yet.
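
Since several different traces show up, it may help to find out which test files actually die from a signal. The following is a minimal sketch, not part of the libtorrent test tooling: `describe_exit` and `run_isolated` are hypothetical helper names, and it assumes it is run from bindings/python so that tests/*.py resolves as in the repro steps above. It runs each test file in its own interpreter and labels the outcome.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed
        # by the signal with that number.
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return "killed by signal %d" % -returncode
    return "failed (exit code %d)" % returncode


def run_isolated(test_files, python=sys.executable):
    """Run each test file in a fresh interpreter; map path -> outcome label."""
    results = {}
    for path in test_files:
        proc = subprocess.run(
            [python, "-X", "dev", "-m", "unittest", str(path)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        results[str(path)] = describe_exit(proc.returncode)
    return results
```

For example, `run_isolated(sorted(pathlib.Path("tests").glob("*.py")))` would return a dict in which a crashing file is reported as "killed by SIGSEGV" instead of taking the whole run down with it.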
@AllSeeingEyeTolledEweSew
Contributor Author

AllSeeingEyeTolledEweSew commented Sep 2, 2021

@arvidn since the master tests caught a bug in RC_2_0, I take this as some evidence that it would be nice to backport all the python enhancements from master.

If you can show me what to do there, I can put in the work.

@arvidn
Owner

arvidn commented Sep 4, 2021

trying to reproduce this, the build step fails with:

error: [Errno 2] No such file or directory: 'b2': 'b2'

I tried yum install boost-build, yum install boost, yum install boost-devel, nothing helped. Which package is boost-build in?

@AllSeeingEyeTolledEweSew
Contributor Author

I messed up my repro steps. I downloaded boost from source for my test. I used the same setup as CI (download source, bootstrap.sh, b2 headers).

@AllSeeingEyeTolledEweSew
Contributor Author

I think I used 1.76.0

@AllSeeingEyeTolledEweSew
Contributor Author

Updated my repro steps

@arvidn
Owner

arvidn commented Sep 5, 2021

jfrog doesn't seem to like links like that. They go to some length to require a full browser in order to download.

Anyway, I get this error:

ImportError: /lt/bindings/python/libtorrent.so: wrong ELF class: ELFCLASS64

Even after rebuilding with --b2-args=address-model=32, I still get the same error.
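
The "wrong ELF class" message means a 32-bit interpreter was asked to load a 64-bit shared object. A quick way to see which kind a given .so is, without extra tools, is to read its ELF header. This is a sketch under the assumption that the file is a regular ELF binary; `elf_class_from_ident` and `elf_class` are hypothetical helper names.

```python
def elf_class_from_ident(ident):
    """Return 32 or 64 given the first bytes of a file, or None if not ELF."""
    # An ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'. The next
    # byte (EI_CLASS) is 1 for ELFCLASS32 and 2 for ELFCLASS64.
    if len(ident) < 5 or ident[:4] != b"\x7fELF":
        return None
    return {1: 32, 2: 64}.get(ident[4])


def elf_class(path):
    """Return 32 or 64 for the ELF file at `path`, or None if it is not ELF."""
    with open(path, "rb") as f:
        return elf_class_from_ident(f.read(5))
```

Calling `elf_class("/lt/bindings/python/libtorrent.so")` inside the i686 container would be expected to return 64 for the file named in the error above.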

@AllSeeingEyeTolledEweSew
Contributor Author

Oh, do you have a libtorrent.so shadowing the extension? The install step should copy the artifact somewhere under /venv. Do you get the same result after git clean -fxd or similar?
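
One way to check for shadowing is to ask Python which file it would load for the module, from the same directory the tests are run in. A minimal sketch using the standard importlib machinery (`module_origin` is a hypothetical helper name):

```python
import importlib.util


def module_origin(name):
    """Return the file Python would load for `name`, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin
```

If `module_origin("libtorrent")` run from /lt/bindings/python points into the source tree rather than somewhere under /venv, a leftover build artifact is probably being picked up ahead of the installed extension.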

@arvidn
Owner

arvidn commented Sep 5, 2021

I can reproduce the segfault now. But I can't find a way to analyze it. If I run gdb (in docker) I get permission denied to create the process. And I can't configure /proc/sys/kernel/core_pattern either, for some reason (it just says "read only filesystem"). So, how can I actually get at the core file?

@AllSeeingEyeTolledEweSew
Contributor Author

docker run --privileged will let you run gdb within docker.

I got a core file after some test runs, but it may have been due to running with docker run --privileged.
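
Whether a core file gets written depends on at least two things: the process's core size limit and the kernel's core_pattern. As far as I know, core_pattern is a kernel-wide setting shared with the host, which would explain why an unprivileged container sees it as read-only. The sketch below only inspects these settings and raises the soft limit for the current process; `core_dump_settings` and `enable_core_dumps` are hypothetical helper names, and it assumes a Linux system.

```python
import resource


def core_dump_settings(pattern_path="/proc/sys/kernel/core_pattern"):
    """Report the core size limits and, if readable, the kernel core pattern."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    try:
        with open(pattern_path) as f:
            pattern = f.read().strip()
    except OSError:
        # Not readable, or not a Linux /proc layout.
        pattern = None
    return {"soft_limit": soft, "hard_limit": hard, "core_pattern": pattern}


def enable_core_dumps():
    """Raise the soft core size limit to the hard limit; return the new limit."""
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # Raising the soft limit up to the hard limit needs no extra privileges.
    # The new limit is inherited by child processes started afterwards.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return hard
```

A soft limit of 0 means no core file is written regardless of the pattern. If the pattern starts with "|", cores are piped to a helper program on the host rather than written to the working directory.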

@stale

stale bot commented Dec 5, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@AllSeeingEyeTolledEweSew
Contributor Author

Now that #6188 is merged to master, it's easier to make a simple PR to see this segfault.

Note that in #6588 I enable 32-bit builds on manylinux, musllinux and windows. This segfault seems to only happen on manylinux 32-bit.

@stale stale bot removed the stale label Dec 7, 2021
@stale

stale bot commented Mar 12, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@AllSeeingEyeTolledEweSew
Contributor Author

@arvidn could you reopen this? This issue still occurred the last time I tried the cibuildwheel workflow on 32-bit Linux.

@arvidn arvidn reopened this May 12, 2022
@stale stale bot removed the stale label May 12, 2022
@arvidn arvidn added the bug label May 12, 2022
@stale

stale bot commented Aug 12, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@AllSeeingEyeTolledEweSew
Contributor Author

Bump, I confirmed this still exists at least in master. See #7043

@stale stale bot removed the stale label Sep 4, 2022
@arvidn
Owner

arvidn commented Sep 4, 2022

I take it the 64 bit build does not have this problem, right?

@AllSeeingEyeTolledEweSew
Contributor Author

Correct, or at least I've never seen this failure on 64-bit builds.

@stale

stale bot commented Jan 7, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 7, 2023
@stale stale bot closed this as completed Feb 18, 2023