-
Notifications
You must be signed in to change notification settings - Fork 453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Building Lean 4.15-rc1 with default build settings runs into a Stack Overflow in Data/UInt/Lemmas.lean #6434
Comments
After using the suggested build commands, I can fix the stack overflow error: https://build.opensuse.org/package/show/home:nuklly/lean4. I think |
Hi, first a general note, as we've explained in other issues such as #3068, packaging Lean itself is probably not a good idea right now. People require very specific Lean versions for their packages and unless those are present they are going to run into very confusing error messages as we currently have basically no guarantees for backwards compatibility. Now regarding these errors, I (and to my knowledge all other people on the development team) do not work on SUSE or Gentoo and we can compile the project locally just fine on our machines. Those include at least Fedora, NixOS, MacOS and Windows. Furthermore our CI can also build Lean just fine on Ubuntu and some other platforms in various configurations. So there is likely something different about these distros than the one we currently work on mostly. As a reference my local setup is currently |
I'm running gcc (Gentoo 14.2.1_p20241221 p7) 14.2.1 20241221 and my ulimit -s is at 8MiB too. The stack overflow I encountered wasn't related to ulimit, but rather lean's own thread stack size (the value overridable by the I'll do a compile exactly as given on the generic-build-instructions later (probably tomorrow) to see if it yields different results. |
Also
on Gentoo (gcc-13, glibc-2.39-r6), while building the latest commit with another ebuild. (Using system gmp and cadical). Didn't try much. I can provide |
I've meanwhile tried to follow the linked build instructions for 4.15 (which are basically just "run cmake with release config and make"), and building it with my normal user. This build succeeded without issues. I just realized that Can it be that the overflow is tiny, and that changing the optimization flags makes enough of a difference that -O3 runs through, but -O2 does not? Edit: Just tried -O2 with 4.15 now (script record), and it worked too... I'm now confused as to what could be the difference when using portage to build? Edit 2: Building through portage is still failing: |
Changing Complete Edit: Same failure if I commented out |
I'm also no longer sure if At this point I am actually not even sure it is a stack overflow. Since I did manage to repro it, I can now also share the backtrace, in case it helps. The top of the call stack looks like this:
Here is the full call stack, in case it's relevant: |
Hmmm... Sorry, I'm really out of the water here, as I do not usually have to deal with stack size limitations.... A closer look at the $sp value shows that it's likely running into ulimit stack size problems, and that I just misread the core dump last time. The frame below the segfault handler has about 8 MiB offset compared to the main() function's stack frame, soooooo, I'm back at trying to convince portage to increase the default stack size. |
In any case, this ebuild allowed me to compile Lean 4.15 via Portage right now. I only tried it once, but will try to use the same ebuild tomorrow on my laptop, where I have been seeing the same issue. Compared to the ebuild in the original issue report, this now also sets
Can it be that it depends on timing on which thread the crashing function runs? Like, that it can either be the main thread (which has a stack size set by Edit: Tried this ebuild on my laptop now too, and it worked first try. Soo, I think a combination of setting a higher |
Adding these to my ebuild worked for me. |
@soulsource Do you also have a backtrack of a side-thread stack overflow? The one you posted will always be on main so so far I'm thinking this is strictly a |
@soulsource Do you also have a backtrack of a side-thread stack overflow? The one you posted will always be on main so so far I'm thinking this is strictly a `ulimit -s` issue which is outside our control
Sadly not. That also was just a wild guess as to why I managed to
successfully build without setting a higher default stack size limit,
and only passing the -s parameter to lean.
However, if this code is always running in the main thread, I would much
rather assume that I misinterpreted the situation, and that maybe the shell I
had been using has had a higher ulimit -s set too. I have been
experimenting with getting lean to compile before, and it is not too
unlikely that I forgot to relaunch the shell...
I will try a few compiles later today, where only the ulimit -s is set, but
without passing the -s parameter to lean.
|
Hmmm, I now commented out the However, once I added back the So, what I can say is that it seems to always crash in the main thread, and that it still seems to require both, setting a higher ulimit -s and requesting a higher thread-stacksize from lean via the -s parameter.... |
Do you have a stack trace for that?
…On Fri 10 Jan 2025, 18:36 Andreas Grois, ***@***.***> wrote:
Hmmm, I now commented out the -DLEAN_EXTRA_MAKE_OPTS="-s 262144" line in
the ebuild and the build crashed with stack overflow, however it crashed in
Thread 1 (which I assume to be the main thread) at a stack size of 8 MiB,
what I find weird, given that the ebuild does still set ulimit -s 30000000
.
I've retried it 3 times, and every time I ran into a stack overflow in
Thread 1.
However, once I added back the -DLEAN_EXTRA_MAKE_OPTS="-s 262144"
parameter, building worked reliably (as in: 2/2 tries) again...
So, what I can say is that it seems to always crash in the main thread,
and that it still seems to require both, setting a higher ulimit -s and
requesting a higher thread-stacksize from lean via the -s parameter....
—
Reply to this email directly, view it on GitHub
<#6434 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAA2URUL2J7X3DDHGELMFTD2KAAKLAVCNFSM6AAAAABUBBIKB6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKOBTGMZDQNZXGQ>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Do you have a stack trace for that?
The call stack is the same as I have posted above.
#6434 (comment)
I can repro and send the dump along with the corresponding binary, if that helps?
|
Back at my PC - here's the full call stack for the crash with a higher Just to confirm what the build environment is, here the build log (script), at the beginning of which you can also find a cat of the ebuild: And here is the core file, and (hopefully) the corresponding binaries: |
Okay, so there are two different call stacks that can overflow the main thread and a side thread, respectively, because your compiler creates stack frames that are larger than any in our CI for some reason. I'm not sure there is anything for us to accommodate on our side here except to document that various systems may require various build adjustments. |
Yeah, I'm afraid that is the best way to handle this. I had quickly thought about changing |
Prerequisites
Please put an X between the brackets as you perform the following steps:
https://github.com/leanprover/lean4/issues
Avoid dependencies to Mathlib or Batteries.
https://live.lean-lang.org/#project=lean-nightly
(You can also use the settings there to switch to “Lean nightly”)
Description
Building Lean 4.15-rc1 from source with the default build settings fails with a stack overflow when compiling
Data/UInt/Lemmas.lean
for stage1.Context
I am not sure if this is specific to my system, or if others might be running into the same issue. Also, not sure why it doesn't happen in CI. Compiling Lean 4.14 with identical settings was working fine on the same machine.
Looking at the call-stack in the crashdump, there are over 3000 recursions of
l_Lean_IR_NormalizeIds_normFnBody
. However, it is not an infinite recursion, as increasing the thread stack size viaLEAN_EXTRA_MAKE_OPTS
allows to work around this build issue.I haven't tested what stack size exactly is needed, but passing
-DLEAN_EXTRA_MAKE_OPTS="-s 262144"
to cmake allowed the build to finish.Depending on the question if anyone else sees this, it might be a good idea to set a higher default stack size while building Lean.
Steps to Reproduce
Expected behavior: The build succeeds
Actual behavior: The build fails with a stack overflow when compiling Data/UInt/Lemmas.lean
Versions
Gentoo Linux, kernel 6.6.62, gcc (Gentoo 14.2.1_p20241116 p3) 14.2.1 20241116
Additional Information
I've used an ebuild to build lean4. Not sure if this is relevant, but for the sake of completeness:
Edit: This is the ebuild that already passes
-DLEAN_EXTRA_MAKE_OPTS="-s 262144"
. To repro the issue with it, please remove the respective line.Impact
Add 👍 to issues you consider important. If others are impacted by this issue, please ask them to add 👍 to it.
The text was updated successfully, but these errors were encountered: