Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use flambda compiler optimization #928

Merged
merged 24 commits into from
Jan 16, 2023

Conversation

mrstanb
Copy link
Member

@mrstanb mrstanb commented Nov 24, 2022

As the title says, we've been working on adding flambda optimization
For this, some changes have been made in files, such as:

  • make.sh
  • goblint.opam.locked
  • README.md

Please note that flambda is only supported in ocamlopt, hence all the flags, passed for compiler optimization, are only passed as ocamlopt_flags

@mrstanb
Copy link
Member Author

mrstanb commented Nov 24, 2022

Quick note about tests and checking if semantics were compromised with flambda:

  • I ran make test once with the original setup (i.e., after running make setup) -> everything went through smoohtly
  • I ran make test once with the flambda setup (i.e., after running make setup-flambda) -> everything went through smoohtly here as well

Hence, I presume that if make test works in both cases, then we shouldn't have messed up anything semantics-wise.

@sim642 sim642 self-requested a review November 24, 2022 08:04
@sim642 sim642 added student-job performance Analysis time, memory usage setup Dependencies, CI, releasing labels Nov 24, 2022
@sim642
Copy link
Member

sim642 commented Nov 24, 2022

Thanks for the PR! There are currently some merge conflicts so the diff also shows some unrelated things that are actually already on master.

Is this just to add the flambda setup or is there already also evidence that flambda indeed provides some speedup?

@michael-schwarz
Copy link
Member

is there already also evidence that flambda indeed provides some speedup?

Iirc there was a gain of about 1/4 for the SQLite amalgamation when testing locally, but I am sure @mrstanb et al will have the exact numbers.

@mrstanb
Copy link
Member Author

mrstanb commented Nov 24, 2022

Thanks for the PR! There are currently some merge conflicts so the diff also shows some unrelated things that are actually already on master.

Is this just to add the flambda setup or is there already also evidence that flambda indeed provides some speedup?

There are indeed some performance improvements. Me and my teammates will make sure to post a follow-up with some stats.

@mrstanb
Copy link
Member Author

mrstanb commented Nov 24, 2022

Iirc there was a gain of about 1/4 for the SQLite amalgamation when testing locally, but I am sure @mrstanb et al will have the exact numbers.

Indeed, we'll make sure to post the different results that we had as a follow-up.

@mrstanb
Copy link
Member Author

mrstanb commented Nov 24, 2022

From my side the stats are as follows:

  • Running Goblint with the base configuration, no optimizations and with the SQLite amalgamation, I got a walltime of 8073.680s
  • Running Goblint with the base configuration, with flambda flag -O3 and with the SQLite amalgamation, I got a walltime of 6877.092s

@adelavais adelavais force-pushed the use-flambda-compiler branch from 685e112 to bb6fcf1 Compare November 24, 2022 09:53
Copy link
Member

@jerhard jerhard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! It's interesting to see that activating flambda already has quite some impact on performance.

Now, with this additional compiler option, it would be great to extend our Github workflows with one that nightly runs all the regressoin-tests with the flambda switch. To add such a workflow, you may add a file in .github/workflows that contains the regression job, which is already present in both the locked.yml (this uses a fixed version of compiler and dependencies) and unlocked.yml files.

goblint.opam Outdated Show resolved Hide resolved
@sim642
Copy link
Member

sim642 commented Nov 25, 2022

To add such a workflow, you may add a file in .github/workflows that contains the regression job, which is already present in both the locked.yml (this uses a fixed version of compiler and dependencies) and unlocked.yml files.

There's no need to make a new workflow altogether. It suffices to just add one extra compiler version to the unlocked matrix.
A locked version isn't possible with flambda anyway, because the lock file is locked into a non-flambda compiler.

@mrstanb
Copy link
Member Author

mrstanb commented Dec 4, 2022

I updated the .github/workflows/unlocked.yml file to include the OCaml compiler version 4.14.x+options. However, I'm not sure if this would be sufficient as a change, since I saw that in the Downgrade dependencies step of lower-bounds-downgrade job ocaml-base-compiler is used which is not the same as the compiler version ocaml-variants which we use with flambda.
Hence, my question: should there be some further changes to the unlocked.yml file? Also, should we specify somewhere the ocaml-option-flambda setting as we do in the golbint.opam.locked file? Otherwise, flambda wouldn't be enabled in the compiler and thus we might remain in the same position as originally.

@sim642
Copy link
Member

sim642 commented Dec 4, 2022

I think just +options does nothing on its own. You need to actually specify the options like the setup-ocaml documentation shows: https://github.com/ocaml/setup-ocaml#inputs.

The lower bounds jobs are completely separate and need not be modified.

@nathanschmidt
Copy link
Collaborator

TLDR: After testing out the effects of activating different Flambda flags, we opted for using -O3 without any further options. This leads to a speed-up of approx. 15%. The specification of further inlining options lead to mitigated results, being responsible for alternately improved or worse run-times depending on the system, hence why my teammates and I decided against them.

We started out by running Goblint with different flags over the SQLite Amalgamation. On our private machines, the flag combination -O3 -inline-toplevel=400 -inline-max-depth=1 -inline-max-unroll=0 seemed to be the most promising choice.

To confirm those results, we then proceeded with benchmarking on the CoolMUC-2 HPC of the LRZ in order to achieve reproducible results. Using this configuration, the following measurements were made (format: hours:minutes:seconds):

  • No optimizations: 02:37:47
  • Flambda with -O2: 02:22:28
  • Flambda with -O3: 02:14:31
  • Flambda with -O3 -inline-toplevel=400 -inline-max-depth=1 -inline-max-unroll=0: 02:20:23

The runs on the HPC-cluster confirmed our assumption of a noticeable time benefit of using Flambda. However, using just -O3 was the fastest configuration.

To ultimately take a decision on which flags to use, we continued our benchmarking with the coreutils programs of the bench repository. We activated some ana.int.* domains to see if there was some significant changes in the ordering of the above listed combinations.

On our local systems, we marked down this results: coreutils_benchmarks.md. The fastest configuration, again contradicting what we had observed on the HPC, was -O3 -inline-toplevel=400 -inline-max-depth=1 -inline-max-unroll=0.

We then made some further batch jobs on the LRZ-cluster with ana.int.interval, ana.int.def_ecx, ana.int.enums and ana.int.congruence enabled. Only one CPU was used to achieve greater run-times, which allows for better differentiation between all measurements. The cumulated run-times over all coreutils programs were:

  • No optimizations: 00:03:00
  • Flambda with -O2: 00:02:36
  • Flambda with -O3: 00:02:24
  • Flambda with -O3 -inline-toplevel=400 -inline-max-depth=1 -inline-max-unroll=0: 00:02:33

Once more, on the HPC, the fastest choice was just -O3. Therefore, as no clear advantage of activating the additional inlining options can be established, we decided that it would be wisest to stick with just -O3, which is now reflected in our PR.

goblint.opam.locked Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@jerhard
Copy link
Member

jerhard commented Dec 14, 2022

What are the built times on your machines with the flambda compiler with the regular make and with the release profile (make release) compared to the non-flambda compiler switch? That information would help us decide whether we should only keep the flambda switch, or keep it a separate option.

@nathanschmidt
Copy link
Collaborator

I've benchmarked make release both with Goblint as in the master branch and with flambda -O3 as in this PR. Here are the results for 5 builds each:

No flambda: AVERAGE = 9,777s

  • 9,681s
  • 9,792s
  • 9,812s
  • 9,731s
  • 9,869s

Flambda with -O3: AVERAGE = 26,075s

  • 26,296s
  • 26,001s
  • 25,992s
  • 26,183s
  • 25,903s

@sim642
Copy link
Member

sim642 commented Dec 21, 2022

Thanks! Could you also get the numbers for development build (make)? Hopefully there the slowdown isn't as significant, because it would be slightly annoying.

@nathanschmidt
Copy link
Collaborator

So for make:

No flambda: AVERAGE = 9,175s

  • 9,148s
  • 9,245s
  • 9,092s
  • 9,180s
  • 9,208s

Flambda with -O3: AVERAGE = 13,829s

  • 13,845s
  • 13,765s
  • 13,864s
  • 13,844s
  • 13,828s

@michael-schwarz
Copy link
Member

I think this sort of slowdown for make is acceptable here.

@michael-schwarz
Copy link
Member

I would suggest given these numbers that we should default to flambda. @sim642 @stilscher @jerhard 👍 or 👎 ?

@sim642
Copy link
Member

sim642 commented Dec 21, 2022

Sure, and actually there's probably no need to have -O3 in dev profile anyway because it disables cross-module optimizations. That probably brings the development mode overhead down even further.

@jerhard
Copy link
Member

jerhard commented Dec 21, 2022

Yes, I would also be in favor of making flambda the default and removing the -O3 flag for the dev profile.

@nathanschmidt
Copy link
Collaborator

In my latest commit, I removed the -O3 flag for make, which lead to slightly improved compilation times:

Flambda with no options: AVERAGE = 12,200s

  • 12,487s
  • 11,876s
  • 12,164s
  • 12,482s
  • 11,990s

Furthermore, I made the ocaml-variants.4.14.0+options compiler with flambda enabled default (make setup). To still be able to use the ocaml-base-compiler if wished, I added the make setup-base command.

src/dune Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
make.sh Outdated Show resolved Hide resolved
make.sh Outdated Show resolved Hide resolved
@montrie montrie mentioned this pull request Jan 9, 2023
@sim642 sim642 self-requested a review January 11, 2023 06:50
@sim642 sim642 added this to the v2.2.0 milestone Jan 16, 2023
@sim642 sim642 merged commit 9ba891c into goblint:master Jan 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Analysis time, memory usage setup Dependencies, CI, releasing student-job
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants