Reduce the default jobs/checks per second
Don't want to stress Slurm unnecessarily since it will impact all HPC
users
jdblischak committed Sep 17, 2021
1 parent 4f4d80e commit 76b1ed2
Showing 2 changed files with 25 additions and 3 deletions.
24 changes: 23 additions & 1 deletion README.md
@@ -6,6 +6,7 @@
* [Limitations](#limitations)
* [Quick start](#quick-start)
* [Customizations](#customizations)
* [Use speed with caution](#use-speed-with-caution)
* [License](#license)

The option [`--cluster-config`][cluster-config] is deprecated, but it's still
@@ -32,7 +33,8 @@ post][sichong-post] by Sichong Peng nicely explains this strategy for replacing

* Fast! It can quickly submit jobs and check their status because it doesn't
invoke a Python script for these steps, which adds up when you have thousands
- of jobs
+ of jobs (however, please see the section [Use speed with
+ caution](#use-speed-with-caution))

* No reliance on the deprecated option `--cluster-config` to customize job
resources
@@ -282,6 +284,26 @@ documentation below.
latest attempt. Also, please upvote my [PR][pr-multi-cluster] to fix this in
Snakemake.

## Use speed with caution

A big benefit of this profile's simplicity is the speed with which jobs can
be submitted and their statuses checked. The [official Slurm profile for
Snakemake][slurm-official] provides much more fine-grained control, but that
logic lives in Python scripts that must be invoked for every job submission
and status check. I needed this speed for a pipeline with an aggregation rule
that had to run tens of thousands of times, where each job finished in under
10 seconds. In that situation, the job submission and status check rates were
huge bottlenecks.
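
For a rough sense of scale (the job count here is purely illustrative):
submitting 30,000 jobs at 100 jobs per second takes about 5 minutes of
scheduler requests, whereas at 10 jobs per second the same submissions take
about 50 minutes.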

However, use this speed with caution! On a shared HPC cluster, many users
send requests to the Slurm scheduler, and if too many requests arrive at
once, performance suffers for all users. If the rules in your Snakemake
pipeline take more than a few minutes to complete, it's overkill to check job
statuses multiple times per second. In other words, only increase
`max-jobs-per-second` and/or `max-status-checks-per-second` if the submission
rate or the status checks that confirm job completion are a clear bottleneck.
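
If profiling does show that submission or polling is the limiting factor, the
rates can be raised in the profile's `config.yaml`. Below is a minimal
sketch; the exact values are illustrative, not recommendations:

```yaml
# Raise these only if job submission or status polling is a measured
# bottleneck; every extra request adds load on the shared scheduler.
max-jobs-per-second: 50          # illustrative value
max-status-checks-per-second: 5  # illustrative value
```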

## License

This is all boilerplate code. Please feel free to use it for whatever purpose
4 changes: 2 additions & 2 deletions simple/config.yaml
@@ -12,8 +12,8 @@ default-resources:
- qos=<name-of-quality-of-service>
- mem_mb=1000
restart-times: 3
-max-jobs-per-second: 100
-max-status-checks-per-second: 10
+max-jobs-per-second: 10
+max-status-checks-per-second: 1
local-cores: 1
latency-wait: 60
jobs: 500
