[docs] document how to use the Threaded parallel scheme #821

Merged
merged 2 commits into from
Jan 22, 2025
127 changes: 8 additions & 119 deletions docs/src/guides/improve_computational_performance.md
@@ -58,38 +58,16 @@ SDDP.train(model; cut_type = SDDP.MULTI_CUT)

 ## Parallelism

-SDDP.jl can take advantage of the parallel nature of modern computers to solve problems
-across multiple cores.
+SDDP.jl can take advantage of the parallel nature of modern computers to solve
+problems across multiple threads.

-!!! info
-    We highly recommend that you read the Julia manual's section on [parallel computing](https://docs.julialang.org/en/v1/manual/parallel-computing/).
-
-You can start Julia from a command line with `N` processors using the `-p` flag:
+Start Julia from a command line with `N` threads using the `--threads` flag:
 ```julia
-julia -p N
+julia --threads N
 ```

-Alternatively, you can use the `Distributed.jl` package:
-```julia
-using Distributed
-Distributed.addprocs(N)
-```
-
-!!! warning
-    Workers **DON'T** inherit their parent's Pkg environment. Therefore, if you started
-    Julia with `--project=/path/to/environment` (or if you activated an environment from the
-    REPL), you will need to put the following at the top of your script:
-    ```julia
-    using Distributed
-    @everywhere begin
-        import Pkg
-        Pkg.activate("/path/to/environment")
-    end
-    ```
-
-Currently SDDP.jl supports to parallel schemes, [`SDDP.Serial`](@ref) and
-[`SDDP.Asynchronous`](@ref). Instances of these parallel schemes should be passed to the
-`parallel_scheme` argument of [`SDDP.train`](@ref) and [`SDDP.simulate`](@ref).
+Then, pass an instance of [`SDDP.Threaded`](@ref) to the `parallel_scheme`
+argument of [`SDDP.train`](@ref) and [`SDDP.simulate`](@ref).

 ```julia
 using SDDP, HiGHS
@@ -99,95 +77,6 @@ model = SDDP.LinearPolicyGraph(
     @variable(sp, x >= 0, SDDP.State, initial_value = 1)
     @stageobjective(sp, x.out)
 end
-SDDP.train(model; iteration_limit = 10, parallel_scheme = SDDP.Asynchronous())
-SDDP.simulate(model, 10; parallel_scheme = SDDP.Asynchronous())
-```
-
-There is a large overhead for using the asynchronous solver. Even if you choose asynchronous
-mode, SDDP.jl will start in serial mode while the initialization takes place. Therefore, in
-the log you will see that the initial iterations take place on the master thread (`Proc. ID
-= 1`), and it is only after while that the solve switches to full parallelism.
-
-!!! info
-    Because of the large data communication requirements (all cuts have to be shared with
-    all other cores), the solution time will not scale linearly with the number of cores.
-
-!!! info
-    Given the same number of iterations, the policy obtained from asynchronous mode will be
-    _worse_ than the policy obtained from serial mode. However, the asynchronous solver can
-    take significantly less time to compute the same number of iterations.
-
-### Data movement
-
-By default, data defined on the master process is not made available to the workers.
-Therefore, a model like the following:
-```julia
-data = 1
-model = SDDP.LinearPolicyGraph(stages = 2, lower_bound = 0) do sp, t
-    @variable(sp, x >= 0, SDDP.State, initial_value = data)
-    @stageobjective(sp, x.out)
-end
-```
-will result in an `UndefVarError` error like `UndefVarError: data not defined`.
-
-There are three solutions for this problem.
-
-#### Option 1: declare data inside the build function
-
-```julia
-model = SDDP.LinearPolicyGraph(stages = 2) do sp, t
-    data = 1
-    @variable(sp, x >= 0, SDDP.State, initial_value = 1)
-    @stageobjective(sp, x)
-end
-```
-
-#### Option 2: use `@everywhere`
-
-```julia
-@everywhere begin
-    data = 1
-end
-model = SDDP.LinearPolicyGraph(stages = 2) do sp, t
-    @variable(sp, x >= 0, SDDP.State, initial_value = 1)
-    @stageobjective(sp, x)
-end
-```
-
-#### Option 3: build the model in a function
-
-```julia
-function build_model()
-    data = 1
-    return SDDP.LinearPolicyGraph(stages = 2) do sp, t
-        @variable(sp, x >= 0, SDDP.State, initial_value = 1)
-        @stageobjective(sp, x)
-    end
-end
-
-model = build_model()
-```
-
-### Initialization hooks
-
-!!! warning
-    This is important if you use Gurobi!
-
-[`SDDP.Asynchronous`](@ref) accepts a pre-processing hook that is run on each
-worker process _before_ the model is solved. The most useful situation is for
-solvers than need an initialization step. A good example is Gurobi, which can
-share an environment amongst all models on a worker. Notably, this environment
-**cannot** be shared amongst workers, so defining one environment at the top of
-a script will fail!
-
-To initialize a new environment on each worker, use the following:
-
-```julia
-SDDP.train(
-    model;
-    parallel_scheme = SDDP.Asynchronous() do m::SDDP.PolicyGraph
-        env = Gurobi.Env()
-        set_optimizer(m, () -> Gurobi.Optimizer(env))
-    end,
-)
+SDDP.train(model; iteration_limit = 10, parallel_scheme = SDDP.Threaded())
+SDDP.simulate(model, 10; parallel_scheme = SDDP.Threaded())
 ```
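
The revised text relies on Julia having been started with more than one thread. A minimal sketch of how a script might guard against that, assuming the small two-stage model from the example: it checks `Threads.nthreads()` and falls back to `SDDP.Serial()` when only one thread is available. The fallback and the variable name `scheme` are illustrative assumptions and are not part of this change.

```julia
using SDDP, HiGHS

# Small two-stage model, mirroring the example in the documentation.
model = SDDP.LinearPolicyGraph(;
    stages = 2,
    lower_bound = 0,
    optimizer = HiGHS.Optimizer,
) do sp, t
    @variable(sp, x >= 0, SDDP.State, initial_value = 1)
    @stageobjective(sp, x.out)
end

# Threads.nthreads() reports how many threads Julia was started with,
# for example via `julia --threads N`.
# Assumption: falling back to SDDP.Serial() is a choice made for this sketch.
scheme = Threads.nthreads() > 1 ? SDDP.Threaded() : SDDP.Serial()

SDDP.train(model; iteration_limit = 10, parallel_scheme = scheme)
SDDP.simulate(model, 10; parallel_scheme = scheme)
```

Choosing the scheme once and reusing it keeps training and simulation consistent, and the same script then runs whether or not the `--threads` flag was given.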