grass.script: Allow init to modify only specified environment #3438

wenzeslaus · 2024-02-19T22:56:58Z

The grass.script.setup.init function modifies os.environ. While os.environ is fit for many, if not most, cases, usage in tests and parallel processing is limited and in all cases, os.environ stays partially modified even after session is finished (we don't destroy the runtime environment, i.e., variables such as GISBASE). With this change, init takes env parameter specifying the environment to modify and it modifies that environment. When no env is provided, it still modifies os.environ, so the default behavior is not changed.

This required only few changes to the initialization code, but more changes were needed for the cleanup code. A lot of tests can now take advantage of this functionality and this PR updates some of them. I plan to update all where it is applicable, but will leave as is some others, namely those which use ctypes (situation is more complex there) and grass.jupyter (env parameter is not implemented for most of them).

I'm not decided whether this should go to 8.4 or wait for 8.5. There is plenty of tests covering it, but the cleanup part which needed most changes does not have any coverage. However, if someone gives it a careful read, it might be possible to have this for 8.4. I don't think there is much concern in terms of API because env parameters is what we now have in many functions. The only uncertainty with that is whether it should create its own copy or modify the provided environment.

This includes changes to g.message interface (python/grass/script/core.py) which I intent to submit in a separate PR (#3439). Also, python/grass/script/tests/conftest.py needs a doc fix (done).

Special thanks goes to @echoix for the advice about pytest's monkeypatch which made the test code possible (and simpler in other places).

The function create_location now works without a full session. Internally, it sets up a runtime environment to execute tools (modules) without the connection to a location (project) which is sufficient for executing g.proj. The new test covers the new functionality, but to actually test it, the test needs to run by itself because grass.script.setup.init (still) creates a global session.

…modified and when it is not, copy is up to the user which makes it clear when that actually happens which is longer but more readable, env is keyword-only, gisbase path handling is documented

This makes the message, verbose and other functions consistent with other wrappers around run_command family calls. While not needed for multiple-mapset situations and parallelization, it is necessary when calls with messages are used without global environment being set and only custom environment available in grass.script.setup. The PR aims at providing the interface, not updating all use cases (except the anticipated changes in grass.script.setup). This does not have any test since the API for that is missing. This will be tested indirectly in the future.

python/grass/script/setup.py

python/grass/script/tests/conftest.py

python/grass/script/tests/grass_script_setup_test.py

echoix · 2024-02-19T23:54:41Z

I also have concerns about passing around an object that has all env vars populated. I know in CI, we need to be careful to avoid leaking them to software that doesn't require them, and to not show them in logs, as we wouldn't want to see the tokens and secrets.
I don't know what are the practices in a Python software, but I would like to make sure that we don't accidentally dump them in a test.

Is the CI-security class that sent a PR yesterday would be interested in that question, or have an answer for this?

wenzeslaus · 2024-02-20T02:04:15Z

I also have concerns about passing around an object that has all env vars populated.

One thing is that the object is readily available already in Python:

import os
print(os.environ)

Another thing is that we are passing these objects around already to achieve various tasks such as data management and parallelization:

env = create_environment(...)
run_command(...env=env)

I have a WIP PR #2923 which aims at hiding some of that, but the env would be simply hidden in another object changing tyhe above to:

with gs.setup.init(...) as session:
    tools = Tools(session=session)
    tools.r_surf_gauss(output="surface")

I know in CI, we need to be careful to avoid leaking them to software that doesn't require them, and to not show them in logs, as we wouldn't want to see the tokens and secrets. I don't know what are the practices in a Python software, but I would like to make sure that we don't accidentally dump them in a test.

I don't know anything else to prevent them besides don't print them. The original variables for the process are always available on Linux with proc/{pid}/environ.

Is the CI-security class that sent a PR yesterday would be interested in that question, or have an answer for this?

Diving into this might too much of a rabbit trail, but I'll discuss it with them.

python/grass/script/tests/grass_script_core_location_test.py

python/grass/script/tests/grass_script_setup_test.py

echoix · 2024-02-20T04:32:58Z

I'm not decided whether this should go to 8.4 or wait for 8.5. There is plenty of tests covering it, but the cleanup part which needed most changes does not have any coverage. However, if someone gives it a careful read, it might be possible to have this for 8.4. I don't think there is much concern in terms of API because env parameters is what we now have in many functions. The only uncertainty with that is whether it should create its own copy or modify the provided environment.

From what I'm aware of, we don't have a feature-freeze policy here, and we didn't decide to start the RC process yet. If you're committed to including it and think you'd still have the bandwidth to tune up in time for release, I don't see why it couldn't be there.

Otherwise, if these changes are considered too breaking for this release (I personally think that full-code wide changes of location to project is way more impactful than this), then part of the usage of mocking env vars in tests could be integrated in some way. The only test fixture was to set no env vars though, not setting them/replacing them (that would make pytest tear them down afterwards)

python/grass/script/tests/conftest.py

This makes the _message_, _verbose_ and other functions consistent with other wrappers around _run_command_ family calls. While not needed for multiple-mapset situations and parallelization, it is necessary when calls with messages are used without global environment being set and only a custom (local) environment is available which is the case in grass.script.setup (with #3438). The PR aims at providing the interface, not updating all use cases (it will be applied for grass.script.setup in #3438). This does not have any test since the current API does not allow for writing these test. This will be tested indirectly in the future (e.g. by #3438). Additionally, it fixes documentation for couple other function where the parameter documentation was misleading or wrong.

….message API" This reverts commit 4a6fd68.

wenzeslaus · 2024-05-03T13:43:31Z

This touches a lot of files, but that's just the update to use the better API in tests. The changes are in python/grass/script/setup.py. The direct tests for this are in python/grass/script/tests/grass_script_setup_test.py.

wenzeslaus · 2024-05-03T15:33:46Z

The CI fails with GISBASE being required in certain context. The test code accounts for that already (or at least it should). Locally, I'm not able to reproduce the issue (just as expected for how the test code is written).

wenzeslaus · 2024-05-06T00:28:53Z

Merging either of #3683 and #3688 should indirectly solve the the failing test.

echoix · 2024-05-06T00:33:22Z

I think #3683 is the less controversial of the two, let's start with it.

python/grass/script/tests/grass_script_setup_test.py

wenzeslaus · 2024-05-06T13:29:01Z

I also have concerns about passing around an object that has all env vars populated.

This is using a mechanism what the Python subprocess package is using. We use subprocesses and we need to customize the environment to pass different GRASS-related settings. That's how the whole grass.script works. While Bandit warns about usage of subprocess package (dangerous in certain context) and specific usages of the subprocess package, I'm not aware of a Bandit warning related to usage of the env parameter.

I know in CI, we need to be careful to avoid leaking them to software that doesn't require them,...

All the subprocesses in general get all the variables from parent process. Having env actually allows users to sanitize the environment and let the GRASS subprocesses know only about what they want them to know.

I would like to make sure that we don't accidentally dump them in a test.

Printing env parameter or printing all parameters of run_command functions (which would include env) would be probably a bad idea.

echoix · 2024-05-09T16:23:31Z

With #3689 merged, this PR should be rebased to remove the duplicated changes (that has conflicts)

…rocess

python/grass/script/setup.py

python/grass/script/tests/grass_script_setup_test.py

Co-authored-by: Ondrej Pesek <[email protected]>

…3438) The _grass.script.setup.init_ function modifies os.environ. While os.environ is fit for many, if not most, cases, usage in tests and parallel processing is limited and in all cases, os.environ stays partially modified even after session is finished (we don't destroy the runtime environment, i.e., variables such as GISBASE). With this change, _init_ takes _env_ parameter specifying the environment to modify and it modifies that environment. When no _env_ is provided, it still modifies os.environ, so the default behavior is not changed. This required only few changes to the initialization code, but more changes were needed for the cleanup code. A lot of tests can now take advantage of this functionality and this PR updates some of them. I plan to update all where it is applicable, but will leave as is some others, namely those which use ctypes (situation is more complex there) and _grass.jupyter_ (_env_ parameter is not implemented for most of them). There is plenty of tests covering it, but the cleanup part which needed most changes does not have any coverage. I don't think there is much concern in terms of API because _env_ parameters is what we now have in many functions. The only uncertainty with that is whether it should create its own copy or modify the provided environment. I go with modify because that makes the copy explicit in the call which is more clear (caller or reader is sure a copy is created) and it is consistent with how os.environ is modified.

wenzeslaus added 5 commits February 16, 2024 15:39

Refine the API: env is always modified to avoid confusion when it is …

7ea3486

…modified and when it is not, copy is up to the user which makes it clear when that actually happens which is longer but more readable, env is keyword-only, gisbase path handling is documented

Merge remote-tracking branch 'upstream/main' into env-for-init

a88648d

grass.script: Allow init modify only specified environment

9a278a4

wenzeslaus added this to the 8.4.0 milestone Feb 19, 2024

wenzeslaus self-assigned this Feb 19, 2024

github-actions bot added Python Related code is in Python libraries labels Feb 19, 2024

echoix reviewed Feb 19, 2024

View reviewed changes

python/grass/script/setup.py Show resolved Hide resolved

python/grass/script/tests/conftest.py Outdated Show resolved Hide resolved

python/grass/script/tests/grass_script_setup_test.py Show resolved Hide resolved

Fix doc for conftest [skip ci]

cdf0592

wenzeslaus mentioned this pull request Feb 20, 2024

grass.script: Provide env parameter in the g.message API #3439

Merged

wenzeslaus changed the title ~~grass.script: Create new location without a session~~ grass.script: Allow init to modify only specified environment Feb 20, 2024

Document recommended fixture usage [skip ci]

94eed44

echoix reviewed Feb 20, 2024

View reviewed changes

python/grass/script/tests/grass_script_core_location_test.py Show resolved Hide resolved

python/grass/script/tests/grass_script_setup_test.py Show resolved Hide resolved

echoix reviewed Feb 20, 2024

View reviewed changes

python/grass/script/tests/conftest.py Outdated Show resolved Hide resolved

wenzeslaus mentioned this pull request Feb 26, 2024

grass Python package: hide windows triggered by subprocess.Popen on MS Windows #3457

Closed

wenzeslaus added 3 commits February 29, 2024 20:30

Use session.env in tests for isolation

8f23eb4

Merge remote-tracking branch 'upstream/main' into env-for-init

dbc084e

Use session.env in temporal tests for isolation

cdc9b41

github-actions bot added vector Related to vector data processing temporal Related to temporal data processing module labels Mar 1, 2024

wenzeslaus added 3 commits March 2, 2024 22:57

Revert temporary change "grass.script: Provide env parameter in the g…

a23965c

….message API" This reverts commit 4a6fd68.

Get g.message env changes using merge

e8a443d

Merge test updates on main

e4eedc3

Apply test updates to new tests

0c76cdf

wenzeslaus marked this pull request as ready for review May 3, 2024 13:40

Test that GISBASE is set

791c868

Merge branch 'main' into env-for-init

8de82d1

echoix reviewed May 6, 2024

View reviewed changes

python/grass/script/tests/grass_script_setup_test.py Outdated Show resolved Hide resolved

wenzeslaus mentioned this pull request May 6, 2024

grass.script: Make grass_path keyword only in init #3689

Merged

wenzeslaus modified the milestones: 8.4.0, 8.5.0 May 6, 2024

echoix and others added 4 commits May 15, 2024 06:55

Merge branch 'main' into env-for-init

7b4818a

Merge remote-tracking branch 'upstream/main' into env-for-init

c56330e

Merge remote-tracking branch 'upstream/main' into env-for-init

99f1ad0

More detailed and correctly written test for init in an isolated subp…

8a24759

…rocess

pesekon2 requested changes Jun 18, 2024

View reviewed changes

python/grass/script/setup.py Outdated Show resolved Hide resolved

python/grass/script/tests/grass_script_setup_test.py Outdated Show resolved Hide resolved

Update python/grass/script/setup.py

7ae6883

Co-authored-by: Ondrej Pesek <[email protected]>

pesekon2 approved these changes Jun 19, 2024

View reviewed changes

wenzeslaus merged commit 9564852 into OSGeo:main Jun 19, 2024
24 checks passed

wenzeslaus deleted the env-for-init branch June 19, 2024 12:28

a0x8o mentioned this pull request Jun 27, 2024

CWE 398 a0x8o/grass#8

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

grass.script: Allow init to modify only specified environment #3438

grass.script: Allow init to modify only specified environment #3438

wenzeslaus commented Feb 19, 2024 •

edited

Loading

echoix commented Feb 19, 2024

wenzeslaus commented Feb 20, 2024

echoix commented Feb 20, 2024

wenzeslaus commented May 3, 2024

wenzeslaus commented May 3, 2024 •

edited

Loading

wenzeslaus commented May 6, 2024

echoix commented May 6, 2024

wenzeslaus commented May 6, 2024

echoix commented May 9, 2024

grass.script: Allow init to modify only specified environment #3438

grass.script: Allow init to modify only specified environment #3438

Conversation

wenzeslaus commented Feb 19, 2024 • edited Loading

echoix commented Feb 19, 2024

wenzeslaus commented Feb 20, 2024

echoix commented Feb 20, 2024

wenzeslaus commented May 3, 2024

wenzeslaus commented May 3, 2024 • edited Loading

wenzeslaus commented May 6, 2024

echoix commented May 6, 2024

wenzeslaus commented May 6, 2024

echoix commented May 9, 2024

wenzeslaus commented Feb 19, 2024 •

edited

Loading

wenzeslaus commented May 3, 2024 •

edited

Loading