Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: handling of cgroup driver setting #864

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

z63d
Copy link
Contributor

@z63d z63d commented Feb 19, 2025

fix: #821

Fixes an issue where cgroupfs was always used as the cgroup driver.
Receives SystemdCgroup from containerd and allows you to configure the cgroup driver. (There is a deserialize issue, so we'll use a regular expression to work around it.)
Uses systemd as the default cgroup driver.

@Mossaka
Copy link
Member

Mossaka commented Feb 19, 2025

It looks like one of the test failed with the following error:

Error: failed to stop unit youki-test.scope: dbus error: dbus function call error: Unit youki-test.scope not loaded.

Caused by:
    failed to stop unit youki-test.scope: dbus error: dbus function call error: Unit youki-test.scope not loaded.

Stack backtrace:
   0: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from
   1: <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
   2: containerd_shim_wasm::testing::WasiTest<WasiInstance>::wait
   3: containerd_shim_wasmedge::wasmedge_tests::test_seccomp::{{closure}}
   4: core::ops::function::FnOnce::call_once
   5: serial_test::serial_code_lock::local_serial_core_with_return
   6: containerd_shim_wasmedge::wasmedge_tests::test_seccomp
   7: containerd_shim_wasmedge::wasmedge_tests::test_seccomp::{{closure}}
   8: core::ops::function::FnOnce::call_once
   9: test::__rust_begin_short_backtrace
  10: test::run_test::{{closure}}
  11: std::sys::backtrace::__rust_begin_short_backtrace
  12: core::ops::function::FnOnce::call_once{{vtable.shim}}
  13: std::sys::pal::unix::thread::Thread::new::thread_start
test wasmedge_tests::test_seccomp ... FAILED
test wasmedge_tests::test_unreachable ... ok

failures:

failures:
    wasmedge_tests::test_seccomp

@z63d z63d force-pushed the fix/cgroup-driver branch from 79004b2 to b7ba5e9 Compare February 19, 2025 09:16
@z63d
Copy link
Contributor Author

z63d commented Feb 19, 2025

hmm...
This seems to have been a temporary issue. I ran it again and it passed without any issues.

But other problems are occurring...
This did not happen when I was testing with K3s.

  Warning  FailedCreatePodSandBox  6s (x21 over 4m47s)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: failed to create container: intermediate process error cgroup error: systemd error: dbus error: dbus connection error: ENOENT: No such file or directory. error during cleanup: systemd error: dbus error: dbus connection error: ENOENT: No such file or directory: unknown

@@ -17,5 +17,5 @@ ADD dist/bin/* /usr/local/bin/
ARG RUNTIME
RUN cat <<EOF >> /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.wasm]
runtime_type = "io.containerd.${RUNTIME}.v1"
runtime_path = "io.containerd.${RUNTIME}.v1"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't fully follow the discussion in Slack about this, but I would expect runtime_path to be a path, while this is not a path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why the kind test fails.
This is what I wanted to do. Set runtime_type to path.
Is this difference okay?

runtime_type = "$PWD/dist/bin/containerd-shim-$1-v1"

runtime_type = "io.containerd.${RUNTIME}.v1"

@z63d z63d force-pushed the fix/cgroup-driver branch from 1a6b2a1 to 4665961 Compare February 19, 2025 12:42
@github-actions github-actions bot removed the T-CI label Feb 19, 2025
Comment on lines 103 to 145
// Deserialization in the following way doesn't work.
// https://github.com/containerd/rust-extensions/blob/shim-v0.8.0/crates/runc-shim/src/runc.rs#L85-L88
// Because req.options gives us the following string: "\u{1a}\u{15}SystemdCgroup = true\n"
// So use regex to work around this issue.
fn parse_systemd_cgroup(input: &str) -> Option<bool> {
let regex: Regex = Regex::new(r#"\u{1a}\u{15}SystemdCgroup = (true|false)\n"#).unwrap();
if let Some(captures) = regex.captures(input) {
if let Some(value_match) = captures.get(1) {
let value_str = value_match.as_str();
return Some(value_str.parse().unwrap_or(true));
}
}
None
}
let systemd_cgroup = match req.options.as_ref() {
Some(any) => match String::from_utf8(any.value.to_vec()) {
Ok(opts_str) => parse_systemd_cgroup(opts_str.as_str()).unwrap_or(true),
Err(_) => true,
},
None => true,
};

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind trying this?
It needs 2 new dependencies

prost = "0.13"
toml = "0.8"
        #[derive(Deserialize, Default, Clone, PartialEq)]
        struct Config {
            #[serde(alias = "SystemdCgroup")]
            systemd_cgroup: bool,
        }
        
        fn get_config(options: Option<&Any>) -> anyhow::Result<Config> {
            let Some(opts) = options else {
                return Ok(Default::default());
            };

            ensure!(
                opts.type_url == "runtimeoptions.v1.Options",
                "Invalid options type {}",
                opts.type_url
            );

            #[derive(Message, Clone, PartialEq)]
            struct Options {
                #[prost(string)]
                type_url: String,
                #[prost(string)]
                config_path: String,
                #[prost(string)]
                config_body: String,
            }

            let opts = Options::decode(opts.value.as_slice())?;

            let config = toml::from_str(opts.config_body.as_str()).map_err(|err| {
                Error::InvalidArgument(format!("invalid shim options: {err}"))
            })?;

            Ok(config)
        }

        let config = get_config(req.options.as_ref()).map_err(|err| {
            Error::InvalidArgument(format!("invalid shim options: {err}"))
        })?;
        let systemd_cgroup = config.systemd_cgroup;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. Works perfectly!
Would it be better to separate theOptions and Config logic into different files?

@z63d z63d force-pushed the fix/cgroup-driver branch 3 times, most recently from 4665961 to 79bebfc Compare February 20, 2025 13:15
@github-actions github-actions bot removed the T-tests label Feb 20, 2025
@z63d z63d force-pushed the fix/cgroup-driver branch from 79bebfc to 4764514 Compare February 20, 2025 13:20
@z63d
Copy link
Contributor Author

z63d commented Feb 20, 2025

This error occurs randomly.
It doesn't seem to be a specific test.
Although the probability is low, there are cases where it does not occur.

Error: failed to stop unit youki-test.scope: dbus error: dbus function call error: Unit youki-test.scope not loaded.

Caused by:
    failed to stop unit youki-test.scope: dbus error: dbus function call error: Unit youki-test.scope not loaded.

Probably this happens too.
I followed the Makefile but it doesn't work locally so I haven't tried it yet.
https://github.com/containerd/runwasi/actions/runs/13409133082/job/37455738328

hmm... This seems to have been a temporary issue. I ran it again and it passed without any issues.

But other problems are occurring... This did not happen when I was testing with K3s.

  Warning  FailedCreatePodSandBox  6s (x21 over 4m47s)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: failed to create container: intermediate process error cgroup error: systemd error: dbus error: dbus connection error: ENOENT: No such file or directory. error during cleanup: systemd error: dbus error: dbus connection error: ENOENT: No such file or directory: unknown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pod level metrics are not being surfaced
3 participants