Add "cross-emulation" support #63

Draft: wants to merge 8 commits into base: main
2 changes: 1 addition & 1 deletion docs/1-installing.md
@@ -35,7 +35,7 @@ To also set up crun-vm for use with Docker:
1. Install crun-vm's runtime dependencies:

```console
- $ dnf install bash coreutils crun genisoimage grep libselinux-devel libvirt-client libvirt-daemon-driver-qemu libvirt-daemon-log openssh-clients qemu-img qemu-system-x86-core shadow-utils util-linux virtiofsd
+ $ dnf install bash coreutils crun crun-krun genisoimage grep libselinux-devel libvirt-client libvirt-daemon-driver-qemu libvirt-daemon-log openssh-clients qemu-img qemu-system-x86-core sed shadow-utils util-linux virtiofsd
```

2. Install Rust and Cargo if you do not already have Rust tooling available:
28 changes: 26 additions & 2 deletions docs/2-podman-docker.md
@@ -96,6 +96,25 @@ in a container image.
Note that flag `--persistent` has no effect when running VMs from container
images.

### From bootable container images

crun-vm can also work with [bootable container images], which are containers
that package a full operating system:

```console
$ podman run \
    --runtime crun-vm \
    -it --rm \
    quay.io/fedora/fedora-bootc:40
```

Internally, crun-vm generates a VM image from the bootable container and then
boots it.

By default, the VM image is given a disk size roughly double the size of the
bootc container image. To change this, use the `--bootc-disk-size <size>[KMGT]`
option.
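
The default sizing rule described above can be sketched in a few lines of shell. This is only an illustration that mirrors the arithmetic in `embed/bootc/prepare.sh` from this PR (double the image size, then round up to a 1 MiB boundary); the helper name `default_disk_size` is hypothetical and the input size is a made-up value.

```sh
#!/bin/sh
# Sketch of the default disk sizing rule; `default_disk_size` is a
# hypothetical helper name, not something crun-vm provides.
default_disk_size() {
    image_size=$1                    # container image size in bytes
    alignment=$(( 1024 * 1024 ))     # 1 MiB
    doubled=$(( image_size * 2 ))    # leave room for in-place updates
    # round up to the next multiple of the alignment
    echo $(( (doubled + alignment - 1) / alignment * alignment ))
}

# Made-up example: a 1.5 GB container image
default_disk_size 1500000000   # prints 3001024512
```

An image that is already an exact multiple of 512 KiB doubles to an aligned size, so the rounding step leaves it unchanged.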

## First-boot customization

### cloud-init
@@ -317,8 +336,12 @@ $ podman run \
### System emulation

To use system emulation instead of hardware-assisted virtualization, specify the
- `--emulated` flag. Without this flag, attempting to create a VM on a host tbat
- doesn't support KVM will fail.
+ `--emulated` flag. Without this flag, attempting to create a VM from a guest
+ with a different architecture from the host's or on a host that doesn't support
+ KVM will fail.
+
+ It's not currently possible to use this flag when the container image is a bootc
+ bootable container.

@ericcurtin commented on May 10, 2024:

What's the bootable containers blocker at present for --emulated? Just curious...

I wonder could we apply --emulated flag automatically if we detect we are not on the same CPU architecture, I think podman qemu-user-static functionality does this and it makes sense.

There's another use-case emulated could be potentially useful, when EL2/KVM (or /dev/kvm) is not available, but it's probably not worth automatically applying --emulated in that case because it's so much slower than kvm or using containers with just plain-old crun.

Member Author


> What's the bootable containers blocker at present for --emulated? Just curious...

We currently use libkrun to run a micro VM that generates the VM disk image from the bootable container. Since libkrun relies on KVM, this only works if the bootable container's arch is the same as the host's. We could potentially use some qemu-based micro VM alternative instead of libkrun to lift this limitation.

> I wonder could we apply --emulated flag automatically if we detect we are not on the same CPU architecture, I think podman qemu-user-static functionality does this and it makes sense.

This could make sense. It would make --emulated's meaning more complicated and less intuitive, though. Right now the user knows that --emulated = use emulation, no --emulated = use KVM, without having to think about what the host and VM arches are.

> There's another use-case emulated could be potentially useful, when EL2/KVM (or /dev/kvm) is not available, but it's probably not worth automatically applying --emulated in that case because it's so much slower than kvm or using containers with just plain-old crun.

Yes, I think it's best to fail here to alert the user to the fact that KVM is not available, instead of silently using slow emulation.
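
For reference, the architecture check discussed in this thread could look roughly like the sketch below. It is not part of this PR or of crun-vm: `normalize_arch` and `needs_emulation` are hypothetical helpers, and the name mapping only covers two common cases (OCI-style `amd64`/`arm64` versus `uname -m` style `x86_64`/`aarch64`).

```sh
#!/bin/sh
# Hypothetical helpers for illustration; not part of crun-vm.

# Map OCI-style architecture names to `uname -m` style names.
normalize_arch() {
    case $1 in
        amd64|x86_64)  echo x86_64 ;;
        arm64|aarch64) echo aarch64 ;;
        *)             echo "$1" ;;
    esac
}

# Succeeds (exit status 0) when the host and guest architectures differ.
needs_emulation() {
    host=$(normalize_arch "$1")
    guest=$(normalize_arch "$2")
    [ "$host" != "$guest" ]
}

# Example: compare the running host against an arm64 guest image.
if needs_emulation "$(uname -m)" arm64; then
    echo "guest arch differs from host: emulation would be required"
else
    echo "guest arch matches host: KVM can be used if available"
fi
```

The guest side would presumably come from the image metadata, for example the `Architecture` field reported by `podman image inspect`; that lookup is left out here so the sketch runs without a container engine.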


### Inspecting and customizing the libvirt domain XML

@@ -340,6 +363,7 @@ be merged with it using the non-standard option `--merge-libvirt-xml <file>`.
> Before using this flag, consider if you would be better served using libvirt
> directly to manage your VM.

[bootable container images]: https://containers.github.io/bootable/
[cloud-init]: https://cloud-init.io/
[domain XML definition]: https://libvirt.org/formatdomain.html
[Ignition]: https://coreos.github.io/ignition/
88 changes: 88 additions & 0 deletions embed/bootc/config.json
@@ -0,0 +1,88 @@
{
  "ociVersion": "1.0.0",
  "process": {
    "terminal": true,
    "user": { "uid": 0, "gid": 0 },
    "args": ["/output/entrypoint.sh", "<IMAGE_NAME>"],
    "env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "TERM=xterm"
    ],
    "cwd": "/",
    "capabilities": {
      "bounding": [],
      "effective": [],
      "inheritable": [],
      "permitted": [],
      "ambient": []
    },
    "rlimits": [
      {
        "type": "RLIMIT_NOFILE",
        "hard": 262144,
        "soft": 262144
      }
    ],
    "noNewPrivileges": true
  },
  "root": {
    "path": "<ORIGINAL_ROOT>",
    "readonly": false
  },
  "hostname": "bootc-install",
  "mounts": [
    {
      "type": "bind",
      "source": "<PRIV_DIR>/root/crun-vm/bootc",
      "destination": "/output",
      "options": ["bind", "rprivate", "rw"]
    },
    {
      "destination": "/proc",
      "type": "proc",
      "source": "proc"
    },
    {
      "destination": "/dev/pts",
      "type": "devpts",
      "source": "devpts",
      "options": [
        "nosuid",
        "noexec",
        "newinstance",
        "ptmxmode=0666",
        "mode=0620",
        "gid=5"
      ]
    }
  ],
  "linux": {
    "namespaces": [
      { "type": "pid" },
      { "type": "network" },
      { "type": "ipc" },
      { "type": "uts" },
      { "type": "cgroup" },
      { "type": "mount" }
    ],
    "maskedPaths": [
      "/proc/acpi",
      "/proc/asound",
      "/proc/kcore",
      "/proc/keys",
      "/proc/latency_stats",
      "/proc/timer_list",
      "/proc/timer_stats",
      "/proc/sched_debug",
      "/sys/firmware",
      "/proc/scsi"
    ],
    "readonlyPaths": [
      "/proc/bus",
      "/proc/fs",
      "/proc/irq",
      "/proc/sys",
      "/proc/sysrq-trigger"
    ]
  }
}
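
The `<IMAGE_NAME>`, `<ORIGINAL_ROOT>` and `<PRIV_DIR>` placeholders in this file are filled in by `embed/bootc/prepare.sh` with `sed`. The sketch below shows that substitution technique on a cut-down template. The helper name `fill_placeholder`, the temporary directory and the root path are assumptions made for illustration, and the sketch writes through a temporary file instead of using `sed -i` so that it does not depend on GNU sed.

```sh
#!/bin/sh
set -e

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Cut-down template using the same placeholder style as config.json.
cat >"$workdir/config.json" <<'EOF'
{
  "process": { "args": ["/output/entrypoint.sh", "<IMAGE_NAME>"] },
  "root": { "path": "<ORIGINAL_ROOT>" }
}
EOF

# Replace the first occurrence of placeholder $1 on each line with value $2.
# Caveat: a value containing '|', '&' or '\' would be misread by sed.
fill_placeholder() {
    sed "s|$1|$2|" "$workdir/config.json" >"$workdir/config.json.tmp"
    mv "$workdir/config.json.tmp" "$workdir/config.json"
}

fill_placeholder "<IMAGE_NAME>" "quay.io/fedora/fedora-bootc:40"
fill_placeholder "<ORIGINAL_ROOT>" "/example/rootfs"   # made-up path

cat "$workdir/config.json"
```

Using `|` as the sed delimiter keeps paths containing `/` working, but image names or paths containing `|` or `&` would need escaping first.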
51 changes: 51 additions & 0 deletions embed/bootc/entrypoint.sh
@@ -0,0 +1,51 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0-or-later

set -e

image_name=$1

# monkey-patch loopdev partition detection, given we're not running systemd
# (bootc runs `udevadm settle` as a way to wait until loopdev partitions are
# detected; we hijack that call and use partx to set up the partition devices)

original_udevadm=$( which udevadm )

mkdir -p /output/bin

cat >/output/bin/udevadm <<EOF
#!/bin/sh
${original_udevadm@Q} "\$@" && partx --add /dev/loop0
EOF

chmod +x /output/bin/udevadm

# default to an xfs root file system if there is no bootc config (some images
# don't currently provide any, for instance quay.io/fedora/fedora-bootc:40)

if ! find /usr/lib/bootc/install -mindepth 1 -maxdepth 1 | read; then
    # /usr/lib/bootc/install is empty

    cat >/usr/lib/bootc/install/00-crun-vm.toml <<EOF
[install.filesystem.root]
type = "xfs"
EOF

fi

# build disk image using bootc-install

PATH=/output/bin:$PATH bootc install to-disk \
    --source-imgref docker-archive:/output/image.docker-archive \
    --target-imgref "$image_name" \
    --skip-fetch-check \
    --generic-image \
    --via-loopback \
    --karg console=tty0 \
    --karg console=ttyS0 \
    --karg selinux=0 \
    /output/image.raw

# communicate success by creating a file, since krun always exits successfully

touch /output/bootc-install-success
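
The `udevadm` monkey-patch in this script relies on a general technique: put a same-named wrapper earlier on `PATH` so that a program you do not control runs an extra step after the real tool. Below is a minimal, self-contained sketch of that technique. It uses `uname` as a stand-in for `udevadm` and an `echo` as a stand-in for `partx --add /dev/loop0`; it also uses `command -v` and plain double quotes instead of `which` and `${...@Q}`, which is a simplification that assumes the tool's path contains no quote characters.

```sh
#!/bin/sh
set -e

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Locate the real tool before shadowing it (uname stands in for udevadm).
original=$(command -v uname)

# Write a wrapper with the same name that runs the real tool and then an
# extra step (an echo here, standing in for the partx call).
cat >"$workdir/uname" <<EOF
#!/bin/sh
"$original" "\$@" && echo "post-hook ran"
EOF
chmod +x "$workdir/uname"

# Only this one invocation sees the wrapper first on PATH.
output=$(PATH=$workdir:$PATH uname -s)
echo "$output"
```

Because the wrapper calls the real tool by absolute path, it does not recurse into itself even though its directory comes first on `PATH`.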
127 changes: 127 additions & 0 deletions embed/bootc/prepare.sh
@@ -0,0 +1,127 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0-or-later

set -o errexit -o pipefail -o nounset

engine=$1
container_id=$2
original_root=$3
priv_dir=$4
disk_size=$5

__step() {
    printf "\033[36m%s\033[0m\n" "$*"
}

bootc_dir=$priv_dir/root/crun-vm/bootc

mkfifo "$bootc_dir/progress"
exec > "$bootc_dir/progress" 2>&1

# this blocks here until the named pipe above is opened by entrypoint.sh

# get info about the container *image*

image_info=$(
    "$engine" container inspect \
        --format '{{.Config.Image}}'$'\t''{{.Image}}' \
        "$container_id"
    )

image_name=$( cut -f1 <<< "$image_info" )
# image_name=${image_name#sha256:}

image_id=$( cut -f2 <<< "$image_info" )

# check if VM image is cached

container_name=crun-vm-$container_id

cache_image_label=containers.crun-vm.from=$image_id
cache_image_id=$( "$engine" images --filter "label=$cache_image_label" --format '{{.Id}}' )

if [[ -n "$cache_image_id" ]]; then

    # retrieve VM image from cached containerdisk

    __step "Retrieving cached VM image..."

    trap '"$engine" rm --force "$container_name" >/dev/null 2>&1 || true' EXIT

    "$engine" create --quiet --name "$container_name" "$cache_image_id" >/dev/null
    "$engine" export "$container_name" | tar -C "$bootc_dir" -x image.qcow2
    "$engine" rm "$container_name" >/dev/null 2>&1

    trap '' EXIT

else

    __step "Converting $image_name into a VM image..."

    # save container *image* as an archive

    echo -n 'Preparing container image...'

    "$engine" save --output "$bootc_dir/image.docker-archive" "$image_id" 2>&1 \
        | sed -u 's/.*/./' \
        | stdbuf -o0 tr -d '\n'

    echo

    # adjust krun config

    __sed() {
        sed -i "s|$1|$2|" "$bootc_dir/config.json"
    }

    __sed "<IMAGE_NAME>" "$image_name"
    __sed "<ORIGINAL_ROOT>" "$original_root"
    __sed "<PRIV_DIR>" "$priv_dir"

    # run bootc-install under krun

    if [[ -z "$disk_size" ]]; then
        container_image_size=$(
            "$engine" image inspect --format '{{.VirtualSize}}' "$image_id"
            )

        # use double the container image size to allow for in-place updates
        disk_size=$(( container_image_size * 2 ))

        # round up to 1 MiB
        alignment=$(( 2**20 ))
        disk_size=$(( (disk_size + alignment - 1) / alignment * alignment ))
    fi

    truncate --size "$disk_size" "$bootc_dir/image.raw"

    trap 'krun delete --force "$container_name" >/dev/null 2>&1 || true' EXIT
    krun run --config "$bootc_dir/config.json" "$container_name" </dev/ptmx
    trap '' EXIT

    [[ -e "$bootc_dir/bootc-install-success" ]]

    # convert image to qcow2 to get a lower file length

    qemu-img convert -f raw -O qcow2 "$bootc_dir/image.raw" "$bootc_dir/image.qcow2"
    rm "$bootc_dir/image.raw"

    # cache VM image file as containerdisk

    __step "Caching VM image as a containerdisk..."

    id=$(
        "$engine" build --quiet --file - --label "$cache_image_label" "$bootc_dir" <<-'EOF'
FROM scratch
COPY image.qcow2 /
ENTRYPOINT ["no-entrypoint"]
EOF
        )

    echo "Stored as untagged container image with ID $id"

fi

__step "Booting VM..."

touch "$bootc_dir/success"
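
This script and `embed/entrypoint.sh` coordinate through a named pipe plus a marker file: the producer redirects its output into the FIFO, the consumer relays it with `cat` until end-of-file, and then checks for the marker to learn whether the producer finished. The sketch below reproduces that handshake in a single script, with a background subshell playing the producer. The temporary directory and the messages are assumptions for illustration only.

```sh
#!/bin/sh
set -e

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
fifo=$workdir/progress

mkfifo "$fifo"

# Producer: opening the FIFO for writing blocks until a reader opens it,
# so nothing is written before the consumer is ready.
(
    exec >"$fifo" 2>&1
    echo "Preparing container image..."
    echo "Booting VM..."
    touch "$workdir/success"
) &

# Consumer: relay everything until the producer closes its end (EOF),
# then check the marker file to learn whether the producer succeeded.
relayed=$(cat "$fifo")
wait
echo "$relayed"

if [ -e "$workdir/success" ]; then
    echo "producer reported success"
fi
```

The marker file matters because end-of-file on the pipe only says the producer stopped writing, not that it completed its work.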
18 changes: 18 additions & 0 deletions scripts/entrypoint.sh → embed/entrypoint.sh
@@ -5,6 +5,8 @@ trap 'exit 143' SIGTERM

set -o errexit -o pipefail -o nounset

is_bootc_container=$1

# clean up locks that may have been left around from the container being killed
rm -fr /var/lock

@@ -53,6 +55,22 @@ virsh --connect "qemu+unix:///session?socket=$socket" "\$@"
EOF
chmod +x /crun-vm/virsh

# wait until VM image is generated from bootable container (if applicable)

if (( is_bootc_container == 1 )) && [[ ! -e /crun-vm/image/image ]]; then

    fifo=/crun-vm/bootc/progress
    while [[ ! -e "$fifo" ]]; do sleep 0.2; done
    cat "$fifo"
    rm "$fifo"

    [[ -e /crun-vm/bootc/success ]]

    mkdir -p /crun-vm/image
    mv /crun-vm/bootc/image.qcow2 /crun-vm/image/image

fi

# launch VM

function __bg_ensure_tty() {
File renamed without changes.
File renamed without changes.
1 change: 1 addition & 0 deletions plans/tests.fmf
@@ -11,6 +11,7 @@ prepare:
- cargo
- coreutils
- crun
- crun-krun
- docker
- genisoimage
- grep