Casual RFC: Dynamic host volume plugins #24862

gulducat opened this issue Jan 14, 2025 · 2 comments

@gulducat (Member)

This is a copy of an internal RFC, seeking community feedback!

This is how the feature is currently implemented (slated for release in Nomad 1.10). Large-scale re-imaginings of the whole system are unlikely to be entertained, but we do want to be responsive to concerns or requests to improve ease of use.

Thank you for your attention!

Background

We are building a feature to allow operators to dynamically create Nomad host volumes. Host volumes today require modifying Nomad agent configuration on a client node and restarting the agent. Dynamic host volumes (DHV) allow users to create volumes via Nomad API.

We will not describe the entire feature here. There was lots of discussion in #15489 and in our own internal RFC process. This document describes only the plugin specification and considerations for plugin authors.

When a user sends a request (via Nomad CLI or API) to create (or delete) a volume, the server forwards the request to an appropriate client agent. The client invokes the plugin declared in the volume specification file. A plugin is an executable file that adheres to the specification described by this document.

Of utmost importance for us is ease of adoption, since we expect this feature to be used in bespoke, artisanal environments. Host volume plugins should be easy for non-developer system administrators to write and maintain.

Within the overall orchestration ecosystem, consider two contrasting approaches to plugin architecture: CSI and CNI.

  • CSI (Container Storage Interface) plugins are long-running gRPC servers, distributed as OCI container images and run as Nomad jobs. While the architecture makes some sense for CSI’s purposes, development and even operation of these plugins is non-trivial, and the CSI specification is complicated.
  • CNI (Container Network Interface) plugins are executable files, and the CNI specification is relatively simple. A plugin receives its inputs on stdin and in environment variables, and writes its results to stdout.

Within HashiCorp’s ecosystem, plugins are generally built with go-plugin (including Nomad task drivers, Terraform providers, Vault plugins, etc). It has many advantages, but gRPC is certainly more difficult to work with than a bash/powershell/python/etc script, the kind just about every systems administrator in the world has written extensively.

Proposal

External plugins manage the lifecycle of dynamic host volumes.

Inspired by the relative simplicity of CNI, our DHV plugin interface requires only an executable file that adheres to the specification described here. Plugins may be written in any language, even a simple shell script.

A plugin will be registered with Nomad if:

  • It is an executable file (a script or binary)
  • It is located in a (configurable) host_volume_plugins directory specific to this purpose
  • It responds appropriately to a fingerprint call (described below)

Each volume specification includes a name that must be unique per Nomad client node. The name is used in job scheduling to place allocations, as with host volumes today. Additionally, Nomad assigns a volume ID that is the true unique identifier for the volume -- volume ID is unique across the whole cluster.

Operations

A plugin must implement these operations: fingerprint, create, and delete. For all operations, exit code 0 indicates success.

We pass the absolutely-required parameters as CLI arguments. They are also passed as environment variables for authors who prefer referring to them by name (and e.g. so that a script may set -u to error by name if they are not set). The environment variables are prefixed with "DHV_" (i.e. Dynamic Host Volume). No parameters are passed on stdin (in contrast to CNI): while parsing input JSON is not especially complicated, our data types are flat and simple, so they are well suited to environment variables.
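As a quick illustration (a sketch, not part of the spec), a shell plugin can lean on set -u so that any missing variable is a hard error, and read the same values either positionally or by name:

#!/usr/bin/env bash
set -eu   # exit on any error; treat unset variables (e.g. a missing DHV_* value) as errors

# The same inputs arrive both positionally and by name:
#   $1 == $DHV_OPERATION   (fingerprint | create | delete)
#   $2 == $DHV_HOST_PATH   (create and delete only)
1>&2 echo "running operation '$DHV_OPERATION'"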

fingerprint

Called when a Nomad client agent starts (or is reloaded) to discover valid plugins. The returned "version" is used to register the plugin on the Nomad node for volume scheduling.

CLI arguments: $1=fingerprint

Environment variables:

DHV_OPERATION=fingerprint

Expected response on stdout:
{"version": "0.0.1"}

Requirements:

  • Must complete within 5 seconds, or it will be killed
    • should be much faster, as no actual work should be done
  • "version" output must be valid per the hashicorp/go-version golang package

create

Called when a volume is created with nomad volume create. Also called when the client agent is started (as with an agent restart or host reboot).

CLI Arguments: $1=create $2=/path/to/expected/volume/destination

Environment variables:

DHV_OPERATION=create
DHV_HOST_PATH=/path/to/expected/volume/destination
DHV_NODE_ID={Nomad node ID}
DHV_VOLUME_NAME={name from the volume specification}
DHV_VOLUME_ID={Nomad volume ID}
DHV_CAPACITY_MIN_BYTES={capacity_min from the volume spec}
DHV_CAPACITY_MAX_BYTES={capacity_max from the volume spec}
DHV_PARAMETERS={json of parameters from the volume spec, or: null}
DHV_PLUGIN_DIR={path to directory containing plugins}

Expected response on stdout:
{"path": $HOST_PATH, "bytes": 50000000}

Requirements:

  • Must complete within 60 seconds, or it will be killed
  • Must be idempotent - repeat runs produce the same result
  • It will be run on initial create, and on nomad client agent (re)start
  • Must be safe to run concurrently, per volume name per host
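A create implementation can be smoke-tested the same way, outside of Nomad, by supplying the inputs by hand (all values below are made up for illustration; in a real call they come from the Nomad client):

$ DHV_OPERATION=create \
  DHV_HOST_PATH=/tmp/dhv-test/c3fc4fc5-dec2-4a78-b26d-01a53092c22f \
  DHV_NODE_ID=example-node-id \
  DHV_VOLUME_NAME=my-cool-volume \
  DHV_VOLUME_ID=c3fc4fc5-dec2-4a78-b26d-01a53092c22f \
  DHV_CAPACITY_MIN_BYTES=0 \
  DHV_CAPACITY_MAX_BYTES=0 \
  DHV_PARAMETERS=null \
  DHV_PLUGIN_DIR=/opt/nomad/data/host_volume_plugins \
  ./custom-mkdir create /tmp/dhv-test/c3fc4fc5-dec2-4a78-b26d-01a53092c22f

Any stderr log lines and the JSON response both print to the terminal; Nomad only parses what the plugin writes to stdout.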

delete

Called when the volume is deleted with nomad volume delete. Also run when an initial volume create operation fails, since it may have been partially completed.

CLI Arguments: $1=delete $2=/path/to/expected/volume/destination

Environment variables:

DHV_OPERATION=delete
DHV_HOST_PATH=/path/to/expected/volume/destination
DHV_NODE_ID={Nomad node ID}
DHV_VOLUME_NAME={name from the volume specification}
DHV_VOLUME_ID={Nomad volume ID}
DHV_PARAMETERS={json of parameters from the volume spec, or: null}
DHV_PLUGIN_DIR={path to directory containing plugins}

Expected response: none; stdout is discarded.

Requirements:

  • Must complete within 60 seconds, or it will be killed
  • Must be idempotent - repeat runs produce the same result
  • May be run after a partial (failed) create operation
  • Must be safe to run concurrently, per volume name per host
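For plugins that do more than manage a directory (unmount a filesystem, release a LUN, etc.), a common way to satisfy the idempotency requirement is to treat "already gone" as success. A standalone sketch of a delete in that style (the mountpoint/umount handling is illustrative, not required by the spec):

#!/usr/bin/env bash
set -eu
# Treat "already gone" as success so repeat runs of delete are idempotent.
[ -e "$DHV_HOST_PATH" ] || exit 0
# Only unmount if something is actually mounted at the path.
if mountpoint -q "$DHV_HOST_PATH"; then
  umount "$DHV_HOST_PATH"
fi
rm -rf "$DHV_HOST_PATH"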

General considerations

Plugin authors should consider these details when writing plugins.

  • The plugin is executed as the same user as the nomad agent (likely root).
  • Since volume name is only unique per node, plugins that write into a SAN will need to take care not to delete remote/shared state by name unless they know there are no other volumes with that name. Volume ID is unique cluster-wide, but may not be unique across a group of federated clusters.
  • Plugin stdout and stderr are exposed as client agent debug logs, so the plugin should not output sensitive information.
  • No retries are attempted automatically on error. The caller of create/delete must retry manually. The plugin may retry internally with its own retry logic, provided it still completes within the deadline.
  • Plugin configuration - There is no mechanism built into Nomad beyond the above specification. As a convention, we suggest placing any necessary configuration file(s) next to the executable plugin in the plugin directory. You may use the DHV_PLUGIN_DIR environment variable to refer to that directory (see the sketch after this list).
  • Similarly, if the plugin needs to retain state across invocations (e.g. delete needs some value that was generated during create), then you may store that in the host filesystem, or some external data store of your choosing, perhaps even Nomad variables.
  • In contrast to CSI volumes, there are no mount_options. Per-volume configuration should be set in the volume parameters. Per-node configuration should be in config file(s) as described above.
  • To modify a volume that already exists, set its id in the volume specification and re-issue create. Plugins are expected to handle this appropriately, or error (exit non-0) if they cannot. E.g. if the volume size is changed and the plugin cannot modify the actual size, it should exit non-0 to reject the request.
  • Errors from create while restoring a volume during Nomad agent start will not halt the client. The error will be in client logs, and the volume will not be registered as available on the node.
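To make the configuration, parameters, and state conventions above concrete, a plugin's create handling might do something like the following. This is a sketch: the config filename, the jq dependency, the state directory, and the arbitrary_key parameter (taken from the example volume spec below) are all assumptions, not part of the spec.

# Optional per-node settings live next to the executable in the plugin directory.
config_file="$DHV_PLUGIN_DIR/custom-mkdir.conf"
[ -f "$config_file" ] && . "$config_file"

# Per-volume settings arrive as JSON in DHV_PARAMETERS (or the string "null").
arbitrary_value=$(echo "$DHV_PARAMETERS" | jq -r '.arbitrary_key // empty')

# State that delete will need later can be kept on the host filesystem, keyed by volume ID.
mkdir -p /var/lib/custom-mkdir
echo "$arbitrary_value" > "/var/lib/custom-mkdir/$DHV_VOLUME_ID"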

Example plugin

This example is a simple bash script that creates a directory. There is a plugin built into Nomad called "mkdir" that does this, but this serves as a basic example.

The plugin needs to be placed in an appropriate plugin directory, which is configurable on the client and defaults to: <nomad data dir>/host_volume_plugins
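For reference, a sketch of the relevant client agent configuration; this assumes the option is named host_volume_plugin_dir, so double-check the released Nomad 1.10 documentation for the final name and default:

client {
  enabled = true
  # directory Nomad scans for dynamic host volume plugin executables
  host_volume_plugin_dir = "/opt/nomad/data/host_volume_plugins"
}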

$ touch custom-mkdir && chmod +x custom-mkdir
#!/usr/bin/env bash
set -eu

# since we will be running `rm -rf` (frightening),
# check to make sure DHV_HOST_PATH has a uuid shape in it.
# Nomad generates a volume ID and includes it in the path.
validate_path() {
  if [[ ! "$DHV_HOST_PATH" =~ [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12} ]]; then
    1>&2 echo "expected uuid-lookin ID in the DHV_HOST_PATH; got: '$DHV_HOST_PATH'"
    return 1
  fi
}

case $DHV_OPERATION in
  fingerprint)
    echo '{"version": "0.0.1"}'
    ;;
  create)
    validate_path || exit 1
    1>&2 echo "creating directory: $DHV_HOST_PATH"
    mkdir -p "$DHV_HOST_PATH"
    # 0 bytes because simple directories are not any particular size
    printf '{"path": "%s", "bytes": 0}' "$DHV_HOST_PATH"
    ;;
  delete)
    validate_path || exit 1
    1>&2 echo "deleting directory: $DHV_HOST_PATH"
    rm -rf "$DHV_HOST_PATH"
    ;;
  *)
    echo "unknown operation: '$DHV_OPERATION'"
    exit 1
    ;;
esac

Setting "custom-mkdir" as the plugin_id in a volume specification will make use of this plugin:

name      = "my-cool-volume"
type      = "host"
plugin_id = "custom-mkdir" # the executable filename in the plugin directory

# only here to illustrate parameters; this becomes a JSON map
# in the DHV_PARAMETERS env var
parameters {
  arbitrary_key = "arbitrary value"
}

# capability is used only for job scheduling;
# it is never passed to the plugin, but is
# required for this volume spec to be valid.
capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}
$ nomad volume create my-cool-volume.hcl
==> Created host volume my-cool-volume with ID e715fdf0-f8b7-a4bd-4565-f17e15b85f09
  ✓ Host volume "e715fdf0" ready

	2025-01-08T16:52:01-05:00
	ID    	  = e715fdf0-f8b7-a4bd-4565-f17e15b85f09
	Name  	  = my-cool-volume
	Namespace = default
	Plugin ID = custom-mkdir
	Node ID   = 01287ad3-d8f2-39b0-17d9-4dda16d1b85a
	Node Pool = default
	Capacity  = 0 B
	State 	  = ready
	Host Path = /opt/nomad/data/alloc_mounts/e715fdf0-f8b7-a4bd-4565-f17e15b85f09

$ ls -alh /opt/nomad/data/alloc_mounts/e715fdf0-f8b7-a4bd-4565-f17e15b85f09
total 8.0K
drwxr-xr-x 2 root root 4.0K Jan  8 16:53 .
drwx--x--x 3 root root 4.0K Jan  8 16:53 ..

$ nomad volume delete -type=host e715fdf0-f8b7-a4bd-4565-f17e15b85f09
Successfully deleted volume "e715fdf0-f8b7-a4bd-4565-f17e15b85f09"!

$ ls -alh /opt/nomad/data/alloc_mounts/e715fdf0-f8b7-a4bd-4565-f17e15b85f09
/bin/ls: cannot access '/opt/nomad/data/alloc_mounts/e715fdf0-f8b7-a4bd-4565-f17e15b85f09': No such file or directory

Future Work

We may or may not do the following at some point in the future.

  • allow more details in fingerprint response
    • basic capabilities?
    • other metadata?
  • volume health checking
    • currently no notion of health, only initial “ready” signal from create
    • healthcheck command in plugins
  • configurable deadlines
    • some disks may take longer than 60 seconds
  • auto-retries
    • we prefer not to, since some plugins may time out due to getting totally stuck, and Nomad has no way of knowing that
  • a DHV_IN_USE env var or similar when create is called, so a plugin may decide whether to make changes while the volume is in use
  • other spec extensions?
@henrikjohansen commented Jan 20, 2025

First of all, thank you for all the work that has gone into this! A Nomad native alternative to CSI is highly appreciated :)

Two quick questions that came to mind:

  1. Will telemetry be available for dynamic host volumes? (IIRC there was nothing for manual host volumes)
  2. Will quota utilization metrics be available for ENT customers (like nomad_nomad_quota_utilization_storage*)?

Also, would it not make sense to expose additional information via the DHV_ variables, such as namespace and node_pool? It would make it possible to create some coherent structure under DHV_HOST_PATH such as DHV_NODE_POOL/DHV_NAMESPACE/DHV_VOLUME_NAME

@gulducat (Member Author)

Thanks for the feedback @henrikjohansen!

I think the first two questions are more about the overall feature, where this issue is more specifically about the plugin interface, but I'll try to address all of them.

Will telemetry be available for dynamic host volumes? (IIRC there was nothing for manual host volumes)

What kind of telemetry are you looking for? The /v1/nodes and /v1/node/{id} APIs will include HostVolumes per node as they do today, but dynamic volumes will have a new ID field populated in the response. Aside from that, a /v1/storage?type=host API (and CLI nomad volume status -type=host command) will show all volumes region-wide. Tracking disk usage is not planned, only the total provisioned size (for plugins that support it).

Will quota utilization metrics be available for ENT customers (like nomad_nomad_quota_utilization_storage*)

Doesn't look like this is in code right now, but I'll see about adding it!

Also, would it not make sense to expose additional information via the DHV_ variables, such as namespace and node_pool? It would make it possible to create some coherent structure under DHV_HOST_PATH such as DHV_NODE_POOL/DHV_NAMESPACE/DHV_VOLUME_NAME

Great suggestion! I'll look into wiring up those values: DHV_NAMESPACE and DHV_NODE_POOL.

We had been assuming that DHV_HOST_PATH is the target directory that the plugin creates (using mkfs, mount, etc), which includes the unique-per-region volume ID (e.g. DHV_HOST_PATH="/opt/nomad/data/alloc_mounts/c6d5cdff-649e-1e4b-00bb-024572903116" =~ ..../alloc_mounts/$DHV_VOLUME_ID). Based on your question, it seems like this is a flawed assumption, and we should instead make DHV_HOST_PATH a consistent client-wide value (like /opt/nomad/data/host_volumes/), and let plugins decide what structure to enforce in there (or ignore it altogether and put volumes wherever they want). This seems more flexible, and would still leave DHV_VOLUME_ID available, if desired. Does that seem right?
