Disk exhausted after upgrade 1.5.6-1.9.5 #24914

monwolf opened this issue Jan 22, 2025 · 0 comments

Hi,
After upgrading our nodes from 1.5.6 -> 1.9.5, we observed a difference in how storage resources are allocated.
We have two partitions on our hosts:

  • / for the OS
  • /var for the tasks

This is the free storage:

[screenshot: free disk space on / and /var]
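(For reference, the same layout can be checked from the shell; a minimal sketch using standard coreutils:)

```shell
# Show mount points and free space for the OS and task partitions
df -h / /var
```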

Our data_dir points to /var/nomad. Our config looks like:

region = "gine"
name = "ec2devusfarm02"
log_level = "DEBUG"
leave_on_interrupt = true
leave_on_terminate = true
data_dir = "/var/nomad/data"
bind_addr = "0.0.0.0"
disable_update_check = true
limits {
        https_handshake_timeout   = "10s"
        http_max_conns_per_client = 400
        rpc_handshake_timeout     = "10s"
        rpc_max_conns_per_client  = 400
}
advertise {
    http = "10.121.200.13:4646"
    rpc = "10.121.200.13:4647"
    serf = "10.121.200.13:4648"
}
tls {
  http = true
  rpc  = true
  cert_file = "/opt/nomad/ssl/server.pem"
  key_file = "/opt/nomad/ssl/server-key.pem"
  ca_file = "/opt/nomad/ssl/nomad-ca.pem"
  verify_server_hostname = true
  verify_https_client    = true
}
log_file = "/var/log/nomad/"
log_json = true
log_rotate_max_files = 7
consul {
    address = "127.0.0.1:8500"
    server_service_name = "nomad-server"
    client_service_name = "nomad-client"
    auto_advertise = true
    server_auto_join = true
    client_auto_join = true

    ssl = true
    ca_file = "/opt/consul/ssl/consul-ca.pem"
    cert_file = "/opt/consul/ssl/server.pem"
    key_file = "/opt/consul/ssl/server-key.pem"
    token = "xxxxx"
}
acl {
  enabled = true
}

vault {
    enabled = true
    address = "https://vault.legacy-dev.com:8200/"
    ca_file = "/opt/vault/ssl/vault-ca.pem"
    cert_file = "/opt/vault/ssl/client-vault.pem"
    key_file = "/opt/vault/ssl/client-vault-key.pem"
}

telemetry {
  publish_allocation_metrics = true
  publish_node_metrics       = true
  datadog_address = "localhost:8125"
  disable_hostname = true
  collection_interval = "10s"
}
datacenter = "farm"

client {
    enabled = true
    network_interface = "ens5"
    cni_path = "/opt/cni/bin"
    cni_config_dir = "/etc/cni/net.d/"
}

plugin "docker" {
  config {
    auth {
      config = "/etc/docker/config.json"
    }
    allow_privileged = true
    volumes {
      enabled = true
    }
  }
}

After the upgrade, we started to see "disk exhausted" errors when trying to schedule a job:

[screenshot: scheduler placement failure reporting exhausted disk]

But the node has enough free storage. If we look at nomad node status:

[screenshot: nomad node status showing allocated disk resources]

As you can see, Nomad now uses / instead of /var to calculate allocatable space, whereas 1.5.6 used /var. Yet in the unique attributes it is fingerprinting the right filesystem:

[screenshot: unique.storage fingerprint attributes pointing at /var]
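(These attributes can be pulled with the CLI; a sketch, where <node-id> is a placeholder for the client's node ID:)

```shell
# Dump all fingerprinted attributes for the client and keep the storage ones
nomad node status -verbose <node-id> | grep unique.storage
```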

How can I solve this? I didn't see anything related to it in the release notes.
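The only stop-gap I can think of would be overriding the fingerprinted values with the documented client options disk_total_mb / disk_free_mb; an untested sketch, with placeholder values standing in for /var's real capacity:

```hcl
client {
    enabled = true
    network_interface = "ens5"
    cni_path = "/opt/cni/bin"
    cni_config_dir = "/etc/cni/net.d/"

    # Untested workaround: force the scheduler to see /var's capacity
    # instead of /. Values below are placeholders in MB.
    disk_total_mb = 204800
    disk_free_mb  = 102400
}
```

But I'd rather understand why the scheduler and the fingerprinted volume disagree than hardcode the numbers.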
