Add how-to for removing a node on Exoscale on a cluster with instance pools
simu committed Feb 28, 2025
1 parent efcb874 commit ca379b0
Showing 3 changed files with 140 additions and 0 deletions.
docs/modules/ROOT/pages/how-tos/exoscale/remove_node_instancepool.adoc (new file)
@@ -0,0 +1,98 @@
= Remove a worker node (instance pool)

:cloud_provider: exoscale
:kubectl_extra_args: --as=cluster-admin
:delabel_app_nodes: yes

:node-delete-list: ${NODE_TO_REMOVE}
:instance-pool-group: worker
:delete-pvs: old_pv_names

[abstract]
--
Steps to remove a worker node from an OpenShift 4 cluster on https://www.exoscale.com[Exoscale] that uses instance pools.
--

== Starting situation

* You already have an OpenShift 4 cluster on Exoscale
* Your cluster uses instance pools for the worker and infra nodes
* You have admin-level access to the cluster
* You want to remove an existing worker node from the cluster

== High-level overview

* Drain the node.
* Remove the node from Kubernetes.
* Remove the associated VM from the instance pool (sketched below).
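
A minimal sketch of what these three steps amount to on the command line, assuming `kubectl` access with the `cluster-admin` impersonation configured in this page's attributes and `NODE_TO_REMOVE` set as described further down; the partials included in the sections below remain the authoritative procedure:

[source,bash]
----
# Hedged sketch only -- follow the partials included in the sections below.

# 1. Drain the node (evicts workloads; DaemonSet pods are skipped)
kubectl --as=cluster-admin drain "${NODE_TO_REMOVE}" \
  --ignore-daemonsets --delete-emptydir-data

# 2. Remove the node object from Kubernetes
kubectl --as=cluster-admin delete node "${NODE_TO_REMOVE}"

# 3. Remove the VM by evicting it from its Exoscale instance pool
#    (see "Remove VM" at the end of this page)
exo compute instance-pool evict <pool name> "${NODE_TO_REMOVE}" -z "${EXOSCALE_ZONE}"
----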

== Prerequisites

include::partial$exoscale/prerequisites.adoc[]

== Prepare local environment

include::partial$exoscale/setup-local-env.adoc[]

== Prepare Terraform environment

include::partial$exoscale/configure-terraform-secrets.adoc[]

include::partial$setup_terraform.adoc[]

== Drain and remove node

* Select a node to remove.
With instance pools, we can remove any node.
+
[source,bash]
----
export NODE_TO_REMOVE=<node name>
----
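+
To pick a candidate, you can first list the current worker nodes (a hedged example; `node-role.kubernetes.io/worker` is the standard OpenShift worker node label):
+
[source,bash]
----
kubectl --as=cluster-admin get nodes -l node-role.kubernetes.io/worker
----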

* If you are working on a production cluster, you need to *schedule the node drain for the next maintenance.*
* If you are working on a non-production cluster, you may *drain and remove the node immediately.*

=== Schedule node drain (production clusters)

include::partial$drain-node-scheduled.adoc[]

=== Drain and remove node immediately

include::partial$drain-node-immediately.adoc[]

== Update cluster config

. Update cluster config.
+
[source,bash]
----
pushd "inventory/classes/${TENANT_ID}/"
yq eval -i ".parameters.openshift4_terraform.terraform_variables.worker_count -= 1" \
${CLUSTER_ID}.yml
----
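+
To verify the change before committing, you can print the updated value (a quick check assuming yq v4, as used above):
+
[source,bash]
----
yq eval ".parameters.openshift4_terraform.terraform_variables.worker_count" \
  ${CLUSTER_ID}.yml
----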

. Review and commit
+
[source,bash]
----
# Have a look at the file ${CLUSTER_ID}.yml.
git commit -a -m "Remove worker node from cluster ${CLUSTER_ID}"
git push
popd
----

. Compile and push cluster catalog
+
[source,bash]
----
commodore catalog compile ${CLUSTER_ID} --push -i
----
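+
The remaining steps run Terraform from the compiled catalog.
If your shell isn't there already, change into the Terraform directory (a sketch assuming `${WORK_DIR}` points at the working directory prepared earlier):
+
[source,bash]
----
cd "${WORK_DIR}/catalog/manifests/openshift4-terraform"
----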

== Remove VM

include::partial$exoscale/delete-node-vm-instancepool.adoc[]
docs/modules/ROOT/partials/exoscale/delete-node-vm-instancepool.adoc (new file)
@@ -0,0 +1,39 @@

. Evict the VM(s) from the instance pool
+
[NOTE]
====
We're going through all {instance-pool-group} instance pools to find the pool containing the node(s) to remove.
This ensures that the step can be applied as-is on clusters on dedicated hypervisors, which may have multiple {instance-pool-group} instance pools.
====
+
[source,bash,subs="attributes+"]
----
# Find all {instance-pool-group} instance pools in the cluster
instancepool_names=$(exo compute instance-pool list -Ojson | \
  jq --arg ip_group "{instance-pool-group}" -r \
  '.[]|select(.name|contains($ip_group))|.name')
for node in $(echo -n {node-delete-list}); do
  for pool_name in ${instancepool_names}; do
    # Check whether this pool contains the node to remove
    has_node=$(exo compute instance-pool show "${pool_name}" -Ojson | \
      jq --arg node "${node}" -r '.instances|index($node)!=null')
    if [ "$has_node" == "true" ]; then
      # Evicting the instance also reduces the pool size by one
      exo compute instance-pool evict "${pool_name}" "${node}" -z "$EXOSCALE_ZONE"
      break
    fi
  done
done
----
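+
Optionally, confirm that the node(s) are no longer listed as compute instances (a hedged check; eviction may take a moment to complete):
+
[source,bash,subs="attributes+"]
----
for node in $(echo -n {node-delete-list}); do
  # grep finds nothing once the VM has been evicted and deleted
  exo compute instance list | grep "${node}" || echo "${node} removed"
done
----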

. Run Terraform to update the state with the new instance pool size
+
NOTE: There shouldn't be any changes, since `instance-pool evict` already reduces the instance pool size by one.
+
NOTE: Ensure that you're still in directory `${WORK_DIR}/catalog/manifests/openshift4-terraform` before executing this command.
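+
To double-check before applying, a dry run should report that nothing needs to change (a hedged expectation, not output captured from this change):
+
[source,bash]
----
# Expect "No changes." -- the pool size already matches the updated worker_count
terraform plan
----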
+
[source,bash]
----
terraform apply
----

endif::[]
3 changes: 3 additions & 0 deletions docs/modules/ROOT/partials/nav.adoc
@@ -51,6 +51,7 @@
*** xref:oc4:ROOT:how-tos/exoscale/install.adoc[Install]
// Node management
*** xref:oc4:ROOT:how-tos/exoscale/remove_node.adoc[]
*** xref:oc4:ROOT:how-tos/exoscale/remove_node_instancepool.adoc[]
// Storage cluster
*** xref:oc4:ROOT:how-tos/exoscale/add_storage_node.adoc[]
*** xref:oc4:ROOT:how-tos/exoscale/change_storage_node_size.adoc[]
@@ -169,6 +170,7 @@
** Exoscale
*** xref:oc4:ROOT:how-tos/exoscale/remove_node.adoc[]
*** xref:oc4:ROOT:how-tos/exoscale/remove_node_instancepool.adoc[]
** Google Cloud Platform
*** xref:oc4:ROOT:how-tos/gcp/infrastructure_machineset.adoc[Infrastructure MachineSets]
@@ -228,6 +230,7 @@
** Exoscale
// Node management
*** xref:oc4:ROOT:how-tos/exoscale/remove_node.adoc[]
*** xref:oc4:ROOT:how-tos/exoscale/remove_node_instancepool.adoc[]
// Storage cluster
*** xref:oc4:ROOT:how-tos/exoscale/add_storage_node.adoc[]
*** xref:oc4:ROOT:how-tos/exoscale/change_storage_node_size.adoc[]
