Various fixes #619

Merged 8 commits on Feb 1, 2024
7 changes: 5 additions & 2 deletions automation-generators/generic/cp4d/preprocessor.py
@@ -226,9 +226,7 @@ def preprocessor(attributes=None, fullConfig=None, moduleVariables=None):

g('project').isRequired()
g('openshift_cluster_name').expandWith('openshift[*]',remoteIdentifier='name')
openshift_cluster_name=g('openshift_cluster_name').getExpandedAttributes()['openshift_cluster_name']
g('cp4d_version').isRequired()
g('openshift_storage_name').expandWithSub('openshift', remoteIdentifier='name', remoteValue=openshift_cluster_name, listName='openshift_storage',listIdentifier='storage_name')
g('cartridges').isRequired()
g('use_case_files').isOptional().mustBeOneOf([True, False])
g('sequential_install').isOptional().mustBeOneOf([True, False])
@@ -237,6 +235,11 @@ def preprocessor(attributes=None, fullConfig=None, moduleVariables=None):
g('cp4d_entitlement').isOptional().mustBeOneOf(['cpd-enterprise', 'cpd-standard'])
g('cp4d_production_license').isOptional().mustBeOneOf([True, False])

# Expand storage if no errors yet
if len(g.getErrors()) == 0:
openshift_cluster_name=g('openshift_cluster_name').getExpandedAttributes()['openshift_cluster_name']
g('openshift_storage_name').expandWithSub('openshift', remoteIdentifier='name', remoteValue=openshift_cluster_name, listName='openshift_storage',listIdentifier='storage_name')

# Now that we have reached this point, we can check the attribute details if the previous checks passed
if len(g.getErrors()) == 0:
fc = g.getFullConfig()
@@ -151,7 +151,7 @@ def expandWith(self, matchPattern, remoteIdentifier='name'):
self.attributesDict[ self.recentCheck.get('pathToCheck') ]=listOfMatches[0]
else:
#print(listOfMatches)
self.appendError(msg="Can't expand, result of given path ("+ matchPatternCombined +") not unique, found:" + ','.join(listOfMatches))
self.appendError(msg="Can't expand attribute "+self.recentCheck.get('pathToCheck')+", resource to infer from: " + matchPatternCombined + ") not unique, found: " + ','.join(listOfMatches))
return self

# matchPattern (string): should look like 'vpc[*].name'
@@ -13,11 +13,6 @@
path: /tmp/work/offline
state: absent

- name: Create offline directory
file:
path: /tmp/work/offline
state: absent

- name: If air-gapped, copy case files from {{ status_dir }}/cp4d/offline to /tmp/work/offline
copy:
src: "{{ status_dir }}/cp4d/offline"
@@ -20,7 +20,7 @@
- (current_cp4d_cluster.image_registry_name | default("")) != ""
- not (cpd_skip_mirror | bool)

- name: Migrate to private topology if upgrading to CP4D 4.7.0 or higher
- name: Migrate to private topology on OpenShift cluster {{ current_cp4d_cluster.openshift_cluster_name }} if upgrading to CP4D 4.7.0 or higher
include_role:
name: cp4d-migrate-private-topology
vars:
@@ -30,7 +30,7 @@
- _installed_ibmcpd_version < "4.7.0"
- current_cp4d_cluster.cp4d_version >= "4.7.0"

- name: Activate license service and certificate manager for CP4D 4.7.0 and higher
- name: Activate license service and certificate manager on OpenShift cluster {{ current_cp4d_cluster.openshift_cluster_name }}
include_role:
name: cp-fs-cluster-components
vars:
@@ -8,7 +8,7 @@
fail: msg="cloud_platform {{ cloud_platform }} is not implemented, current implemented cloud platforms are {{ implemented_cloud_platform_types }} "
when: "cloud_platform not in implemented_cloud_platform_types"

- name: Retrieve or detect cloud infra
- name: Retrieve or detect cloud infrastructure type for OpenShift cluster {{ current_cp4d_cluster.openshift_cluster_name }}
include_role:
name: retrieve-cloud-infra-type
vars:
@@ -19,5 +19,5 @@
- cloud_platform == 'existing-ocp'
- _storage_type == 'pwx'

- name: Prepare cluster-wide configuration for Cloud Pak for Data
- name: Prepare Cloud Pak for Data cluster-wide configuration on cluster {{ current_cp4d_cluster.openshift_cluster_name }}
include_tasks: cp4d-prepare-openshift.yml
@@ -20,6 +20,9 @@
foundational_services_project: "{{ _p_current_cp4d_cluster.operators_project | default('cpd-operators') }}"
when: _p_current_cp4d_cluster.cp4d_version >= "4.7.0"

# Set the license service project to the correct value, depending on whether the license service is already installed in cs-control
- include_tasks: set-license-service-project.yml

- debug:
var: implemented_cloud_platform_types

@@ -0,0 +1,12 @@
---
- name: Check if the license service runs in the {{ cs_control_project }} project
shell: |
oc get deploy ibm-licensing-service-instance \
-n {{ cs_control_project }}
failed_when: False
register: _get_license_service_instance

- name: Set the license service project to {{ cs_control_project }} if it already runs there
set_fact:
license_service_project: "{{ cs_control_project }}"
when: _get_license_service_instance.rc == 0
@@ -9,7 +9,7 @@ cp4d_login_username: admin
cp4d_repo_url: cp.icr.io/cp/cpd
cp4d_repo_username: cp

license_service_project: cs-control
license_service_project: ibm-licensing
scheduling_service_project: cpd-scheduler
cert_manager_project: ibm-cert-manager
cs_control_project: cs-control
@@ -4,22 +4,26 @@
that:
- _p_cp4d_version is defined

- name: Generate apply-cluster-components script {{ status_dir }}/cp4d/{{ current_cp4d_cluster.project }}-apply-cluster-components.sh
- name: Delete offline directory
file:
path: /tmp/work/offline
state: absent

- name: If air-gapped, copy case files from {{ status_dir }}/cp4d/offline to /tmp/work/offline
copy:
src: "{{ status_dir }}/cp4d/offline"
dest: /tmp/work/
remote_src: True
when: (cpd_airgap | default(False) | bool)

- name: Generate apply-cluster-components script {{ status_dir }}/cp4d/{{ current_cp4d_cluster.openshift_cluster_name }}-{{ current_cp4d_cluster.project }}-apply-cluster-components.sh
template:
src: apply-cluster-components.j2
dest: "{{ status_dir }}/cp4d/{{ current_cp4d_cluster.project }}-apply-cluster-components.sh"
dest: "{{ status_dir }}/cp4d/{{ current_cp4d_cluster.openshift_cluster_name }}-{{ current_cp4d_cluster.project }}-apply-cluster-components.sh"
mode: u+rwx

- name: Apply cluster components if not already done in an earlier step
block:

- name: Run shell script to apply cluster components, logs are in {{ status_dir }}/log/{{ current_cp4d_cluster.project }}-apply-cluster-components.log
shell: |
{{ status_dir }}/cp4d/{{ current_cp4d_cluster.project }}-apply-cluster-components.sh

- set_fact:
_cp_fs_cluster_compontents_run: True

- name: Run shell script to apply cluster components on OpenShift cluster {{ current_cp4d_cluster.openshift_cluster_name }}, logs are in {{ status_dir }}/log/{{ current_cp4d_cluster.openshift_cluster_name }}-{{ current_cp4d_cluster.project }}-apply-cluster-components.log
shell: |
{{ status_dir }}/cp4d/{{ current_cp4d_cluster.openshift_cluster_name }}-{{ current_cp4d_cluster.project }}-apply-cluster-components.sh
when:
- not _p_preview
- not (_cp_fs_cluster_compontents_run | default(False))
- not _p_preview
@@ -9,4 +9,4 @@ apply-cluster-components \
--migrate_from_cs_ns={{ foundational_services_project }} \
{% endif -%}
--cert_manager_ns={{ cert_manager_project }} \
--licensing_ns={{ cs_control_project }} 2>&1 | tee {{ status_dir }}/log/{{ current_cp4d_cluster.project }}-apply-cluster-components.log 2>&1
--licensing_ns={{ license_service_project }} 2>&1 | tee {{ status_dir }}/log/{{ current_cp4d_cluster.openshift_cluster_name }}-{{ current_cp4d_cluster.project }}-apply-cluster-components.log 2>&1
@@ -24,6 +24,17 @@
oc set data -n kube-system cm/cloud-pak-node-fix-scripts \
--from-file={{ status_dir }}/openshift/cloud-pak-node-fix-timer.sh

- name: Check if custom script {{ config_dir }}/assets/apply-custom-node-settings.sh exists
stat:
path: "{{ config_dir }}/assets/apply-custom-node-settings.sh"
register: _custom_node_settings_script

- name: Put apply-custom-node-settings.sh script into config map
shell:
oc set data -n kube-system cm/cloud-pak-node-fix-scripts \
--from-file={{ config_dir }}/assets/apply-custom-node-settings.sh
when: _custom_node_settings_script.stat.exists

- name: Generate ServiceAccount for DaemonSet
template:
src: cloud-pak-node-fix-sa.j2
@@ -81,6 +81,12 @@ if [ -e ${NODE_FIX_DIR}/cp4d-apply-crio-config.sh ];then
fi
fi

if [ -e ${NODE_FIX_DIR}/apply-custom-node-settings.sh ];then
echo "Calling custom node update script ${NODE_FIX_DIR}/apply-custom-node-settings.sh" >> /tmp/cloud-pak-node-fix.log
source ${NODE_FIX_DIR}/apply-custom-node-settings.sh >> /tmp/cloud-pak-node-fix.log
echo "Custom script updated node settings (0=no, 1=yes): $NODE_UDPATED" >> /tmp/cloud-pak-node-fix.log
fi

if [ $NODE_UPDATED -eq 1 ];then
echo "Restarting kubelet and crio daemons" >> /tmp/cloud-pak-node-fix.log
systemctl restart kubelet
3 changes: 3 additions & 0 deletions docs/src/30-reference/configuration/cp4d-assets.md
@@ -2,6 +2,9 @@

The Cloud Pak Deployer can implement demo assets and accelerators as part of the deployment process to standardize standing up fully-featured demo environments, or to test patches or new versions of the Cloud Pak using pre-defined assets.

## Node changes for ROKS and Satellite clusters
If you put a script named `apply-custom-node-settings.sh` in the `CONFIG_DIR/assets` directory, it will be run as part of applying the node settings. This way you can override the existing node settings applied by the deployer or update the compute nodes with new settings. For more information regarding the `apply-custom-node-settings.sh` script, go to [Prepare OpenShift cluster on IBM Cloud and IBM Cloud Satellite](../process/install-cloud-pak.md#prepare-openshift-cluster-on-ibm-cloud-and-ibm-cloud-satellite).

## `cp4d_asset`
A `cp4d_asset` entry defines one or more assets to be deployed for a specific Cloud Pak for Data instance (OpenShift project). In the configuration, a directory relative to the configuration directory (`CONFIG_DIR`) is specified. For example, if the directory where the configuration is stored is `$HOME/cpd-config/sample` and you specify `assets` as the asset directory, all assets under `$HOME/cpd-config/sample/assets` are processed.
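
As an illustration, a minimal `cp4d_asset` entry could look like the sketch below. The attribute names are assumptions based on the description above and should be verified against your deployer version.

```yaml
# Hypothetical example only; attribute names are illustrative
cp4d_asset:
- name: sample-asset          # name of the asset entry
  project: cpd                # Cloud Pak for Data OpenShift project the asset applies to
  asset_location: assets      # directory relative to CONFIG_DIR that holds the asset files
```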

28 changes: 26 additions & 2 deletions docs/src/30-reference/process/install-cloud-pak.md
@@ -33,7 +33,7 @@ If an image registry has been specified for the Cloud Pak using the `image_registry_name`

## Install Cloud Pak for Data and cartridges

### Prepare OpenShift cluster for Cloud Pak for Data installation
### Prepare OpenShift cluster for Cloud Pak installation
Cloud Pak for Data requires a number of cluster-wide settings:

* Create an `ImageContentSourcePolicy` if images must be pulled from a private registry
@@ -46,7 +46,7 @@ For all OpenShift clusters, except ROKS on IBM Cloud, these settings are applied

To avoid having to reload the nodes more than once, the Machine Config Operator is paused before the settings are applied. Once all settings have been applied, the Machine Config Operator is released and the deployment process then waits until all nodes are ready with the configuration applied.
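
As a rough sketch of what this pausing typically involves (this is how MachineConfigPool updates are generally paused on OpenShift, not necessarily the deployer's exact implementation):

```yaml
# Illustrative Ansible tasks only, not the deployer's actual code
- name: Pause MachineConfigPool updates for the worker pool
  shell: |
    oc patch machineconfigpool/worker --type merge -p '{"spec":{"paused":true}}'

# ...apply the cluster-wide settings here...

- name: Resume MachineConfigPool updates so the changes are rolled out
  shell: |
    oc patch machineconfigpool/worker --type merge -p '{"spec":{"paused":false}}'
```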

#### Prepare OpenShift cluster on IBM Cloud
#### Prepare OpenShift cluster on IBM Cloud and IBM Cloud Satellite
As mentioned before, ROKS on IBM Cloud does not include the Machine Config Operator and would normally require the compute nodes to be reloaded (classic ROKS) or replaced (ROKS on VPC) to make the changes effective. While implementing this process, we experienced intermittent reliability issues where the replacement of nodes never finished or the cluster ended up in an unusable state. To avoid this, the process applies the settings in a different manner.

On every node, a cron job is created which starts every 5 minutes. It runs a script that checks if any of the cluster-wide settings must be (re-)applied, then updates the local system and restarts the `crio` and `kubelet` daemons. If no settings are to be adjusted, the daemons will not be restarted and therefore the cron job has minimal or no effect on the running applications.
Expand All @@ -57,6 +57,30 @@ Compute node changes that are made by the cron job:
**CRI-O**: `pids_limit` and `default_ulimit` changes are made to the `/etc/crio/crio.conf` file.
**Pull secret**: The registry and credentials are appended to the `/.docker/config.json` configuration.

There are scenarios, especially on IBM Cloud Satellite, where custom changes must be applied to the compute nodes. This is possible by adding an `apply-custom-node-settings.sh` script to the `assets` directory within the `CONFIG_DIR` directory. Once the Kubelet, CRI-O and other changes have been applied, this script (if present) is run to apply any additional configuration changes to the compute node.

By setting the `NODE_UPDATED` script variable to `1`, you can tell the deployer to restart the `crio` and `kubelet` daemons.

**WARNING:** Never set the `NODE_UPDATED` script variable to `0`, as this would prevent earlier changes to the pull secret, the `ImageContentSourcePolicy` and other settings from becoming effective.

**WARNING:** Do not end the script with the `exit` command; this would stop the calling script and the daemons would not be restarted.

Sample script:
```bash
#!/bin/bash

#
# This is a sample script that will cause the crio and kubelet daemons to be restarted once by checking
# file /tmp/apply-custom-node-settings-run. If the file doesn't exist, it creates it and sets NODE_UPDATED to 1.
# The deployer will observe that the node has been updated and restart the daemons.
#

if [ ! -e /tmp/apply-custom-node-settings-run ];then
touch /tmp/apply-custom-node-settings-run
NODE_UPDATED=1
fi
```

### Mirror images to the private registry
If a private image registry is specified, and if the IBM Cloud Pak entitlement key is available in the vault (`cp_entitlement_key` secret), the Cloud Pak case files for the Foundational Services, the Cloud Pak control plane and the cartridges are downloaded to a subdirectory of the specified status directory. Then all images defined for the cartridges are mirrored from the entitled registry to the private image registry. Depending on network speed and how many cartridges have been configured, the mirroring can take a very long time (12+ hours). Images which have already been mirrored to the private registry are skipped by the mirroring process.
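
For reference, the private registry used here is the `image_registry` entry referenced by the Cloud Pak's `image_registry_name` attribute. A rough sketch is shown below; the attribute names are assumptions and may differ from your deployer version.

```yaml
# Hypothetical registry definition; verify attribute names against the image_registry documentation
image_registry:
- name: cpd-private-registry            # referenced via image_registry_name in the cp4d object
  registry_host_name: registry.example.com
  registry_port: 5000
  registry_namespace: cpd
```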
