diff --git a/networking-workshop/README.md b/networking-workshop/README.md new file mode 100644 index 00000000..61eac590 --- /dev/null +++ b/networking-workshop/README.md @@ -0,0 +1,125 @@ +# Workshops + + +Open source series of workshops delivered by the Gravitational team. + +* [Docker 101 workshop](docker.md) +* [Kubernetes 101 workshop using Minikube and Mattermost](k8s101.md) +* [Kubernetes production patterns](k8sprod.md) +* [Kubernetes security patterns](k8ssecurity.md) +* [Kubernetes custom resources](crd/crd.md) +* [Gravity fire drill exercises](firedrills.md) +* [Gravity logging (Gravity 5.5 and earlier)](logging-5.x.md) +* [Gravity logging (Gravity 6.0 and later)](logging-6.x.md) +* [Gravity monitoring & alerts (Gravity 5.5 and earlier)](monitoring-5.x.md) +* [Gravity monitoring & alerts (Gravity 6.0 and later)](monitoring-6.x.md) +* [Gravity networking and network troubleshooting](gravity_networking.md) +* [Gravity upgrade (5.5)](upgrade-5.x.md) +* [Gravity upgrade (7.0)](gravity_upgrade.md) + +## Installation + +### Requirements + +You will need a Linux or macOS box with at least `7GB` of RAM and `20GB` of free disk space available. + +### Docker + +For Linux: follow instructions provided [here](https://docs.docker.com/engine/installation/linux/). + +If you have macOS (Yosemite or newer), please download Docker for Mac [here](https://download.docker.com/mac/stable/Docker.dmg). + +*Older docker package for OSes older than Yosemite -- Docker Toolbox located [here](https://www.docker.com/products/docker-toolbox).* + +### Hypervisor + +#### HyperKit [macOS only] + +HyperKit is a lightweight macOS hypervisor which minikube supports out of the box and which should be +already installed on your machine if you have Docker for Desktop installed. + +More information: . + +Alternatively, install VirtualBox like described below. + +#### KVM2 [Linux only] + +Follow the instructions here: . + +Alternatively, install VirtualBox like described below. + +#### VirtualBox [both macOS and Linux] + +Let’s install VirtualBox. + +Get latest stable version from . + +**Note:** When using Ubuntu you may need to disable Secure Boot. For an alternative approach to installing with Secure Boot enabled, +follow the guide [here](https://torstenwalter.de/virtualbox/ubuntu/2019/06/13/install-virtualbox-ubuntu-secure-boot.html). + +### Kubectl + +For macOS: + + curl -O https://storage.googleapis.com/kubernetes-release/release/v1.16.2/bin/darwin/amd64/kubectl \ + && chmod +x kubectl && sudo mv kubectl /usr/local/bin/ + +For Linux: + + curl -O https://storage.googleapis.com/kubernetes-release/release/v1.16.2/bin/linux/amd64/kubectl \ + && chmod +x kubectl && sudo mv kubectl /usr/local/bin/ + +### Minikube + +For macOS: + + curl -Lo minikube https://storage.googleapis.com/minikube/releases/v1.5.1/minikube-darwin-amd64 \ + && chmod +x minikube && sudo mv minikube /usr/local/bin/ + +For Linux: + + curl -Lo minikube https://storage.googleapis.com/minikube/releases/v1.5.1/minikube-linux-amd64 \ + && chmod +x minikube && sudo mv minikube /usr/local/bin/ + +Also, you can install drivers for various VM providers to optimize your minikube VM performance. +Instructions can be found here: . + +### Xcode and local tools + +Xcode will install essential console utilities for us. You can install it from the App Store. 
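
Before moving on, it can help to verify that the tools installed above are available on your `PATH` (a quick sanity check; the exact versions printed will differ):

```bash
# each of these should print version information without errors
$ docker version
$ kubectl version --client
$ minikube version
```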
+ +## Set up cluster using minikube + +To run cluster: + +**macOS** + +```bash +# starts minikube +$ minikube start --kubernetes-version=v1.16.2 +# this command should work +$ kubectl get nodes +# use docker from minikube +$ eval $(minikube docker-env) +# this command to check connectivity +$ docker ps +``` + +**Linux** + +```bash +# starts minikube +$ minikube start --kubernetes-version=v1.16.2 --vm-driver=kvm2 +# this command should work +$ kubectl get nodes +# use docker from minikube +$ eval $(minikube docker-env) +# this command to check connectivity +$ docker ps +``` + +## Configure registry + +```shell +kubectl create -f registry.yaml +``` diff --git a/networking-workshop/conf.d/default.conf b/networking-workshop/conf.d/default.conf new file mode 100644 index 00000000..d579a271 --- /dev/null +++ b/networking-workshop/conf.d/default.conf @@ -0,0 +1,8 @@ +server { + listen 80; + server_name localhost; + + location / { + return 200 'hello, Kubernetes!'; + } +} diff --git a/networking-workshop/docker.md b/networking-workshop/docker.md new file mode 100644 index 00000000..1010f1a9 --- /dev/null +++ b/networking-workshop/docker.md @@ -0,0 +1,820 @@ +# Docker 101 + +Docker 101 workshop - introduction to Docker and basic concepts + +## Installation + +### Hardware Requirements + +You will need an MacOS or Linux based system with at least `8GB RAM` and `10GB of free disk space` available. + +While it is possible to use Docker on Windows 10 systems, for the sake of simplicity, in this workshop will focus on POSIX compatible systems that are officially supported by Docker, like MacOS and Linux. + +### Software Requirements + +The main software required to follow this workshop is *Docker* itself. + +In order to install it on *Linux*: follow instructions provided [here](https://docs.docker.com/engine/installation/linux/). + +If you have Mac OS X (Yosemite or newer), please download Docker for Mac [here](https://download.docker.com/mac/stable/Docker.dmg). + +*Older docker package for OSes older than Yosemite -- Docker Toolbox located [here](https://www.docker.com/products/docker-toolbox).* + +### Video version + +This workshop is also available as a video on YouTube at the following link: + +[Workshop video](https://youtu.be/h7T8Sh1QrJU) + +## Introduction + +### Hello, world + +Docker is as easy as Linux! To prove that let us write classic "Hello, World" in Docker: + +```bash +$ docker run busybox echo "hello world" +``` + +Docker containers are just as simple as Linux processes, but they also provide many more features that we are going to explore. + +Let's review the structure of the command we just used: + +```bash +docker # Docker client binary used to interact with Docker +run # Docker subcommand - runs a command in a container +busybox # container image used by the run command +echo "hello world" # actual command to run (and arguments) +``` + +*Container images* carry within themselves all the needed libraries, binaries and directories in order to be able to run. + +*TIP:* Container images could be abstracted as "the blueprint for an object", while containers themselves are the actualization of the object into a real instance/entity. + +Commands running in containers normally use anything but the kernel from the host operating system. They will execute instead binaries provided within the chosen container image (`busybox` in the example above). + +### Where is my container? 
+ +Running containers can be listed using the command: +```bash +$ docker ps +``` + +Here's an example showing a possible output from the `ps` command: + +``` +$ docker ps +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +eea49c9314db library/python:3.3 "python -m http.serve" 3 seconds ago Up 2 seconds 0.0.0.0:5000->5000/tcp simple1 +``` + +The fields shown in the output can be summarized as: + +* Container ID - auto generated unique running id +* Container image - image name +* Command - Linux process running as the PID 1 in the container +* Names - user friendly name of the container + +After running the "hello world" example above though there will be no running container since the entire life cycle of the command (`echo "hello world"`) has already finished and thus the container stopped. + +Once the command running inside the container finishes its execution, the container will stop running but will still be available, even if it's not listed in `ps` output by default. + +To list all containers, including stopped ones, use: +```bash +docker ps -a +``` + +Stopped containers will remain available until cleaned. You can then removed stopped containers by using: +```bash +docker rm my_container_name_or_id +``` +The argument used for the `rm` command can be the container ID or the container name. + +If you prefer, it's possible to add the option `--rm` to the `run` subcommand so that the container will be cleaned automatically as soon as it stops its execution. + +### Adding environment variables + +Let's see what environment variables are used by default: + +``` +$ docker run --rm busybox env +PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin +HOSTNAME=0a0169cdec9a +HOME=/root +``` + +The environment variables passed to the container may be different on other systems and the hostname is randomized per container, unless specified differently. + +When needed we can extend the environment by passing variable flags as `docker run` arguments: + +```bash +$ docker run --rm -e HELLO=world busybox env +PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin +HOSTNAME=8ee8ba3443b6 +HELLO=world +HOME=/root +``` + +### Sneak peek into the container environment + +Let's now take a look at process tree running in the container: + +```bash +$ docker run --rm busybox ps uax +``` + +My terminal prints out something similar to: + +```bash +PID USER TIME COMMAND + 1 root 0:00 ps uax +``` + +*Oh my!* Am I running this command as root? Technically yes, although remember as we anticipated this is not the actual root of your host system but a very limited one running inside the container. We will get back to the topic of users and security a bit later. + +In fact, as you can see, the process runs in a very limited and isolated environment where it cannot see or access all the other processes running on your machine. + +### Adding host mounts + +The filesystem used inside running containers is also isolated and separated from the one in the host: + +```bash +$ docker run --rm busybox ls -l /home +total 0 +``` + +What if we want to expose one or more directories inside a container? 
To do so the option `-v/--volume` must be used as shown in the following example: + +``` +$ docker run --rm -v $(pwd):/home busybox ls -l /home +total 72 +-rw-rw-r-- 1 1000 1000 11315 Nov 23 19:42 LICENSE +-rw-rw-r-- 1 1000 1000 30605 Mar 22 23:19 README.md +drwxrwxr-x 2 1000 1000 4096 Nov 23 19:30 conf.d +-rw-rw-r-- 1 1000 1000 2922 Mar 23 03:44 docker.md +drwxrwxr-x 2 1000 1000 4096 Nov 23 19:35 img +drwxrwxr-x 4 1000 1000 4096 Nov 23 19:30 mattermost +-rw-rw-r-- 1 1000 1000 585 Nov 23 19:30 my-nginx-configmap.yaml +-rw-rw-r-- 1 1000 1000 401 Nov 23 19:30 my-nginx-new.yaml +-rw-rw-r-- 1 1000 1000 399 Nov 23 19:30 my-nginx-typo.yaml +``` + +In the example command the current directory, specified via `$(pwd)`, was "mounted" from the host system in the container so that it appeared to be "/home" inside the container! + +In this configuration all changes done in the specified directory will be immediately seen in the container's `/home` directory. + +### Network + +Networking in Docker containers is also isolated. Let's look at the interfaces inside a running container: + +```bash +$ docker run --rm busybox ifconfig +eth0 Link encap:Ethernet HWaddr 02:42:AC:11:00:02 + inet addr:172.17.0.2 Bcast:0.0.0.0 Mask:255.255.0.0 + inet6 addr: fe80::42:acff:fe11:2/64 Scope:Link + UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 + RX packets:1 errors:0 dropped:0 overruns:0 frame:0 + TX packets:1 errors:0 dropped:0 overruns:0 carrier:0 + collisions:0 txqueuelen:0 + RX bytes:90 (90.0 B) TX bytes:90 (90.0 B) + +lo Link encap:Local Loopback + inet addr:127.0.0.1 Mask:255.0.0.0 + inet6 addr: ::1/128 Scope:Host + UP LOOPBACK RUNNING MTU:65536 Metric:1 + RX packets:0 errors:0 dropped:0 overruns:0 frame:0 + TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 + collisions:0 txqueuelen:1 + RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) +``` + +#### Networking example + +In case you're not familiar with Python, one of the built-in modules offer simple HTTP server features and by default it will serve the current directory via HTTP on the port specified as the command argument (5000) in our case. + +The following command should work on any Linux or MacOS system that has Python installed, and will offer your current directory content via HTTP on port 5000: +```bash +$ python -m http.server 5000 +``` + +We'll now translate that command in a Docker container, so that you won't need Python installed on your system (cause it will be provided inside the container). +To forward port 5000 from the host system to port 5000 inside the container the `-p` flag should be added to the `run` command: + +```bash +$ docker run --rm -p 5000:5000 library/python:3 python -m http.server 5000 +``` + +This command remains alive and attached to the current session because the server will keep listening for requests. +Try reaching it from a different terminal via the following command: + +```bash +$ curl http://localhost:5000 + + + +.... +``` + +Press `Ctrl-C` in the terminal running the container to stop it. + +## A bit of background + +![docker-settings](img/containers.png) + +The basic idea behind containers is a set of Linux resources that run isolated from the rest of the host OS. + +[chart](https://www.lucidchart.com/documents/edit/d5226f07-00b1-4a7a-ba22-59e0c2ec0b77/0) + +Multiple Linux subsystems help to create the container foundations: + +### Namespaces + +Namespaces create isolated stacks of Linux primitives for a running process. 
+ +* NET namespace creates a separate networking stack for the container, with its own routing tables and devices. +* PID namespace is used to assign isolated process IDs that are separate from host OS. This is important to avoid any information exposure from the host about processes. +* MNT namespace creates a scoped view of a filesystem using [VFS](http://www.tldp.org/LDP/khg/HyperNews/get/fs/vfstour.html). It allows a container to get its own "root" filesystem and map directories from one location on the host to the other location inside container. +* UTS namespace lets container to get to its own hostname. +* IPC namespace is used to isolate inter-process communication (e.g. IPC, pipes, message queues and so on). +* USER namespace allows container processes have different users and IDs from the host OS. + +### Control groups + +Control Groups (also called `cgroups`) are kernel feature that limits, accounts for, and isolates resources usage (CPU, memory, disk I/O, network, etc.) + +This feature is particularly useful to predict and plan for enough resources to accommodate the desired number of containers on your systems. + +### Capabilities + +Capabilities provide enhanced permission checks on the running process, and can limit the interface configuration, even for a root user. For example, if `CAP_NET_ADMIN` is disabled, users inside a container (including root) won't be able to manage network interfaces (add, delete, change), change network routes and so on. + +You can find a lot of additional low level detail [here](https://web.archive.org/web/20200221172516/http://crosbymichael.com:80/creating-containers-part-1.html) or see `man capabilities` for more info about this topic. + +## More container operations + +### Daemons + +Our last python server example was inconvenient as it worked in foreground so it was bound to our shell. If we closed our shell the container would also die with it. In order to fix this problem let's change our command to: + +```bash +$ docker run --rm -d -p 5000:5000 --name=simple1 library/python:3 python -m http.server 5000 +``` + +Flag `-d` instructs Docker to start the process in background. Let's see if our HTTP connection still works after we close our session: + +```bash +curl http://localhost:5000 + + + +... +``` + +It's still working and now we can see it running with the `ps` command: +```bash +docker ps +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +eea49c9314db library/python:3 "python -m http.serve" 3 seconds ago Up 2 seconds 0.0.0.0:5000->5000/tcp simple1 +``` + +### Inspecting a running container + +If we want more information about a running container we can check its logs output using the `logs` command: + +```bash +$ docker logs simple1 +``` + +Docker also offers the useful command `inspect` which retrieves all the info related to a specific object (network, container, image, ecc): + +```bash +docker inspect kind_bell +[ + { + "Id": "1da9cdd92fc3f69cf7cd03b2fa898c06fdcfb8f9913479d6fa15688a4984c877", + "Created": "2019-06-01T19:04:49.344803709Z", + "Path": "echo", + "Args": [ + "hello world" + ], + "State": { + "Status": "exited", +... +``` + +### Attaching to a running container** + +While a container is still running, we can enter its namespaces using the `exec` command: + +```bash +$ docker exec -ti simple1 /bin/sh +``` + +The command above will open an `sh` interactive shell that we can use to look around and play with, inside the container. + +One little note about the additional options specified in the `exec` command. 
+ +* `-t` flag attaches terminal for interactive typing +* `-i` flag attaches input/output from the terminal to the process + +Now that we have opened a new shell inside the container, let's find what process is running as PID 1: + +This workflow is similar to using `SSH` to connect in the container, however there is no remote network connection involved. +The process `/bin/sh` shell session is started running in the container namespaces instead of the host OS ones. + +```bash +$ ps uax +USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND +root 1 0.5 0.0 74456 17512 ? Ss 18:07 0:00 python -m http.server 5000 +root 7 0.0 0.0 4336 748 ? Ss 18:08 0:00 /bin/sh +root 13 0.0 0.0 19188 2284 ? R+ 18:08 0:00 ps uax +``` + +### Attaching to containers input + +To best illustrate the impact of `-i` or `--interactive` in the expanded version, consider this example: + +```bash +$ echo "hello there" | docker run --rm busybox grep hello +``` + +The example above won't work as the container's input is not attached to the host stdout. The `-i` flag fixes just that: + +```bash +$ echo "hello there" | docker run --rm -i busybox grep hello +hello there +``` + +### Starting and stopping containers + +It is possible to stop and start long-living containers using `stop` and `start` commands: + +```bash +$ docker stop simple1 +$ docker start simple1 +``` + +**NOTE:** container names should be unique. Otherwise, you will get an error when you try to create a new container with a conflicting name! + +## Building Container images + +So far we have been using container images downloaded from Docker's public registry. + +One of the key success factors for Docker among competitors was the possibility to easily create, customize, share and improve container images cooperatively. + +Let's see how it works. + +### Starting from scratch + +`Dockerfile` is a special file that instructs `docker build` command how to build an image: + +```bash +$ cd docker/scratch +$ cat hello.sh +$ docker build -t hello . +Sending build context to Docker daemon 3.072 kB +Step 1 : FROM scratch + ---> +Step 2 : ADD hello.sh /hello.sh + ---> 4dce466cf3de +Removing intermediate container dc8a5b93d5a8 +Successfully built 4dce466cf3de +``` + +The Dockerfile used is very simple: + +```dockerfile +FROM scratch +ADD hello.sh /hello.sh +``` + +* `FROM scratch` instructs the Docker build process to use an empty image as the basis to build our custom container image +* `ADD hello.sh /hello.sh` adds the file `hello.sh` to the container's root path `/hello.sh`. + +### Viewing images + +`docker images` command is used to display images that we have built: + +```bash +docker images +REPOSITORY TAG IMAGE ID CREATED SIZE +hello latest 4dce466cf3de 10 minutes ago 34 B +``` + +Here's a quick explanation of the columns shown in that output: + +* Repository - a name associated to this image locally (on your computer) or on a remote repository. Our current repository is local and the image is called `hello` +* Tag - indicates the version of our image, Docker sets `latest` tag automatically if none is specified +* Image ID - unique image ID +* Size - the size of our image is just 34 bytes + +**NOTE:** Docker images are quite different from virtual machine image formats. Since Docker does not boot any operating system, but simply runs Linux processes in isolation, we don't need any kernel or drivers to ship with the image, so it could be as tiny as just a few bytes! 
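
If you are curious how the image is put together, `docker history` lists the layers it was built from (for our scratch-based image there should be just the single `ADD` layer; IDs and sizes will differ on your machine):

```bash
$ docker history hello
```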
+ +### Running the image + +Trying to run our newly built image will result in an error similar to one of the following, depending on the Docker version: + +```bash +$ docker run --rm hello /hello.sh +write pipe: bad file descriptor +``` + +or + +```bash +standard_init_linux.go:211: exec user process caused "no such file or directory" +``` + +This is because our container is empty. There is no shell and the script won't be able to start! +Let's fix that by changing our base image to `busybox` that contains a proper shell environment: + +```bash +$ cd docker/busybox +$ docker build -t hello . +Sending build context to Docker daemon 3.072 kB +Step 1 : FROM busybox + ---> 00f017a8c2a6 +Step 2 : ADD hello.sh /hello.sh + ---> c8c3f1ea6ede +Removing intermediate container fa59f3921ff8 +Successfully built c8c3f1ea6ede +``` + +Listing the image shows that image ID and size have changed: + +```bash +$ docker images +REPOSITORY TAG IMAGE ID CREATED SIZE +hello latest c8c3f1ea6ede 10 minutes ago 1.11 MB +``` + +We can run our script now: + +```bash +$ docker run --rm hello /hello.sh +hello, world! +``` + +### Versioning + +Let us roll a new version of our script `v2` + +```bash +$ cd docker/busybox-v2 +$ cat Dockerfile +FROM busybox +ADD hello.sh /hello.sh +$ docker build -t hello:v2 . +``` + +We will now see 2 images `hello:v2` and `hello:latest`: + +```bash +$ docker images +hello v2 195aa31a5e4d 2 seconds ago 1.11 MB +hello latest 47060b048841 20 minutes ago 1.11 MB +``` + +**NOTE:** Tag `latest` will not automatically point to the latest version, so you have to manually update it + +Execute the script using `image:tag` notation: + +```bash +$ docker run --rm hello:v2 /hello.sh +hello, world v2! +``` + +### Entry point + +We can improve our image by supplying `entrypoint`, which sets the default command executed if none is specified when starting the container: + +```bash +$ cd docker/busybox-entrypoint +$ cat Dockerfile +FROM busybox +ADD hello.sh /hello.sh +ENTRYPOINT ["/hello.sh"] +$ docker build -t hello:v3 . +``` + +We should now be able to run the new image version without supplying additional arguments: + +```bash +$ docker run --rm hello:v3 +hello, world ! +``` + +What happens if you pass an additional argument as in previous examples? They will be passed to the `ENTRYPOINT` command as arguments: + +```bash +$ docker run --rm hello:v3 woo +hello, world woo! +``` + +Arguments are then appended to the output because our v3 `hello.sh` is set to do so via the use of the `$@` magic variable: + +```bash +#!/bin/sh + +echo "hello, world $@!" +``` + +### Environment variables + +We can pass environment variables during build and during runtime as well. + +Here's our modified `hello.sh` shellscript: + +```bash +$ cd docker/busybox-env +$ cat hello.sh +#!/bin/sh + +echo "hello, $BUILD1 and $RUN1!" +``` + +Dockerfile now uses `ENV` directive to provide environment variable: + +```Dockerfile +FROM busybox +ADD hello.sh /hello.sh +ENV BUILD1 Bob +ENTRYPOINT ["/hello.sh"] +``` + +Let's build and run: + +```bash +cd docker/busybox-env +$ docker build -t hello:v4 . +$ docker run --rm -e RUN1=Alice hello:v4 +hello, Bob and Alice! +``` + +Though it's important to know that **variables specified at runtime takes precedence over those specified at build time**: +```bash +$ docker run --rm -e BUILD1=Jon -e RUN1=Alice hello:v4 +hello, Jon and Alice! +``` + +### Build arguments + +Sometimes it is helpful to supply arguments during build process +(for example, user ID to be created inside the container). 
+We can supply build arguments as flags to `docker build` as we already did to the `run` command: + +```bash +$ cd docker/busybox-arg +$ docker build --build-arg=ARG1="Alice and Bob" -t hello:v5 . +$ docker run hello:v5 +hello, Alice and Bob! +``` + +Here is our updated Dockerfile: + +```Dockerfile +FROM busybox +ADD hello.sh /hello.sh +ARG BUILD1 +ENV BUILD1 $BUILD1 +ENTRYPOINT ["/hello.sh"] +``` + +Notice how `ARG` have supplied the build argument and we have referred to it right away in the Dockerfile itself, and also exposing it as environment variable afterward. + +### Build layers and caching + +Let's take a look at the new build image in the `docker/cache` directory: + +```bash +$ ls -l docker/cache/ +total 12 +-rw-rw-r-- 1 sasha sasha 76 Mar 24 16:23 Dockerfile +-rw-rw-r-- 1 sasha sasha 6 Mar 24 16:23 file +-rwxrwxr-x 1 sasha sasha 40 Mar 24 16:23 script.sh +``` + +We have a file and a script that uses the file: + +```bash +$ cd docker/cache +$ docker build -t hello:v6 . + +Sending build context to Docker daemon 4.096 kB +Step 1 : FROM busybox + ---> 00f017a8c2a6 +Step 2 : ADD file /file + ---> Using cache + ---> 6f48df47cb1d +Step 3 : ADD script.sh /script.sh + ---> b052fd11bcc6 +Removing intermediate container c555e8ab29dc +Step 4 : ENTRYPOINT /script.sh + ---> Running in 50f057fd89cb + ---> db7c6f36cba1 +Removing intermediate container 50f057fd89cb +Successfully built db7c6f36cba1 + +$ docker run --rm hello:v6 +hello, hello! +``` + +Let's update the script.sh + +```bash +cp script2.sh script.sh +``` + +They are only different by one letter, but this makes a difference: + + +```bash +$ docker build -t hello:v7 . +$ docker run --rm hello:v7 +Hello, hello! +``` + +Notice `Using cache` diagnostic output from the container: + +``` +$ docker build -t hello:v7 . +Sending build context to Docker daemon 5.12 kB +Step 1 : FROM busybox + ---> 00f017a8c2a6 +Step 2 : ADD file /file + ---> Using cache + ---> 6f48df47cb1d +Step 3 : ADD script.sh /script.sh + ---> b187172076e2 +Removing intermediate container 7afa2631d677 +Step 4 : ENTRYPOINT /script.sh + ---> Running in 51217447e66c + ---> d0ec3cfed6f7 +Removing intermediate container 51217447e66c +Successfully built d0ec3cfed6f7 +``` + +Docker executes every command in a special container. It detects the fact that the content has (or has not) changed, and instead of re-executing the command, uses cached value instead. This helps to speed up builds, but sometimes introduces problems. + +**NOTE:** You can always turn caching off by using the `--no-cache=true` option for the `docker build` command. + +Docker images are composed of layers: + +![images](https://docs.docker.com/storage/storagedriver/images/container-layers.jpg) + +Every layer is a the result of the execution of a command in the Dockerfile. + +### RUN command + +The most frequently used command is `RUN` as it executes the command in a container, captures the output and records it as an image layer. + +Let's us use existing package managers to compose our images: + +```Dockerfile +FROM ubuntu:18.04 +RUN apt-get update +RUN apt-get install -y curl +ENTRYPOINT curl +``` + +Since this example is based on the `ubuntu` Docker image, the output of this build will look more like a standard Linux install: + +```bash +$ cd docker/ubuntu +$ docker build -t myubuntu . 
```

We can use our newly created ubuntu to curl pages:

```bash
$ # don't use `--rm` this time
$ docker run myubuntu https://google.com
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   220  100   220    0     0   1377      0 --:--:-- --:--:-- --:--:--  1383
<HTML><HEAD><TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
+The document has moved +here. + +``` + +However, it all comes at a price: + +```bash +$ docker images +REPOSITORY TAG IMAGE ID CREATED SIZE +myubuntu latest 50928f386c70 53 seconds ago 106 MB +``` + +That is 106MB for curl! As we know, there is no mandatory requirement to have images with all the OS inside. +If base on your use-case you still need it though, Docker will save you some space by re-using the base layer, so images with slightly different bases would not repeat each other. + +### Operations with images + +You are already familiar with one command, `docker images`. You can also remove images, tag and untag them. + +#### Removing images and containers + +Let's start with removing the image that takes too much disk space: + +```bash +$ docker rmi myubuntu +Error response from daemon: conflict: unable to remove repository reference "myubuntu" (must force) - container 292d1e8d5103 is using its referenced image 50928f386c70 +``` + +Docker complains that there are containers using this image. How is this possible? As mentioned previously docker keeps track of all containers, even those that have stopped and won't allow deleting images used by existing containers, running or not: + +```bash +$ docker ps -a +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +292d1e8d5103 myubuntu "curl https://google." 5 minutes ago Exited (0) 5 minutes ago cranky_lalande +f79c361a24f9 440a0da6d69e "/bin/sh -c curl" 5 minutes ago Exited (2) 5 minutes ago nauseous_sinoussi +01825fd28a50 440a0da6d69e "/bin/sh -c curl --he" 6 minutes ago Exited (2) 5 minutes ago high_davinci +95ffb2131c89 440a0da6d69e "/bin/sh -c curl http" 6 minutes ago Exited (2) 6 minutes ago lonely_sinoussi +``` + +We can now delete the container: + +```bash +$ docker rm 292d1e8d5103 +292d1e8d5103 +``` + +and the image: + +```bash +$ docker rmi myubuntu +Untagged: myubuntu:latest +Deleted: sha256:50928f386c704610fb16d3ca971904f3150f3702db962a4770958b8bedd9759b +``` + +### Tagging images + +`docker tag` helps us to tag images. + +We have quite a lot of versions of `hello` built, but latest still points to the old `v1`. + +```bash +$ docker images | grep hello +hello v7 d0ec3cfed6f7 33 minutes ago 1.11 MB +hello v6 db7c6f36cba1 42 minutes ago 1.11 MB +hello v5 1fbecb029c8e About an hour ago 1.11 MB +hello v4 ddb5bc88ebf9 About an hour ago 1.11 MB +hello v3 eb07be15b16a About an hour ago 1.11 MB +hello v2 195aa31a5e4d 3 hours ago 1.11 MB +hello latest 47060b048841 3 hours ago 1.11 MB +``` + +Let's change that by re-tagging `latest` to `v7`: + +```bash +$ docker tag hello:v7 hello:latest +$ docker images | grep hello +hello latest d0ec3cfed6f7 38 minutes ago 1.11 MB +hello v7 d0ec3cfed6f7 38 minutes ago 1.11 MB +hello v6 db7c6f36cba1 47 minutes ago 1.11 MB +hello v5 1fbecb029c8e About an hour ago 1.11 MB +hello v4 ddb5bc88ebf9 About an hour ago 1.11 MB +hello v3 eb07be15b16a About an hour ago 1.11 MB +hello v2 195aa31a5e4d 3 hours ago 1.11 MB +``` + +Both `v7` and `latest` point to the same image ID `d0ec3cfed6f7`. + +### Publishing images + +Images are distributed with a special service - `docker registry`. +Let us spin up a local registry: + +```bash +$ docker run --rm -p 5000:5000 --name registry -d registry:2 +``` + +`docker push` is used to publish images to registries. + +To instruct where we want to publish, we need to prepend registry address to image name: + +```bash +$ docker tag hello:v7 127.0.0.1:5000/hello:v7 +$ docker push 127.0.0.1:5000/hello:v7 +``` + +`docker push` pushed the image to our "remote" registry. 
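
As a quick check (assuming the local registry container started above is still running on port 5000), the registry's HTTP API can list what it now holds; the responses should look roughly like this:

```bash
$ curl http://127.0.0.1:5000/v2/_catalog
{"repositories":["hello"]}
$ curl http://127.0.0.1:5000/v2/hello/tags/list
{"name":"hello","tags":["v7"]}
```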
+ +We can now download the image using the `docker pull` command: + +```bash +$ docker pull 127.0.0.1:5000/hello:v7 +v7: Pulling from hello +Digest: sha256:c472a7ec8ab2b0db8d0839043b24dbda75ca6fa8816cfb6a58e7aaf3714a1423 +Status: Image is up to date for 127.0.0.1:5000/hello:v7 +``` + +### Wrapping up + +We have learned how to start, build and publish containers and learned the containers building blocks. +However, there is much more to learn. Just check out the [official docker documentation!](https://docs.docker.com/engine/userguide/). + +Thanks to the Docker team for such an amazing product! diff --git a/networking-workshop/docker/busybox-arg/Dockerfile b/networking-workshop/docker/busybox-arg/Dockerfile new file mode 100644 index 00000000..41d61094 --- /dev/null +++ b/networking-workshop/docker/busybox-arg/Dockerfile @@ -0,0 +1,5 @@ +FROM busybox +ADD hello.sh /hello.sh +ARG ARG1 +ENV BUILD1 $ARG1 +ENTRYPOINT ["/hello.sh"] diff --git a/networking-workshop/docker/busybox-arg/hello.sh b/networking-workshop/docker/busybox-arg/hello.sh new file mode 100755 index 00000000..8e4b9e88 --- /dev/null +++ b/networking-workshop/docker/busybox-arg/hello.sh @@ -0,0 +1,4 @@ +#!/bin/sh + +echo "hello, $BUILD1!" + diff --git a/networking-workshop/docker/busybox-entrypoint/Dockerfile b/networking-workshop/docker/busybox-entrypoint/Dockerfile new file mode 100644 index 00000000..cf430076 --- /dev/null +++ b/networking-workshop/docker/busybox-entrypoint/Dockerfile @@ -0,0 +1,3 @@ +FROM busybox +ADD hello.sh /hello.sh +ENTRYPOINT ["/hello.sh"] \ No newline at end of file diff --git a/networking-workshop/docker/busybox-entrypoint/hello.sh b/networking-workshop/docker/busybox-entrypoint/hello.sh new file mode 100755 index 00000000..544f7926 --- /dev/null +++ b/networking-workshop/docker/busybox-entrypoint/hello.sh @@ -0,0 +1,3 @@ +#!/bin/sh + +echo "hello, world $@!" diff --git a/networking-workshop/docker/busybox-env/Dockerfile b/networking-workshop/docker/busybox-env/Dockerfile new file mode 100644 index 00000000..27dcc967 --- /dev/null +++ b/networking-workshop/docker/busybox-env/Dockerfile @@ -0,0 +1,4 @@ +FROM busybox +ADD hello.sh /hello.sh +ENV BUILD1 Bob +ENTRYPOINT ["/hello.sh"] \ No newline at end of file diff --git a/networking-workshop/docker/busybox-env/hello.sh b/networking-workshop/docker/busybox-env/hello.sh new file mode 100755 index 00000000..e717313b --- /dev/null +++ b/networking-workshop/docker/busybox-env/hello.sh @@ -0,0 +1,4 @@ +#!/bin/sh + +echo "hello, $BUILD1 and $RUN1!" + diff --git a/networking-workshop/docker/busybox-v2/Dockerfile b/networking-workshop/docker/busybox-v2/Dockerfile new file mode 100644 index 00000000..9f759ea5 --- /dev/null +++ b/networking-workshop/docker/busybox-v2/Dockerfile @@ -0,0 +1,2 @@ +FROM busybox +ADD hello.sh /hello.sh \ No newline at end of file diff --git a/networking-workshop/docker/busybox-v2/hello.sh b/networking-workshop/docker/busybox-v2/hello.sh new file mode 100755 index 00000000..3c2db464 --- /dev/null +++ b/networking-workshop/docker/busybox-v2/hello.sh @@ -0,0 +1,3 @@ +#!/bin/sh + +echo "hello, world v2!" 
diff --git a/networking-workshop/docker/busybox/Dockerfile b/networking-workshop/docker/busybox/Dockerfile new file mode 100644 index 00000000..9f759ea5 --- /dev/null +++ b/networking-workshop/docker/busybox/Dockerfile @@ -0,0 +1,2 @@ +FROM busybox +ADD hello.sh /hello.sh \ No newline at end of file diff --git a/networking-workshop/docker/busybox/hello.sh b/networking-workshop/docker/busybox/hello.sh new file mode 100755 index 00000000..8de4a4e6 --- /dev/null +++ b/networking-workshop/docker/busybox/hello.sh @@ -0,0 +1,3 @@ +#!/bin/sh + +echo "hello, world!" diff --git a/networking-workshop/docker/cache/Dockerfile b/networking-workshop/docker/cache/Dockerfile new file mode 100644 index 00000000..f7ae2e2a --- /dev/null +++ b/networking-workshop/docker/cache/Dockerfile @@ -0,0 +1,4 @@ +FROM busybox +ADD file /file +ADD script.sh /script.sh +ENTRYPOINT ["/script.sh"] diff --git a/networking-workshop/docker/cache/file b/networking-workshop/docker/cache/file new file mode 100644 index 00000000..ce013625 --- /dev/null +++ b/networking-workshop/docker/cache/file @@ -0,0 +1 @@ +hello diff --git a/networking-workshop/docker/cache/script.sh b/networking-workshop/docker/cache/script.sh new file mode 100755 index 00000000..0124d31a --- /dev/null +++ b/networking-workshop/docker/cache/script.sh @@ -0,0 +1,4 @@ +#!/bin/sh + +echo "hello, $(cat /file)!" + diff --git a/networking-workshop/docker/cache/script2.sh b/networking-workshop/docker/cache/script2.sh new file mode 100755 index 00000000..a0940e3e --- /dev/null +++ b/networking-workshop/docker/cache/script2.sh @@ -0,0 +1,3 @@ +#!/bin/sh + +echo "Hello, $(cat /file)!" diff --git a/networking-workshop/docker/scratch.dockerfile b/networking-workshop/docker/scratch.dockerfile new file mode 100644 index 00000000..13a29ec2 --- /dev/null +++ b/networking-workshop/docker/scratch.dockerfile @@ -0,0 +1,2 @@ +FROM scratch +ADD \ No newline at end of file diff --git a/networking-workshop/docker/scratch/Dockerfile b/networking-workshop/docker/scratch/Dockerfile new file mode 100644 index 00000000..b8216ecb --- /dev/null +++ b/networking-workshop/docker/scratch/Dockerfile @@ -0,0 +1,2 @@ +FROM scratch +ADD hello.sh /hello.sh \ No newline at end of file diff --git a/networking-workshop/docker/scratch/hello.sh b/networking-workshop/docker/scratch/hello.sh new file mode 100755 index 00000000..8de4a4e6 --- /dev/null +++ b/networking-workshop/docker/scratch/hello.sh @@ -0,0 +1,3 @@ +#!/bin/sh + +echo "hello, world!" 
diff --git a/networking-workshop/docker/ubuntu/Dockerfile b/networking-workshop/docker/ubuntu/Dockerfile new file mode 100644 index 00000000..0b2d2229 --- /dev/null +++ b/networking-workshop/docker/ubuntu/Dockerfile @@ -0,0 +1,4 @@ +FROM ubuntu:18.04 +RUN apt-get update +RUN apt-get install -y curl +ENTRYPOINT ["curl"] diff --git a/networking-workshop/env/.gitignore b/networking-workshop/env/.gitignore new file mode 100644 index 00000000..e2bf8bc2 --- /dev/null +++ b/networking-workshop/env/.gitignore @@ -0,0 +1,2 @@ +*.tfstate +*.tfstate.backup diff --git a/networking-workshop/env/Makefile b/networking-workshop/env/Makefile new file mode 100644 index 00000000..3e3a4de0 --- /dev/null +++ b/networking-workshop/env/Makefile @@ -0,0 +1,95 @@ +# env parameters +ENV ?= +REGION ?= "us-central1" +ZONE ?= "us-central1-a" +BUILDER_INSTANCE ?= n1-standard-2 +CLUSTER_INSTANCE ?= n1-standard-4 +NODES ?= 3 +CREDS ?= + +# binaries to use +TELE ?= tele +TF ?= terraform + +# purpose label to tag all GCE resources with, will be used for billing +PURPOSE ?= training + +# directory with terraform scripts +TF_DIR ?= terraform + +# path to the public SSH key file to put on GCE instances +SSH_KEY_PATH ?= + +# exported terraform variables +TF_VAR_node_tag := $(ENV) +TF_VAR_purpose := $(PURPOSE) +TF_VAR_ssh_key_path := $(SSH_KEY_PATH) +TF_VAR_region := $(REGION) +TF_VAR_zone := $(ZONE) +TF_VAR_credentials := $(CREDS) +TF_VAR_builder_instance_type := $(BUILDER_INSTANCE) +TF_VAR_cluster_instance_type := $(CLUSTER_INSTANCE) +TF_VAR_nodes := $(NODES) + +export + +# +# up sets up training environment with specified name. +# +.PHONY: up +up: check-env check-ssh-key init + cd ${TF_DIR} && ${TF} apply -auto-approve + +# +# down tears down training environment with specified name. +# +.PHONY: down +down: check-env refresh + cd ${TF_DIR} && ${TF} destroy -auto-approve + +# +# out displays output variables for environment with specified name. +# +.PHONY: out +out: check-env refresh + +# +# csv outputs IPs of nodes of the specified environment as comma-separated values. +# +.PHONY: csv +csv: check-env + @cd ${TF_DIR} && ${TF} output csv + +# +# refresh refreshes local terraform state from S3 bucket. +# +.PHONY: refresh +refresh: check-env init + cd ${TF_DIR} && ${TF} refresh + +# +# init initializes a single training environment. +# +.PHONY: init +init: check-env + cd ${TF_DIR} && ${TF} init -reconfigure \ + -backend-config="path=../$(ENV).tfstate" + + +# +# check-env makes sure ENV environment variable is set. +# +.PHONY: check-env +check-env: + @if [ -z "$(ENV)" ]; then \ + echo "ENV is not set"; exit 1; \ + fi; + +# +# check-ssh-key makes sure SSH_KEY_PATH environment variable is set. +# +.PHONY: check-ssh-key +check-ssh-key: + @if [ -z "$(SSH_KEY_PATH)" ]; then \ + echo "SSH_KEY_PATH is not set"; exit 1; \ + fi; diff --git a/networking-workshop/env/README.md b/networking-workshop/env/README.md new file mode 100644 index 00000000..414ab5c9 --- /dev/null +++ b/networking-workshop/env/README.md @@ -0,0 +1,27 @@ +This directory contains a set of scripts for provisioning training environments +for Gravity workshops. + +Each environment consists of 3 clean Ubuntu nodes suitable for installing +Gravity cluster. The nodes are provisioned on GCE using terraform >= v0.12. + +### Usage Examples + +#### Provision Environment + +```bash +$ make up ENV=training01 SSH_KEY_PATH=... +$ make up ENV=training02 SSH_KEY_PATH=... 
REGION=us-east1 ZONE=us-east1-b +``` + +#### View Environment Information + +```bash +$ make out ENV=training01 SSH_KEY_PATH=... +$ make csv ENV=training01 SSH_KEY_PATH=... +``` + +#### Destroy Environment + +```bash +$ make down ENV=training01 SSH_KEY_PATH=... +``` diff --git a/networking-workshop/env/terraform/.gitignore b/networking-workshop/env/terraform/.gitignore new file mode 100644 index 00000000..3fa8c86b --- /dev/null +++ b/networking-workshop/env/terraform/.gitignore @@ -0,0 +1 @@ +.terraform diff --git a/networking-workshop/env/terraform/bootstrap-cluster-node.sh.tpl b/networking-workshop/env/terraform/bootstrap-cluster-node.sh.tpl new file mode 100644 index 00000000..6919be27 --- /dev/null +++ b/networking-workshop/env/terraform/bootstrap-cluster-node.sh.tpl @@ -0,0 +1,67 @@ +#!/bin/bash +set -exuo pipefail + +# Setup friendly hostname. +hostname ${hostname} +echo ${hostname} > /etc/hostname +echo "127.0.0.1 ${hostname}" >> /etc/hosts + +cat > /etc/apt/sources.list <> /etc/fstab +mount /var/lib/gravity +mkdir -p /var/lib/gravity/planet/etcd +echo -e "/dev/$etcd_disk\t/var/lib/gravity/planet/etcd\text4\tdefaults\t0\t2" >> /etc/fstab +mount /var/lib/gravity/planet/etcd + +# Load required kernel modules +for module in $modules; do + modprobe $module || true +done + +# Make changes permanent +cat > /etc/sysctl.d/50-telekube.conf < /etc/modules-load.d/telekube.conf +for module in $modules; do + echo $module >> /etc/modules-load.d/telekube.conf +done +sysctl -p /etc/sysctl.d/50-telekube.conf + +real_user=${ssh_user} +service_uid=$(id $real_user -u) +service_gid=$(id $real_user -g) +chown -R $service_uid:$service_gid /var/lib/gravity /var/lib/gravity/planet/etcd + +# Clone workshop repo. +workshop_path=/home/${ssh_user}/workshop +git clone https://github.com/gravitational/workshop.git $workshop_path +chown -R $service_uid:$service_gid $workshop_path diff --git a/networking-workshop/env/terraform/cluster.tf b/networking-workshop/env/terraform/cluster.tf new file mode 100644 index 00000000..82236727 --- /dev/null +++ b/networking-workshop/env/terraform/cluster.tf @@ -0,0 +1,105 @@ +resource "google_compute_instance_group" "training" { + description = "Instance group with all instances for the training" + name = "${var.node_tag}-grp" + zone = var.zone + network = data.google_compute_network.training.self_link + instances = google_compute_instance.cluster_node.*.self_link +} + +resource "google_compute_instance" "cluster_node" { + description = "Node that will be a part of the cluster" + count = var.nodes + name = "${var.node_tag}-cluster-node-${count.index + 1}" + machine_type = var.cluster_instance_type + zone = var.zone + + tags = [ + var.node_tag, + ] + + labels = { + cluster = var.node_tag + purpose = var.purpose + } + + network_interface { + network = data.google_compute_network.training.self_link + + access_config { + # Ephemeral IP + } + } + + metadata = { + # Enable OS login using IAM roles + enable-oslogin = "true" + # ssh-keys controls access to an instance using a custom SSH key + ssh-keys = "${var.os_user}:${file(var.ssh_key_path)}" + } + + metadata_startup_script = data.template_file.bootstrap_cluster_node[count.index].rendered + + boot_disk { + initialize_params { + image = var.vm_image + type = var.disk_type + size = 50 + } + + auto_delete = "true" + } + + attached_disk { + source = google_compute_disk.etcd[count.index].self_link + mode = "READ_WRITE" + } + + attached_disk { + source = google_compute_disk.system[count.index].self_link + mode = "READ_WRITE" + } + + 
service_account { + scopes = [ + "compute-rw", + "storage-ro", + ] + } +} + +resource "google_compute_disk" "etcd" { + count = var.nodes + name = "${var.node_tag}-disk-etcd-${count.index}" + type = var.disk_type + zone = var.zone + size = 10 + + labels = { + cluster = var.node_tag + purpose = var.purpose + } +} + +resource "google_compute_disk" "system" { + count = var.nodes + name = "${var.node_tag}-disk-system-${count.index}" + type = var.disk_type + zone = var.zone + size = 50 + + labels = { + cluster = var.node_tag + purpose = var.purpose + } +} + +data "template_file" "bootstrap_cluster_node" { + count = var.nodes + template = file("./bootstrap-cluster-node.sh.tpl") + + vars = { + ssh_user = var.os_user + hostname = "node-${count.index + 1}" + } +} + diff --git a/networking-workshop/env/terraform/config.tf b/networking-workshop/env/terraform/config.tf new file mode 100644 index 00000000..21d9068e --- /dev/null +++ b/networking-workshop/env/terraform/config.tf @@ -0,0 +1,68 @@ +variable "nodes" { + description = "Number of cluster nodes." + default = "3" +} + +variable "purpose" { + description = "Environment purpose, will be used for billing." + default = "training" +} + +variable "cluster_instance_type" { + description = "VM instance type to use for the cluster nodes." + default = "n1-standard-4" +} + +variable "vm_image" { + description = "VM image reference." + default = "ubuntu-os-cloud/ubuntu-1604-xenial-v20180405" +} + +variable "node_tag" { + description = "GCE-friendly cluster name to use as a prefix for resources." +} + +variable "disk_type" { + description = "Type of disk to provision." + default = "pd-ssd" +} + +variable "os_user" { + description = "Name of the SSH user." + default = "ubuntu" +} + +variable "ssh_key_path" { + description = "Path to the public SSH key." + default = "" +} + +variable "project" { + description = "Project to deploy to, if not set the default provider project is used." + default = "kubeadm-167321" +} + +variable "region" { + description = "Region for resources." + default = "us-central1" +} + +variable "zone" { + description = "Zone for resources." + default = "us-central1-a" +} + +variable "credentials" { + description = "Path to application access credentials file." 
+ default = "" +} + +provider "google" { + project = var.project + region = var.region + credentials = file(var.credentials) +} + +terraform { + backend "local" {} +} diff --git a/networking-workshop/env/terraform/network.tf b/networking-workshop/env/terraform/network.tf new file mode 100644 index 00000000..2199f156 --- /dev/null +++ b/networking-workshop/env/terraform/network.tf @@ -0,0 +1,16 @@ +# Load Balancer configuration for GCE + +data "google_compute_network" "training" { + name = "default" +} + +resource "google_compute_firewall" "ssh" { + name = "${var.node_tag}-allow-ssh" + network = data.google_compute_network.training.self_link + + allow { + protocol = "tcp" + ports = ["22", "61822", "32009", "3009"] + } +} + diff --git a/networking-workshop/env/terraform/output.tf b/networking-workshop/env/terraform/output.tf new file mode 100644 index 00000000..c49ad3f6 --- /dev/null +++ b/networking-workshop/env/terraform/output.tf @@ -0,0 +1,28 @@ +output "node_1_private_ip" { + value = google_compute_instance.cluster_node.*.network_interface.0.network_ip[0] +} + +output "node_1_public_ip" { + value = google_compute_instance.cluster_node.*.network_interface.0.access_config.0.nat_ip[0] +} + +output "node_2_private_ip" { + value = google_compute_instance.cluster_node.*.network_interface.0.network_ip[1] +} + +output "node_2_public_ip" { + value = google_compute_instance.cluster_node.*.network_interface.0.access_config.0.nat_ip[1] +} + +output "node_3_private_ip" { + value = google_compute_instance.cluster_node.*.network_interface.0.network_ip[2] +} + +output "node_3_public_ip" { + value = google_compute_instance.cluster_node.*.network_interface.0.access_config.0.nat_ip[2] +} + +output "csv" { + value = "${google_compute_instance.cluster_node.*.network_interface.0.access_config.0.nat_ip[0]},${google_compute_instance.cluster_node.*.network_interface.0.network_ip[0]},${google_compute_instance.cluster_node.*.network_interface.0.access_config.0.nat_ip[1]},${google_compute_instance.cluster_node.*.network_interface.0.network_ip[1]},${google_compute_instance.cluster_node.*.network_interface.0.access_config.0.nat_ip[2]},${google_compute_instance.cluster_node.*.network_interface.0.network_ip[2]}" +} + diff --git a/networking-workshop/env/terraform/versions.tf b/networking-workshop/env/terraform/versions.tf new file mode 100644 index 00000000..d9b6f790 --- /dev/null +++ b/networking-workshop/env/terraform/versions.tf @@ -0,0 +1,3 @@ +terraform { + required_version = ">= 0.12" +} diff --git a/networking-workshop/firedrills/debug.yaml b/networking-workshop/firedrills/debug.yaml new file mode 100644 index 00000000..69a3d05a --- /dev/null +++ b/networking-workshop/firedrills/debug.yaml @@ -0,0 +1,33 @@ +apiVersion: apps/v1 +kind: DaemonSet +metadata: + name: debug + labels: + app: debug +spec: + selector: + matchLabels: + app: debug + template: + metadata: + labels: + app: debug + spec: + securityContext: + runAsUser: 0 + containers: + - name: debug + image: leader.telekube.local:5000/gravitational/debian-tall:stretch + command: ["/bin/sh", "-c", "sleep 365d"] + env: + - name: PATH + value: "/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:/rootfs/usr/local/bin:/rootfs/usr/local/sbin:/rootfs/usr/bin:/rootfs/usr/sbin:/rootfs/bin:/rootfs/sbin" + - name: LD_LIBRARY_PATH + value: "/rootfs/usr/lib/x86_64-linux-gnu" + volumeMounts: + - name: rootfs + mountPath: /rootfs + volumes: + - name: rootfs + hostPath: + path: / diff --git a/networking-workshop/gravity101/agent.yaml 
b/networking-workshop/gravity101/agent.yaml new file mode 100644 index 00000000..6a56bf58 --- /dev/null +++ b/networking-workshop/gravity101/agent.yaml @@ -0,0 +1,14 @@ +kind: user +version: v2 +metadata: + name: "agent@example.com" +spec: + type: "agent" + roles: ["@teleadmin"] +--- +kind: token +version: v2 +metadata: + name: "qwe123" +spec: + user: "agent@example.com" diff --git a/networking-workshop/gravity101/v1-simplest/app.yaml b/networking-workshop/gravity101/v1-simplest/app.yaml new file mode 100644 index 00000000..c44d652e --- /dev/null +++ b/networking-workshop/gravity101/v1-simplest/app.yaml @@ -0,0 +1,5 @@ +apiVersion: cluster.gravitational.io/v2 +kind: Cluster +metadata: + name: cluster-image + resourceVersion: 0.0.1 diff --git a/networking-workshop/gravity101/v1-with-base/app.yaml b/networking-workshop/gravity101/v1-with-base/app.yaml new file mode 100644 index 00000000..7d11e22b --- /dev/null +++ b/networking-workshop/gravity101/v1-with-base/app.yaml @@ -0,0 +1,6 @@ +apiVersion: cluster.gravitational.io/v2 +kind: Cluster +baseImage: gravity:7.0.30 +metadata: + name: cluster-image + resourceVersion: 0.0.1 diff --git a/networking-workshop/gravity101/v1-with-resources/app.yaml b/networking-workshop/gravity101/v1-with-resources/app.yaml new file mode 100644 index 00000000..7d11e22b --- /dev/null +++ b/networking-workshop/gravity101/v1-with-resources/app.yaml @@ -0,0 +1,6 @@ +apiVersion: cluster.gravitational.io/v2 +kind: Cluster +baseImage: gravity:7.0.30 +metadata: + name: cluster-image + resourceVersion: 0.0.1 diff --git a/networking-workshop/gravity101/v1-with-resources/charts/alpine/Chart.yaml b/networking-workshop/gravity101/v1-with-resources/charts/alpine/Chart.yaml new file mode 100644 index 00000000..7775284a --- /dev/null +++ b/networking-workshop/gravity101/v1-with-resources/charts/alpine/Chart.yaml @@ -0,0 +1,3 @@ +name: alpine +description: An Alpine 3.3 Linux deployment +version: 0.0.1 diff --git a/networking-workshop/gravity101/v1-with-resources/charts/alpine/templates/deployment.yaml b/networking-workshop/gravity101/v1-with-resources/charts/alpine/templates/deployment.yaml new file mode 100644 index 00000000..6304d457 --- /dev/null +++ b/networking-workshop/gravity101/v1-with-resources/charts/alpine/templates/deployment.yaml @@ -0,0 +1,24 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: alpine + labels: + app: alpine +spec: + replicas: 1 + selector: + matchLabels: + app: alpine + strategy: + type: Recreate + template: + metadata: + labels: + app: alpine + spec: + containers: + - name: alpine + image: "{{ .Values.registry }}alpine:{{ .Values.version }}" + command: ["/bin/sleep", "90000"] + securityContext: + runAsNonRoot: false diff --git a/networking-workshop/gravity101/v1-with-resources/charts/alpine/values.yaml b/networking-workshop/gravity101/v1-with-resources/charts/alpine/values.yaml new file mode 100644 index 00000000..3d2b2a44 --- /dev/null +++ b/networking-workshop/gravity101/v1-with-resources/charts/alpine/values.yaml @@ -0,0 +1,2 @@ +version: 3.3 +registry: diff --git a/networking-workshop/gravity101/v1/app.yaml b/networking-workshop/gravity101/v1/app.yaml new file mode 100644 index 00000000..66ee68de --- /dev/null +++ b/networking-workshop/gravity101/v1/app.yaml @@ -0,0 +1,9 @@ +apiVersion: cluster.gravitational.io/v2 +kind: Cluster +baseImage: gravity:7.0.30 +metadata: + name: cluster-image + resourceVersion: 0.0.1 +hooks: + install: + job: file://install.yaml diff --git a/networking-workshop/gravity101/v1/charts/alpine/Chart.yaml 
b/networking-workshop/gravity101/v1/charts/alpine/Chart.yaml new file mode 100644 index 00000000..7775284a --- /dev/null +++ b/networking-workshop/gravity101/v1/charts/alpine/Chart.yaml @@ -0,0 +1,3 @@ +name: alpine +description: An Alpine 3.3 Linux deployment +version: 0.0.1 diff --git a/networking-workshop/gravity101/v1/charts/alpine/templates/deployment.yaml b/networking-workshop/gravity101/v1/charts/alpine/templates/deployment.yaml new file mode 100644 index 00000000..6304d457 --- /dev/null +++ b/networking-workshop/gravity101/v1/charts/alpine/templates/deployment.yaml @@ -0,0 +1,24 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: alpine + labels: + app: alpine +spec: + replicas: 1 + selector: + matchLabels: + app: alpine + strategy: + type: Recreate + template: + metadata: + labels: + app: alpine + spec: + containers: + - name: alpine + image: "{{ .Values.registry }}alpine:{{ .Values.version }}" + command: ["/bin/sleep", "90000"] + securityContext: + runAsNonRoot: false diff --git a/networking-workshop/gravity101/v1/charts/alpine/values.yaml b/networking-workshop/gravity101/v1/charts/alpine/values.yaml new file mode 100644 index 00000000..3d2b2a44 --- /dev/null +++ b/networking-workshop/gravity101/v1/charts/alpine/values.yaml @@ -0,0 +1,2 @@ +version: 3.3 +registry: diff --git a/networking-workshop/gravity101/v1/install.yaml b/networking-workshop/gravity101/v1/install.yaml new file mode 100644 index 00000000..ccf73529 --- /dev/null +++ b/networking-workshop/gravity101/v1/install.yaml @@ -0,0 +1,24 @@ +apiVersion: batch/v1 +kind: Job +metadata: + name: install +spec: + template: + metadata: + name: install + namespace: default + spec: + restartPolicy: OnFailure + containers: + - name: install + image: quay.io/gravitational/debian-tall:stretch + command: + - /usr/local/bin/helm + - install + - /var/lib/gravity/resources/charts/alpine + - --set + - registry=registry.local:5000/ + - --name + - example + - --namespace + - default diff --git a/networking-workshop/gravity101/v2/app.yaml b/networking-workshop/gravity101/v2/app.yaml new file mode 100644 index 00000000..a8976a62 --- /dev/null +++ b/networking-workshop/gravity101/v2/app.yaml @@ -0,0 +1,11 @@ +apiVersion: cluster.gravitational.io/v2 +kind: Cluster +baseImage: gravity:7.0.30 +metadata: + name: cluster-image + resourceVersion: 0.0.2 +hooks: + install: + job: file://install.yaml + update: + job: file://upgrade.yaml diff --git a/networking-workshop/gravity101/v2/charts/alpine/Chart.yaml b/networking-workshop/gravity101/v2/charts/alpine/Chart.yaml new file mode 100644 index 00000000..66bf2775 --- /dev/null +++ b/networking-workshop/gravity101/v2/charts/alpine/Chart.yaml @@ -0,0 +1,3 @@ +name: alpine +description: An Alpine 3.4 Linux deployment +version: 0.0.2 diff --git a/networking-workshop/gravity101/v2/charts/alpine/templates/deployment.yaml b/networking-workshop/gravity101/v2/charts/alpine/templates/deployment.yaml new file mode 100644 index 00000000..6304d457 --- /dev/null +++ b/networking-workshop/gravity101/v2/charts/alpine/templates/deployment.yaml @@ -0,0 +1,24 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: alpine + labels: + app: alpine +spec: + replicas: 1 + selector: + matchLabels: + app: alpine + strategy: + type: Recreate + template: + metadata: + labels: + app: alpine + spec: + containers: + - name: alpine + image: "{{ .Values.registry }}alpine:{{ .Values.version }}" + command: ["/bin/sleep", "90000"] + securityContext: + runAsNonRoot: false diff --git 
a/networking-workshop/gravity101/v2/charts/alpine/values.yaml b/networking-workshop/gravity101/v2/charts/alpine/values.yaml new file mode 100644 index 00000000..ac975868 --- /dev/null +++ b/networking-workshop/gravity101/v2/charts/alpine/values.yaml @@ -0,0 +1,2 @@ +version: 3.4 +registry: diff --git a/networking-workshop/gravity101/v2/install.yaml b/networking-workshop/gravity101/v2/install.yaml new file mode 100644 index 00000000..ccf73529 --- /dev/null +++ b/networking-workshop/gravity101/v2/install.yaml @@ -0,0 +1,24 @@ +apiVersion: batch/v1 +kind: Job +metadata: + name: install +spec: + template: + metadata: + name: install + namespace: default + spec: + restartPolicy: OnFailure + containers: + - name: install + image: quay.io/gravitational/debian-tall:stretch + command: + - /usr/local/bin/helm + - install + - /var/lib/gravity/resources/charts/alpine + - --set + - registry=registry.local:5000/ + - --name + - example + - --namespace + - default diff --git a/networking-workshop/gravity101/v2/upgrade.yaml b/networking-workshop/gravity101/v2/upgrade.yaml new file mode 100644 index 00000000..4f818421 --- /dev/null +++ b/networking-workshop/gravity101/v2/upgrade.yaml @@ -0,0 +1,22 @@ +apiVersion: batch/v1 +kind: Job +metadata: + name: upgrade +spec: + template: + metadata: + name: upgrade + spec: + restartPolicy: OnFailure + containers: + - name: upgrade + image: quay.io/gravitational/debian-tall:stretch + command: + - /usr/local/bin/helm + - upgrade + - --set + - registry=registry.local:5000/ + - example + - /var/lib/gravity/resources/charts/alpine + - --namespace + - default diff --git a/networking-workshop/gravity101/v3/app.yaml b/networking-workshop/gravity101/v3/app.yaml new file mode 100644 index 00000000..ab9cacb2 --- /dev/null +++ b/networking-workshop/gravity101/v3/app.yaml @@ -0,0 +1,11 @@ +apiVersion: cluster.gravitational.io/v2 +kind: Cluster +baseImage: gravity:7.0.30 +metadata: + name: cluster-image + resourceVersion: 0.0.3 +hooks: + install: + job: file://install.yaml + update: + job: file://upgrade.yaml diff --git a/networking-workshop/gravity101/v3/charts/alpine/Chart.yaml b/networking-workshop/gravity101/v3/charts/alpine/Chart.yaml new file mode 100644 index 00000000..ada881b3 --- /dev/null +++ b/networking-workshop/gravity101/v3/charts/alpine/Chart.yaml @@ -0,0 +1,3 @@ +name: alpine +description: An Alpine 3.5 Linux deployment +version: 0.0.3 diff --git a/networking-workshop/gravity101/v3/charts/alpine/templates/deployment.yaml b/networking-workshop/gravity101/v3/charts/alpine/templates/deployment.yaml new file mode 100644 index 00000000..6304d457 --- /dev/null +++ b/networking-workshop/gravity101/v3/charts/alpine/templates/deployment.yaml @@ -0,0 +1,24 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: alpine + labels: + app: alpine +spec: + replicas: 1 + selector: + matchLabels: + app: alpine + strategy: + type: Recreate + template: + metadata: + labels: + app: alpine + spec: + containers: + - name: alpine + image: "{{ .Values.registry }}alpine:{{ .Values.version }}" + command: ["/bin/sleep", "90000"] + securityContext: + runAsNonRoot: false diff --git a/networking-workshop/gravity101/v3/charts/alpine/values.yaml b/networking-workshop/gravity101/v3/charts/alpine/values.yaml new file mode 100644 index 00000000..261b1470 --- /dev/null +++ b/networking-workshop/gravity101/v3/charts/alpine/values.yaml @@ -0,0 +1,2 @@ +version: 3.5 +registry: diff --git a/networking-workshop/gravity101/v3/install.yaml b/networking-workshop/gravity101/v3/install.yaml new file mode 
100644 index 00000000..ccf73529 --- /dev/null +++ b/networking-workshop/gravity101/v3/install.yaml @@ -0,0 +1,24 @@ +apiVersion: batch/v1 +kind: Job +metadata: + name: install +spec: + template: + metadata: + name: install + namespace: default + spec: + restartPolicy: OnFailure + containers: + - name: install + image: quay.io/gravitational/debian-tall:stretch + command: + - /usr/local/bin/helm + - install + - /var/lib/gravity/resources/charts/alpine + - --set + - registry=registry.local:5000/ + - --name + - example + - --namespace + - default diff --git a/networking-workshop/gravity101/v3/upgrade.yaml b/networking-workshop/gravity101/v3/upgrade.yaml new file mode 100644 index 00000000..4f818421 --- /dev/null +++ b/networking-workshop/gravity101/v3/upgrade.yaml @@ -0,0 +1,22 @@ +apiVersion: batch/v1 +kind: Job +metadata: + name: upgrade +spec: + template: + metadata: + name: upgrade + spec: + restartPolicy: OnFailure + containers: + - name: upgrade + image: quay.io/gravitational/debian-tall:stretch + command: + - /usr/local/bin/helm + - upgrade + - --set + - registry=registry.local:5000/ + - example + - /var/lib/gravity/resources/charts/alpine + - --namespace + - default diff --git a/networking-workshop/img/containers.png b/networking-workshop/img/containers.png new file mode 100644 index 00000000..5aa0f7cf Binary files /dev/null and b/networking-workshop/img/containers.png differ diff --git a/networking-workshop/img/gravity-node.png b/networking-workshop/img/gravity-node.png new file mode 100644 index 00000000..ec3c6e59 Binary files /dev/null and b/networking-workshop/img/gravity-node.png differ diff --git a/networking-workshop/img/gravity_networking_01.png b/networking-workshop/img/gravity_networking_01.png new file mode 100644 index 00000000..289e5460 Binary files /dev/null and b/networking-workshop/img/gravity_networking_01.png differ diff --git a/networking-workshop/img/gravity_networking_02.png b/networking-workshop/img/gravity_networking_02.png new file mode 100644 index 00000000..20945cda Binary files /dev/null and b/networking-workshop/img/gravity_networking_02.png differ diff --git a/networking-workshop/img/gravity_networking_03.png b/networking-workshop/img/gravity_networking_03.png new file mode 100644 index 00000000..57b684e2 Binary files /dev/null and b/networking-workshop/img/gravity_networking_03.png differ diff --git a/networking-workshop/img/gravity_networking_04.png b/networking-workshop/img/gravity_networking_04.png new file mode 100644 index 00000000..018f4da6 Binary files /dev/null and b/networking-workshop/img/gravity_networking_04.png differ diff --git a/networking-workshop/img/gravity_networking_05.png b/networking-workshop/img/gravity_networking_05.png new file mode 100644 index 00000000..a56ccc2e Binary files /dev/null and b/networking-workshop/img/gravity_networking_05.png differ diff --git a/networking-workshop/img/gravity_networking_06.png b/networking-workshop/img/gravity_networking_06.png new file mode 100644 index 00000000..d1e28599 Binary files /dev/null and b/networking-workshop/img/gravity_networking_06.png differ diff --git a/networking-workshop/img/gravity_networking_07.png b/networking-workshop/img/gravity_networking_07.png new file mode 100644 index 00000000..6bf5f2a6 Binary files /dev/null and b/networking-workshop/img/gravity_networking_07.png differ diff --git a/networking-workshop/img/gravity_networking_08.png b/networking-workshop/img/gravity_networking_08.png new file mode 100644 index 00000000..d53c43f4 Binary files /dev/null and 
b/networking-workshop/img/gravity_networking_08.png differ diff --git a/networking-workshop/img/gravity_networking_09.png b/networking-workshop/img/gravity_networking_09.png new file mode 100644 index 00000000..55037d51 Binary files /dev/null and b/networking-workshop/img/gravity_networking_09.png differ diff --git a/networking-workshop/img/gravity_networking_10.png b/networking-workshop/img/gravity_networking_10.png new file mode 100644 index 00000000..7c814ee9 Binary files /dev/null and b/networking-workshop/img/gravity_networking_10.png differ diff --git a/networking-workshop/img/gravity_networking_11.png b/networking-workshop/img/gravity_networking_11.png new file mode 100644 index 00000000..03212f46 Binary files /dev/null and b/networking-workshop/img/gravity_networking_11.png differ diff --git a/networking-workshop/img/gravity_networking_12.png b/networking-workshop/img/gravity_networking_12.png new file mode 100644 index 00000000..e4c4ddcc Binary files /dev/null and b/networking-workshop/img/gravity_networking_12.png differ diff --git a/networking-workshop/img/logrange.png b/networking-workshop/img/logrange.png new file mode 100644 index 00000000..105fe3b2 Binary files /dev/null and b/networking-workshop/img/logrange.png differ diff --git a/networking-workshop/img/macos-docker-settings.jpg b/networking-workshop/img/macos-docker-settings.jpg new file mode 100644 index 00000000..6f2e3d88 Binary files /dev/null and b/networking-workshop/img/macos-docker-settings.jpg differ diff --git a/networking-workshop/img/mattermost.png b/networking-workshop/img/mattermost.png new file mode 100644 index 00000000..a1b65708 Binary files /dev/null and b/networking-workshop/img/mattermost.png differ diff --git a/networking-workshop/ingress.md b/networking-workshop/ingress.md new file mode 100644 index 00000000..ff21e148 --- /dev/null +++ b/networking-workshop/ingress.md @@ -0,0 +1,105 @@ +### Ingress + +*Preparation: ingress can be enabled on already running minikube using command:* + +``` +minikube addons enable ingress +``` + +An Ingress is a collection of rules that allow inbound connections to reach the cluster services. +It can be configured to give services externally-reachable urls, load balance traffic, terminate SSL, offer name based virtual hosting etc. +The difference between service and ingress (in K8S terminology) is that service allows you to provide access on OSI L3, and ingress +works on L7. E.g. while accessing HTTP server service can provide only load-balancing and HA, unlike ingres which could be used to split +traffic on HTTP location basis, etc. + +First, we need to create to 2 different nginx deployments, configmaps and services for them: + +``` +kubectl create configmap cola-nginx --from-file=ingress/conf-cola +kubectl create configmap pepsi-nginx --from-file=ingress/conf-pepsi +kubectl apply -f ingress/cola-nginx-configmap.yaml -f ingress/pepsi-nginx-configmap.yaml +kubectl apply -f ingress/cola-nginx-service.yaml -f ingress/pepsi-nginx-service.yaml +``` + +Check if both deployments and services works: + +``` +$ curl $(minikube service cola-nginx --url) +Taste The Feeling. Coca-Cola. +$ curl $(minikube service pepsi-nginx --url) +Every Pepsi Refreshes The World. +``` + +Example ingress usage pattern is to route HTTP traffic according to location. +Now we have two different deployments and services, assume we need to route user +requests from `/cola` to `cola-nginx` service (backed by `cola-nginx` deployment) +and `/pepsi` to `pepsi-nginx` service. 
+ +This can be acheived using following ingress resource: + +```yaml +apiVersion: extensions/v1beta1 +kind: Ingress +metadata: + name: drinks-ingress + annotations: + ingress.kubernetes.io/rewrite-target: / + ingress.kubernetes.io/ssl-redirect: "false" +spec: + rules: + - http: + paths: + - path: /cola + backend: + serviceName: cola-nginx + servicePort: 80 + - path: /pepsi + backend: + serviceName: pepsi-nginx + servicePort: 80 +``` + +Create ingress: + +``` +kubectl apply -f ingress/drinks-ingress.yaml +``` + +Notice annotations: + +* `ingress.kubernetes.io/rewrite-target: /` -- sets request's location to `/` instead of specified in `path`. +* `ingress.kubernetes.io/ssl-redirect: "false"` -- disables HTTP to HTTPS redirect, enabled by default. + +Ingress is implemented inside `kube-system` namespace using any kind of configurable proxy. E.g. in minikube +ingress uses nginx. Simply speaking there's special server which reacts to ingress resource creation/deletion/alteration +and updates configuration of neighboured nginx. This *ingress controller* application started using +ReplicationController resource inside minikube, but could be run as usual K8S application (DS, Deployment, etc), +on special set of "edge router" nodes for improved security. + +``` +$ kubectl --namespace=kube-system get pods -l app=nginx-ingress-lb +NAME READY STATUS RESTARTS AGE +nginx-ingress-controller-1nzsp 1/1 Running 0 1h +``` + +Now we can make ingress reachable to outer world (e.g. our local host). It's not required, you're free of choice +to make it reachable only internally or via some cloud-provider using LoadBalancer. + +``` +kubectl --namespace=kube-system expose rc nginx-ingress-controller --port=80 --type=LoadBalancer +``` + +Finally we can check location splitting via hitting ingress-controller service with +proper location. + +``` +$ curl $(minikube service --namespace=kube-system nginx-ingress-controller --url)/cola +Taste The Feeling. Coca-Cola. +$ curl $(minikube service --namespace=kube-system nginx-ingress-controller --url)/pepsi +Every Pepsi Refreshes The World. +``` + +As you see, we're hitting one service with different locations and have different responses due +to ingress location routing. + +More details on ingress features and use cases [here](https://kubernetes.io/docs/user-guide/ingress/). 
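+
+Path-based routing is only one option -- the same two services can also be split by hostname (the "name based virtual hosting" mentioned above). Below is a sketch using the same API version as the resource above and hypothetical hostnames:
+
+```yaml
+apiVersion: extensions/v1beta1
+kind: Ingress
+metadata:
+  name: drinks-by-host
+spec:
+  rules:
+  - host: cola.example.com         # hypothetical hostname
+    http:
+      paths:
+      - backend:
+          serviceName: cola-nginx
+          servicePort: 80
+  - host: pepsi.example.com        # hypothetical hostname
+    http:
+      paths:
+      - backend:
+          serviceName: pepsi-nginx
+          servicePort: 80
+```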
diff --git a/networking-workshop/ingress/cola-nginx-configmap.yaml b/networking-workshop/ingress/cola-nginx-configmap.yaml new file mode 100644 index 00000000..4cbcdcb7 --- /dev/null +++ b/networking-workshop/ingress/cola-nginx-configmap.yaml @@ -0,0 +1,27 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + labels: + app: cola-nginx + name: cola-nginx + namespace: default +spec: + replicas: 1 + template: + metadata: + labels: + app: cola-nginx + spec: + containers: + - image: nginx:1.11.5 + name: cola-nginx + ports: + - containerPort: 80 + protocol: TCP + volumeMounts: + - name: config-volume + mountPath: /etc/nginx/conf.d + volumes: + - name: config-volume + configMap: + name: cola-nginx diff --git a/networking-workshop/ingress/cola-nginx-service.yaml b/networking-workshop/ingress/cola-nginx-service.yaml new file mode 100644 index 00000000..efd30d93 --- /dev/null +++ b/networking-workshop/ingress/cola-nginx-service.yaml @@ -0,0 +1,13 @@ +apiVersion: v1 +kind: Service +metadata: + name: cola-nginx + labels: + app: cola-nginx +spec: + type: NodePort + ports: + - port: 80 + name: http + selector: + app: cola-nginx diff --git a/networking-workshop/ingress/conf-cola/default.conf b/networking-workshop/ingress/conf-cola/default.conf new file mode 100644 index 00000000..1c6a3cc2 --- /dev/null +++ b/networking-workshop/ingress/conf-cola/default.conf @@ -0,0 +1,8 @@ +server { + listen 80; + server_name localhost; + + location / { + return 200 'Taste The Feeling. Coca-Cola.\n'; + } +} diff --git a/networking-workshop/ingress/conf-pepsi/default.conf b/networking-workshop/ingress/conf-pepsi/default.conf new file mode 100644 index 00000000..ed20c6d2 --- /dev/null +++ b/networking-workshop/ingress/conf-pepsi/default.conf @@ -0,0 +1,8 @@ +server { + listen 80; + server_name localhost; + + location / { + return 200 'Every Pepsi Refreshes The World.\n'; + } +} diff --git a/networking-workshop/ingress/drinks-ingress.yaml b/networking-workshop/ingress/drinks-ingress.yaml new file mode 100644 index 00000000..2f8e62f8 --- /dev/null +++ b/networking-workshop/ingress/drinks-ingress.yaml @@ -0,0 +1,19 @@ +apiVersion: apps/v1 +kind: Ingress +metadata: + name: drinks-ingress + annotations: + ingress.kubernetes.io/rewrite-target: / + ingress.kubernetes.io/ssl-redirect: "false" +spec: + rules: + - http: + paths: + - path: /cola + backend: + serviceName: cola-nginx + servicePort: 80 + - path: /pepsi + backend: + serviceName: pepsi-nginx + servicePort: 80 diff --git a/networking-workshop/ingress/pepsi-nginx-configmap.yaml b/networking-workshop/ingress/pepsi-nginx-configmap.yaml new file mode 100644 index 00000000..c7f44424 --- /dev/null +++ b/networking-workshop/ingress/pepsi-nginx-configmap.yaml @@ -0,0 +1,27 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + labels: + app: pepsi-nginx + name: pepsi-nginx + namespace: default +spec: + replicas: 1 + template: + metadata: + labels: + app: pepsi-nginx + spec: + containers: + - image: nginx:1.11.5 + name: pepsi-nginx + ports: + - containerPort: 80 + protocol: TCP + volumeMounts: + - name: config-volume + mountPath: /etc/nginx/conf.d + volumes: + - name: config-volume + configMap: + name: pepsi-nginx diff --git a/networking-workshop/ingress/pepsi-nginx-service.yaml b/networking-workshop/ingress/pepsi-nginx-service.yaml new file mode 100644 index 00000000..d742cfa2 --- /dev/null +++ b/networking-workshop/ingress/pepsi-nginx-service.yaml @@ -0,0 +1,13 @@ +apiVersion: v1 +kind: Service +metadata: + name: pepsi-nginx + labels: + app: pepsi-nginx +spec: + type: NodePort 
+ ports: + - port: 80 + name: http + selector: + app: pepsi-nginx diff --git a/networking-workshop/k8s101.md b/networking-workshop/k8s101.md new file mode 100644 index 00000000..4c08902e --- /dev/null +++ b/networking-workshop/k8s101.md @@ -0,0 +1,922 @@ +# Kubernetes 101 + +Kubernetes 101 workshop - introduction to Kubernetes basic concepts + +## Installation + +First, follow the [installation instructions](README.md#installation). + +## Running nginx + +Everyone says that Kubernetes (sometimes abbreviated as K8S) is hard, however going through this workshop we'll prove that it shouldn't be that way! + +Let's start by creating an `nginx` service. + +```bash +$ kubectl create deployment my-nginx --image=nginx --replicas=2 --port=80 +$ kubectl expose deployment my-nginx --type=LoadBalancer --port=80 +``` + +Let's go step by step and explore what just happened: + +## Pods +[Pods](http://kubernetes.io/docs/user-guide/pods/) are one of the building blocks of Kubernetes architecture. + +In essence this is a group of containers sharing the same networking and Linux namespaces. They are used to group related processes together. Our `run` command resulted in several running pods: + +```bash +$ kubectl get pods + +NAME READY STATUS RESTARTS AGE +my-nginx-3800858182-auusv 1/1 Running 0 32m +my-nginx-3800858182-jzoxe 1/1 Running 0 32m +``` + +You can explore individual pods or group of pods using handy `kubectl describe` + +```bash +$ kubectl describe pods + +Name: my-nginx-3800858182-auusv +Namespace: default +Node: 172.28.128.5/172.28.128.5 +Start Time: Sun, 15 May 2016 19:37:01 +0000 +Labels: pod-template-hash=3800858182,run=my-nginx +Status: Running +IP: 10.244.33.109 +Controllers: ReplicaSet/my-nginx-3800858182 +Containers: + my-nginx: + Container ID: docker://f322f42081024e8374d23765652d3abc4cb1f28d3cfd4ed37a7dd0c990c12c5f + Image: nginx + Image ID: docker://44d8b6f34ba13fdbf1da947d4bc6467eadae1cc84c2090011803f7b0862ea124 + Port: 80/TCP + QoS Tier: + cpu: BestEffort + memory: BestEffort + State: Running + Started: Sun, 15 May 2016 19:37:36 +0000 + Ready: True + Restart Count: 0 + Environment Variables: +Conditions: + Type Status + Ready True +Volumes: + default-token-8n3l2: + Type: Secret (a volume populated by a Secret) + SecretName: default-token-8n3l2 +Events: + FirstSeen LastSeen Count From SubobjectPath Type Reason Message + --------- -------- ----- ---- ------------- -------- ------ ------- + 33m 33m 1 {default-scheduler } Normal Scheduled Successfully assigned my-nginx-3800858182-auusv to 172.28.128.5 + 33m 33m 1 {kubelet 172.28.128.5} spec.containers{my-nginx} Normal Pulling pulling image "nginx" + 32m 32m 1 {kubelet 172.28.128.5} spec.containers{my-nginx} Normal Pulled Successfully pulled image "nginx" + 32m 32m 1 {kubelet 172.28.128.5} spec.containers{my-nginx} Normal Created Created container with docker id f322f4208102 + 32m 32m 1 {kubelet 172.28.128.5} spec.containers{my-nginx} Normal Started Started container with docker id f322f4208102 +``` + +Now let's focus on what's inside one pod. + +### Pod IPs + +In the Pods description output you should be able to spot the field `IP` in the overlay network assigned to the pod. +In the example above it's `10.244.33.109`. Can we access it by using that IP directly? + +Let's temporarily (`--rm`) run a Pod providing `curl` to verify if we can access the `nginx` Pod from other pods: + +```bash +$ kubectl run -it --rm cli --image=appropriate/curl --restart=Never /bin/sh +$ curl http://10.244.33.109 + + + +Welcome to nginx! + + + +

+<h1>Welcome to nginx!</h1>
+<p>If you see this page, the nginx web server is successfully installed and
+working. Further configuration is required.</p>
+...
+``` + +It works! Wait, so will you need to hardcode this VIP in your configuration? What if it changes from environment to environment? +Thankfully, K8s team thought about this as well, and we can simply do: + +```bash +$ kubectl run -i -t --rm cli --image=appropriate/curl --restart=Never /bin/sh +curl http://my-nginx + +... +``` + +K8s uses a [CoreDNS](https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dns) service +that watches the services and pods and sets up appropriate `A` records. Our `sandbox` local DNS server is simply configured to point to the DNS service provided by K8s. + +That's very similar how K8s manages discovery in containers as well. Let's login into one of the nginx boxes and +discover `/etc/resolv.conf` there: + + +```bash +$ kubectl exec -ti my-nginx-3800858182-auusv -- /bin/bash +root@my-nginx-3800858182-auusv:/# cat /etc/resolv.conf + +nameserver 10.100.0.4 +search default.svc.cluster.local svc.cluster.local cluster.local hsd1.ca.comcast.net +options ndots:5 +``` + +`resolv.conf` is set up to point to the DNS resolution service managed by K8s. + +## Back to Deployments + +The power of Deployments comes from ability to run smart upgrades and rollbacks in case if something goes wrong. + +Let's update our deployment of nginx to the newer version. + +```bash +$ cat my-nginx-new.yaml +``` + +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + labels: + run: my-nginx + name: my-nginx + namespace: default +spec: + replicas: 2 + selector: + matchLabels: + run: my-nginx + template: + metadata: + labels: + run: my-nginx + spec: + containers: + - image: nginx:1.17.5 + name: my-nginx + ports: + - containerPort: 80 + protocol: TCP +``` + +Let's apply our deployment: + +```bash +$ kubectl apply -f my-nginx-new.yaml --record +``` + +We can see that a new ReplicaSet has been created: + +```bash +$ kubectl get rs + +NAME DESIRED CURRENT AGE +my-nginx-1413250935 2 2 50s +my-nginx-3800858182 0 0 2h +``` + +If we look at the events section of the deployment we will see how it performed rolling update scaling up new ReplicaSet while scaling down the old one: + + +```bash +$ kubectl describe deployments/my-nginx +Name: my-nginx +Namespace: default +CreationTimestamp: Sun, 15 May 2016 19:37:01 +0000 +Labels: run=my-nginx +Selector: run=my-nginx +Replicas: 2 updated | 2 total | 2 available | 0 unavailable +StrategyType: RollingUpdate +MinReadySeconds: 0 +RollingUpdateStrategy: 1 max unavailable, 1 max surge +OldReplicaSets: +NewReplicaSet: my-nginx-1413250935 (2/2 replicas created) +Events: + FirstSeen LastSeen Count From SubobjectPath Type Reason Message + --------- -------- ----- ---- ------------- -------- ------ ------- + 2h 2h 1 {deployment-controller } Normal ScalingReplicaSet Scaled up replica set my-nginx-3800858182 to 2 + 1m 1m 1 {deployment-controller } Normal ScalingReplicaSet Scaled up replica set my-nginx-1413250935 to 1 + 1m 1m 1 {deployment-controller } Normal ScalingReplicaSet Scaled down replica set my-nginx-3800858182 to 1 + 1m 1m 1 {deployment-controller } Normal ScalingReplicaSet Scaled up replica set my-nginx-1413250935 to 2 + 1m 1m 1 {deployment-controller } Normal ScalingReplicaSet Scaled down replica set my-nginx-3800858182 to 0 +``` + +And now its version is `1.17.5`. Let's check out in the headers: + +```bash +$ kubectl run -i -t --rm cli --image=appropriate/curl --restart=Never /bin/sh +curl -v http://my-nginx + +* About to connect() to my-nginx port 80 (#0) +* Trying 10.100.68.75... 
+* Connected to my-nginx (10.100.68.75) port 80 (#0) +> GET / HTTP/1.1 +> User-Agent: curl/7.29.0 +> Host: my-nginx +> Accept: */* +> +< HTTP/1.1 200 OK +< Server: nginx/1.17.5 +``` + +Let's simulate a situation when a deployment fails and we need to rollback. Our deployment has a typo: + +```bash +cat my-nginx-typo.yaml +``` + +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + labels: + run: my-nginx + name: my-nginx + namespace: default +spec: + replicas: 2 + selector: + matchLabels: + run: my-nginx + template: + metadata: + labels: + run: my-nginx + spec: + containers: + - image: nginx:999 # <-- TYPO: non-existent version + name: my-nginx + ports: + - containerPort: 80 + protocol: TCP +``` + +Let's apply the broken YAML: + +```shell +$ kubectl apply -f my-nginx-typo.yaml --record +deployment "my-nginx" configured +``` + +Our new pods have crashed: + +```bash +$ kubectl get pods +NAME READY STATUS RESTARTS AGE +my-nginx-1413250935-rqstg 1/1 Running 0 10m +my-nginx-2896527177-8wmk7 0/1 ImagePullBackOff 0 55s +my-nginx-2896527177-cv3fd 0/1 ImagePullBackOff 0 55s +``` + +Our deployment shows 2 unavailable replicas: + +```bash +$ kubectl describe deployments/my-nginx +Name: my-nginx +Namespace: default +CreationTimestamp: Sun, 15 May 2016 19:37:01 +0000 +Labels: run=my-nginx +Selector: run=my-nginx +Replicas: 2 updated | 2 total | 1 available | 2 unavailable +StrategyType: RollingUpdate +MinReadySeconds: 0 +RollingUpdateStrategy: 1 max unavailable, 1 max surge +OldReplicaSets: my-nginx-1413250935 (1/1 replicas created) +NewReplicaSet: my-nginx-2896527177 (2/2 replicas created) +Events: + FirstSeen LastSeen Count From SubobjectPath Type Reason Message + --------- -------- ----- ---- ------------- -------- ------ ------- + 2h 2h 1 {deployment-controller } Normal ScalingReplicaSet Scaled up replica set my-nginx-3800858182 to 2 + 11m 11m 1 {deployment-controller } Normal ScalingReplicaSet Scaled up replica set my-nginx-1413250935 to 1 + 11m 11m 1 {deployment-controller } Normal ScalingReplicaSet Scaled down replica set my-nginx-3800858182 to 1 + 11m 11m 1 {deployment-controller } Normal ScalingReplicaSet Scaled up replica set my-nginx-1413250935 to 2 + 10m 10m 1 {deployment-controller } Normal ScalingReplicaSet Scaled down replica set my-nginx-3800858182 to 0 + 1m 1m 1 {deployment-controller } Normal ScalingReplicaSet Scaled up replica set my-nginx-2896527177 to 1 + 1m 1m 1 {deployment-controller } Normal ScalingReplicaSet Scaled down replica set my-nginx-1413250935 to 1 + 1m 1m 1 {deployment-controller } Normal ScalingReplicaSet Scaled up replica set my-nginx-2896527177 to 2 +``` + +Our rollout has stopped. 
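+
+You can also confirm this from the command line; with the broken image the rollout never finishes, and the command below keeps waiting for the new replicas to become available:
+
+```bash
+$ kubectl rollout status deployment/my-nginx
+```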
Let's view the history: + +```bash +$ kubectl rollout history deployments/my-nginx +deployments "my-nginx": +REVISION CHANGE-CAUSE +1 kubectl run my-nginx --image=nginx --replicas=2 --port=80 --expose --record +2 kubectl apply -f my-nginx-new.yaml +3 kubectl apply -f my-nginx-typo.yaml +``` + +**Note:** We used `--record` flag so that all commands are recorded + +Let's roll back the last deployment: + +```bash +$ kubectl rollout undo deployment/my-nginx +``` + +We've rolled back and created a new revision by doing `undo`: + +```bash +$ kubectl rollout history deployment/my-nginx +deployments "my-nginx": +REVISION CHANGE-CAUSE +1 kubectl run my-nginx --image=nginx --replicas=2 --port=80 --expose --record +3 kubectl apply -f my-nginx-typo.yaml +4 kubectl apply -f my-nginx-new.yaml +``` + +[Deployments](http://kubernetes.io/docs/user-guide/deployments/) are a very powerful tool, and we've barely scratched the surface of what they can do. Check out [docs](http://kubernetes.io/docs/user-guide/deployments/) for more detail. + +## Configuration management basics + +Our `nginx` Pods are up and running, let's make sure they actually do something useful by configuring them to say `hello, kubernetes!` + +[ConfigMaps](http://kubernetes.io/docs/user-guide/configmap/) are a special K8s resource that allows configuration files or environment variables to be used inside Pods. + +Lets create a new configmap from a directory. Our `conf.d` contains a `default.conf` file: + +```bash +$ cat conf.d/default.conf +server { + listen 80; + server_name localhost; + + location / { + return 200 'hello, Kubernetes!'; + } +} +``` + +We can convert the whole directory into configmap: + +```bash +$ kubectl create configmap my-nginx-v1 --from-file=conf.d +configmap "my-nginx-v1" created +``` + +```bash +$ kubectl describe configmaps/my-nginx-v1 +Name: my-nginx-v1 +Namespace: default +Labels: +Annotations: + +Data +==== +default.conf: 125 bytes + +``` + +Every file is now its own property, e.g. `default.conf`. Now the trick is to mount this config map in the `/etc/nginx/conf.d/` of our nginx Pods. We will use a new deployment for this purpose: + + +```bash +$ cat my-nginx-configmap.yaml +``` + +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + labels: + run: my-nginx + name: my-nginx + namespace: default +spec: + replicas: 2 + selector: + matchLabels: + run: my-nginx + template: + metadata: + labels: + run: my-nginx + spec: + containers: + - image: nginx:1.17.5 + name: my-nginx + ports: + - containerPort: 80 + protocol: TCP + volumeMounts: + - name: config-volume + mountPath: /etc/nginx/conf.d + volumes: + - name: config-volume + configMap: + name: my-nginx-v1 +``` + +Notice that we've introduced a `volumes` section that tells k8s to attach volumes to the pods. +One special volume type we support is `configMap` that is created on the fly from the configmap resource `my-nginx-v1` that we've just created. + +Another part of our config is `volumeMounts` that are specified for each container and tell it where to mount the volume. 
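+
+Before applying it, you can double-check the exact data that will be mounted by dumping the ConfigMap as YAML:
+
+```bash
+$ kubectl get configmap my-nginx-v1 -o yaml
+```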
+ +Let's apply our config map: + +```bash +$ kubectl apply -f my-nginx-configmap.yaml +``` + +Listing Pods you'll see that new one using the updates deployment have just been automatically created: + +```bash +$ kubectl get pods +NAME READY STATUS RESTARTS AGE +my-nginx-3885498220-0c6h0 1/1 Running 0 39s +my-nginx-3885498220-9q61s 1/1 Running 0 38s +``` + +Out of curiosity, let's login into one of them and see ourselves the mounted configmap: + +```bash +$ kubectl exec -ti my-nginx-3885498220-0c6h0 /bin/bash +cat /etc/nginx/conf.d/default.conf +server { + listen 80; + server_name localhost; + + location / { + return 200 'hello, Kubernetes!'; + } +} +``` + +and finally, let's see it all in action: + +```bash +$ kubectl run -i -t --rm cli --image=appropriate/curl --restart=Never /bin/sh +curl http://my-nginx +hello, Kubernetes! +``` + +## Connecting services + +Let's deploy a bit more complicated stack. In this exercise we will deploy [Mattermost](http://www.mattermost.org) - an alternative to Slack that can run on your infrastructure. + +We will go through the process of building our own containers and configuration and pushing it to the registry. + +The Mattermost stack is composed of a worker process that connects to a running PostgresSQL instance. + +### Build container + +Let's build a container image for our worker and push it to our local private registry: + +```bash +$ export registry="$(kubectl get svc/registry -ojsonpath='{.spec.clusterIP}'):5000" +$ eval $(minikube docker-env) +$ docker build -t $registry/mattermost-worker:latest mattermost/worker +$ docker push $registry/mattermost-worker +``` + +**Note:** Notice the `$registry` prefix. This is a private registry we've set up on our master server as explained in [README.md](https://github.com/gravitational/workshop/blob/master/README.md) + +**Create configmap** + +Mattermost's worker expects configuration to be mounted at: + +`/var/mattermost/config/config.json` + +```bash +$ cat mattermost/worker-config/config.json +``` + +If we examine config closely, we will notice that mattermost expects a connector string to PostgresSQL: + +```yaml + "DataSource": "postgres://postgres:mattermost@postgres:5432/postgres?sslmode=disable" + "DataSourceReplicas": ["postgres://postgres:mattermost@postgres:5432/postgres?sslmode=disable"] +``` + +Here's where k8s power comes into play. We don't need to provide hardcoded IPs, we can simply make sure that there's a `postgres` service pointing to our PostgresSQL DB running somewhere in the cluster. + +Let us create config map based on this file: + +```bash +$ kubectl create configmap mattermost-v1 --from-file=mattermost/worker-config +$ kubectl describe configmaps/mattermost-v1 +Name: mattermost-v1 +Namespace: default +Labels: +Annotations: + +Data +==== +config.json: 2951 bytes +``` + +### Starting Up Postgres + +Let's create a single Pod running PostgresSQL and point our service to it: + +```bash +$ kubectl create -f mattermost/postgres.yaml +$ kubectl get pods +NAME READY STATUS RESTARTS AGE +mattermost-database 1/1 Running 0 12m +``` + +Let's check out the logs of our postgres: + +```bash +kubectl logs mattermost-database +The files belonging to this database system will be owned by user "postgres". +This user must also own the server process. + +The database cluster will be initialized with locale "en_US.utf8". +The default database encoding has accordingly been set to "UTF8". +The default text search configuration will be set to "english". + +Data page checksums are disabled. 
+ +fixing permissions on existing directory /var/lib/postgresql/data ... ok +creating subdirectories ... ok +selecting default max_connections ... 100 +selecting default shared_buffers ... 128MB +``` + +**Note** Our `mattermost-database` is a special snowflake, in real production systems we must create a proper replicaset for the stateful service, what is slightly more complicated than this example. + + +### Creating Postgres Service + +Let's create PostgresSQL service: + +```bash +$ kubectl create -f mattermost/postgres-service.yaml +``` + +Let's check out that everything is alright: + +```bash +$ kubectl describe svc/postgres +Name: postgres +Namespace: default +Labels: app=mattermost,role=mattermost-database +Selector: role=mattermost-database +Type: NodePort +IP: 10.100.41.153 +Port: 5432/TCP +NodePort: 31397/TCP +Endpoints: 10.244.40.229:5432 +Session Affinity: None +``` + +Seems like an IP has been correctly allocated and endpoints have been found. + +### Creating Mattermost worker deployment + + +```bash +$ cat mattermost/worker.yaml +``` + +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + labels: + app: mattermost + role: mattermost-worker + name: mattermost-worker + namespace: default +spec: + replicas: 1 + selector: + matchLabels: + role: mattermost-worker + template: + metadata: + labels: + app: mattermost + role: mattermost-worker + spec: + containers: + - image: __REGISTRY_IP__/mattermost-worker:5.21.0 + name: mattermost-worker + ports: + - containerPort: 80 + protocol: TCP + volumeMounts: + - name: config-volume + mountPath: /var/mattermost/config + volumes: + - name: config-volume + configMap: + name: mattermost-v1 +``` + +The following command is just a fancy one-liner to insert the value of $registry +in your `kubectl` command and use it on the fly. + +```bash +$ cat mattermost/worker.yaml | sed "s/__REGISTRY_IP__/$registry/g" | kubectl create --record -f - +``` + +Let's check out the status of the deployment to double-check that part too: + +```bash +$ kubectl describe deployments/mattermost-worker +Name: mattermost-worker +Namespace: default +CreationTimestamp: Sun, 15 May 2016 23:56:57 +0000 +Labels: app=mattermost,role=mattermost-worker +Selector: role=mattermost-worker +Replicas: 1 updated | 1 total | 1 available | 0 unavailable +StrategyType: RollingUpdate +MinReadySeconds: 0 +RollingUpdateStrategy: 1 max unavailable, 1 max surge +OldReplicaSets: +NewReplicaSet: mattermost-worker-1848122701 (1/1 replicas created) +Events: + FirstSeen LastSeen Count From SubobjectPath Type Reason Message + --------- -------- ----- ---- ------------- -------- ------ ------- + 3m 3m 1 {deployment-controller } Normal ScalingReplicaSet Scaled up replica set mattermost-worker-1932270926 to 1 + 1m 1m 1 {deployment-controller } Normal ScalingReplicaSet Scaled up replica set mattermost-worker-1848122701 to 1 + 1m 1m 1 {deployment-controller } Normal ScalingReplicaSet Scaled down replica set mattermost-worker-1932270926 to 0 +``` + +### Creating mattermost service + +Our last touch is to create the Mattermost service and verify that it's all working as correctly: + +```bash +$ kubectl create -f mattermost/worker-service.yaml +You have exposed your service on an external port on all nodes in your +cluster. If you want to expose this service to the external internet, you may +need to set up firewall rules for the service port(s) (tcp:32321) to serve traffic. + +See http://releases.k8s.io/release-1.2/docs/user-guide/services-firewalls.md for more details. 
+service "mattermost" created +``` + +Let's inspect the service spec: + +```bash +$ cat mattermost/worker-service.yaml +``` + +Here's what we got. Notice `NodePort` service type: + +```yaml +# service for web worker +apiVersion: v1 +kind: Service +metadata: + name: mattermost + labels: + app: mattermost + role: mattermost-worker +spec: + type: NodePort + ports: + - port: 80 + name: http + selector: + role: mattermost-worker +``` + +`NodePort` service type exposes a static port on every node in the cluster. In this case this port +is `32321`. This is handy sometimes when you are working on-prem or locally. + +### Accessing the installation + +```bash +$ kubectl run -i -t --rm cli --image=appropriate/curl --restart=Never /bin/sh +curl http://mattermost + + + + + + + + + + Mattermost - Signup +``` + +Okay, okay, we need to actually access the website now. Well, that' when `NodePort` comes in handy. +Let's view it a bit closer: + +```bash +$ kubectl describe svc/mattermost +Name: mattermost +Namespace: default +Labels: app=mattermost,role=mattermost-worker +Selector: role=mattermost-worker +Type: NodePort +IP: 172.28.128.4 +Port: http 80/TCP +NodePort: http 32321/TCP +Endpoints: 10.244.40.23:80 +Session Affinity: None +``` + +Please notice that: + +``` +NodePort: http 32321/TCP +``` + +Here we see that on our environment we should be able to connect to Mattermost by using `IP:32321` but on your system this port will most likely be different! + +So on my computer, I can now open mattermost app using one of the nodes IP: + + +![mattermost](img/mattermost.png) + +!!! MINIKUBE users: use `minikube tunnel` to fetch the IP address of the VM hosting Minikube. When connecting via your browser, you'll need to use the IP of the VM that's in the same subnet of your host. Combine that IP with the NodePort above. + +## Recap + +We've learned several quite important concepts like Services, Pods, ReplicaSets and +Configmaps. But that's just a small part of what Kubernetes can do. Read more on [Kubernetes portal](http://kubernetes.io) diff --git a/networking-workshop/k8snet.md b/networking-workshop/k8snet.md new file mode 100644 index 00000000..04419762 --- /dev/null +++ b/networking-workshop/k8snet.md @@ -0,0 +1,302 @@ +# Kubernetes networking explained + +Brief description of Kubernetes networking internals + +## Motivation + +Unlike classic Docker networking which uses simple bridge to interconnect +containers and port-forwarding (proxy or NAT) K8S has addition requirements on +network fabric: + +1. all containers can communicate with all other containers without NAT +2. all nodes can communicate with all containers (and vice-versa) without NAT +3. the IP that a container sees itself as is the same IP that others see it as + +Obviously classic Docker network setup unable to provide this features in case +of more than one node. + +K8S itself doesn't have built-in networking solutions (except for one-node case) +to fulfil this requirements. But there're a lot of 3rd-parti solutions to +implement network fabric. Every product has it's pros and cons, so it was a good +idea to make networking fabric independent and pluggable. + +More about K8S networking solutions [here][1]. + +For this purposes K8S supports CNI plugins to manage networking. More about K8S +CNI (and other modes) [here][2]. + +## Docker network example + +We can easily simulate docker-alike networking using namespaces, virtual +ethernet devices and a bridge. 
+ +Network diagram: + +```text ++----------------------------------------------------+ +| Linux host | +| +----------------------+ +----------------------+ | +| | netns node1 | | netns node2 | | +| | + | | + | | +| | | vethA | | | vethX | | +| | | 10.10.0.2/24 | | | 10.10.0.3/24 | | +| | | | | | | | +| +----------------------+ +----------------------+ | +| | | | +| | vethB | vethY | +| | | | +| br0 +-------------------------------+ | +| +-------------------------------+ | +| | ++----------------------------------------------------+ +``` + +You can run this script to create this network configuration: + +```shell +# Create bridge on host to interconnect virtual ethernets +ip link add br0 type bridge +ip link set br0 up + +# Creating virtual ethernet pairs vethA-vethB and vethX-vethY +ip link add vethA type veth peer name vethB +ip link add vethX type veth peer name vethY +ip link set vethB up +ip link set vethY up + +# Adding network namespaces node1 and node2 +# They will work as containers with independent networking +ip netns add node1 +ip netns add node2 + +# Put one end of each pair to each of netns'es +ip link set vethA netns node1 +ip link set vethX netns node2 + +# Bring interfaces inside netns up +# This should be done AFTER putting interfaces to netns because this movet turns interfaces off +ip netns exec node1 ip link set vethA up +ip netns exec node2 ip link set vethX up + +# Assign IP addresses from same 10.10.0.0/24 subnet +ip netns exec node1 ip address add 10.10.0.2/24 dev vethA +ip netns exec node2 ip address add 10.10.0.3/24 dev vethX + +# Link on-host ends of veths to bridge, i.e. providing L2 connectivity between veth'es +ip link set vethB master br0 +ip link set vethY master br0 +``` + +Check connectivity between netns'es `node1` and `node2`: + +```text +# ip netns exec node2 ping -c 3 10.10.0.2 +PING 10.10.0.2 (10.10.0.2) 56(84) bytes of data. +64 bytes from 10.10.0.2: icmp_seq=1 ttl=64 time=0.124 ms +64 bytes from 10.10.0.2: icmp_seq=2 ttl=64 time=0.151 ms +64 bytes from 10.10.0.2: icmp_seq=3 ttl=64 time=0.066 ms + +--- 10.10.0.2 ping statistics --- +3 packets transmitted, 3 received, 0% packet loss, time 2054ms +rtt min/avg/max/mdev = 0.066/0.113/0.151/0.037 ms + +# ip netns exec node1 ping -c 3 10.10.0.3 +PING 10.10.0.3 (10.10.0.3) 56(84) bytes of data. +64 bytes from 10.10.0.3: icmp_seq=1 ttl=64 time=0.079 ms +64 bytes from 10.10.0.3: icmp_seq=2 ttl=64 time=0.062 ms +64 bytes from 10.10.0.3: icmp_seq=3 ttl=64 time=0.103 ms + +--- 10.10.0.3 ping statistics --- +3 packets transmitted, 3 received, 0% packet loss, time 2030ms +rtt min/avg/max/mdev = 0.062/0.081/0.103/0.018 ms +``` + +To destroy this setup simply run: + +```shell +# Remove bridge +ip link set br0 down +ip link delete br0 + +# Remove namespaces +# We don't need to remove each of veth because `netns delete` destroys all interfaces +# inside itself, and veth can be destroyed by simply removing any of it's interfaces +ip netns delete node1 +ip netns delete node2 +``` + +## CNI basics + +CNI (Container Network Interface) is a project which aims at providing universal +clean and easy way to connect containers to network fabric. *Container* here is +basically interchangeable with *Linux network namespace*. Simply speaking CNI +plugin is a wrapper command which configures network interfaces inside container +and attaches it to some backend network. 
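+
+To make this concrete, a manual plugin invocation looks roughly like the sketch below (the container runtime normally does this for you; the environment variables are defined by the CNI spec and described in the next paragraph, and the paths and values here are illustrative):
+
+```bash
+# Network config is passed on stdin; the container/netns is described via env vars
+$ CNI_COMMAND=ADD \
+  CNI_CONTAINERID=example \
+  CNI_NETNS=/var/run/netns/node1 \
+  CNI_IFNAME=eth0 \
+  CNI_PATH=/opt/cni/bin \
+  /opt/cni/bin/bridge < /etc/cni/net.d/k8s.conf
+# On success the plugin prints a JSON result (allocated IP, routes, etc.) to stdout
+```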
+ +API is pretty simple, it consists of 3 operations: + +* Add container to network +* Remove container to network +* Report self version + +All arguments passed through environment variables. Plugin must return +JSON-serialized result which describes status of operation (like allocated IP, +created routes, etc). Also there is special type of plugin called IPAM +(IP address management) plugin, which function is to allocate IP addresses and +pass it to network cofiguration plugin. + +CNI specification details described [here][3]. + +K8S expects CNI plugin binaries to be stored in `/opt/cni/bin` and configuration +files in `/etc/cni/net.d`. All plugin produced results are stored in +`/var/lib/cni/`. + +## Lab installation + +First, follow [installation instructions](README.md#installation) + +## Sample pod configuration explored + +Run `minikube start --network-driver=cni` to spin up K8S inside VirtualBox VM +with [bridge plugin][4] and [host-local ipam plugin][5] configured. + +Let's deploy `nginx` application to cluster: + +```text +$ kubectl run nginx --image=nginx +deployment "nginx" created +``` + +Make sure the pod is running: + +```text +$ kubectl get pods +NAME READY STATUS RESTARTS AGE +nginx-701339712-05fm4 1/1 Running 0 48s +``` + +Attach console to explore inside network: + +```text +$ kubectl exec -t -i nginx-701339712-05fm4 -- /bin/bash +root@nginx-701339712-05fm4:/# ip a +... +4: eth0@if10: mtu 1460 qdisc noqueue state UP group default + link/ether 0a:58:0a:01:00:04 brd ff:ff:ff:ff:ff:ff + inet 10.1.0.4/16 scope global eth0 + valid_lft forever preferred_lft forever + inet6 fe80::cf5:fff:fe7e:e2e2/64 scope link tentative dadfailed + valid_lft forever preferred_lft forever +root@nginx-701339712-05fm4:/# ip r +default via 10.1.0.1 dev eth0 +10.1.0.0/16 dev eth0 proto kernel scope link src 10.1.0.4 +``` + +As we see container has quite simple network config. We're interested only in +interface #4, which is one end of `veth` virtual interface. Suffix `@if10` +indicates that other end of this pipe is interface #10 (which is located +outside netns -- on host). IP configuration is pretty straightforward: pod has +one IP address allocated on `10.1.0.0/16` subnet using host-local plugin. It +gets IP addresses from configured subnet. + +We can review CNI configuration using following command (on host): + +``` +$ minikube ssh +# Name of config file may differ from version to version of minikube +$ cat /etc/cni/net.d/k8s.conf +``` + +Configuration explained briefly: + +```text +{ + "name": "rkt.kubernetes.io", + "type": "bridge", # network configuration plugin + "bridge": "mybridge", # name of host bridge to use + ... + "isGateway": true, # Bridge is used as gateway, i.e. first subnet address + # is assigned to it + "ipMasq": true, # Enable IP Masquerading to pods outgoing traffic + "ipam": { # IPAM configuration following + "type": "host-local", # IPAM plugin name + "subnet": "10.1.0.0/16", # Allocation subnet + "gateway": "10.1.0.1", # Inform host-local that router is here to + # render consistent IP level configuration + ... + } +} +``` + +Inside VM (if you've exited, enter again using `minikube ssh` command) we +can explore K8S host networking part. + +```text +$ ip a +... 
+7: mybridge: mtu 1460 qdisc noqueue state UP group default qlen 1000 + link/ether 0a:58:0a:01:00:01 brd ff:ff:ff:ff:ff:ff + inet 10.1.0.1/16 scope global mybridge + valid_lft forever preferred_lft forever + inet6 fe80::d8da:1fff:fe99:e8c1/64 scope link + valid_lft forever preferred_lft forever +8: vethd78e9f59@if4: mtu 1460 qdisc noqueue master mybridge state UP group default + link/ether ce:30:18:e6:b7:a2 brd ff:ff:ff:ff:ff:ff link-netnsid 0 + inet6 fe80::cc30:18ff:fee6:b7a2/64 scope link + valid_lft forever preferred_lft forever +9: vethe0ce3592@if4: mtu 1460 qdisc noqueue master mybridge state UP group default + link/ether 82:cf:10:3f:a5:38 brd ff:ff:ff:ff:ff:ff link-netnsid 1 + inet6 fe80::80cf:10ff:fe3f:a538/64 scope link + valid_lft forever preferred_lft forever +10: veth10fac511@if4: mtu 1460 qdisc noqueue master mybridge state UP group default + link/ether 02:29:a5:05:e4:7d brd ff:ff:ff:ff:ff:ff link-netnsid 2 + inet6 fe80::29:a5ff:fe05:e47d/64 scope link + valid_lft forever preferred_lft forever +... +``` + +Host of course has `lo` and `ethX` interfaces in configuration required by +`docker-machine`. But our interest is here, as we see there is a bridge called +`mybridge` with gateway address assigned. And there are outer parts of `veths` +for containers. + +*All `vethXXXX` interfaces have `@if4` suffix which indicates that they are +connected to "interface #4", but in every pod `eth` is interface #4, so it's +just meaningless suffix.* + +Now see how interfaces are connected to bridge: + +```text +$ brctl show +bridge name bridge id STP enabled interfaces +docker0 8000.0242ec55eeb9 no +mybridge 8000.0a580a010001 no veth10fac511 + vethd78e9f59 + vethe0ce3592 +``` + +There's unused `docker0` bridge, and `mybridge` which is master of all outer +parts of `veths`. + +Let's examine CNI plugins stored data: + +```text +$ sudo -i +# ls /var/lib/cni/networks/rkt.kubernetes.io/ +10.1.0.2 10.1.0.3 10.1.0.4 last_reserved_ip +# cat /var/lib/cni/networks/rkt.kubernetes.io/last_reserved_ip ; echo +10.1.0.4 +# cat /var/lib/cni/networks/rkt.kubernetes.io/10.1.0.2 ; echo +0fa93b0f771db74d3f1da588a3f9b413e47b90d30bdc4b0a93b7cc841a37a156 +``` + +Directory `/var/lib/cni/networks` contains information of IPAM plugins. We can +see that host-local plugin tracks IP address which was allocated last +`last_reserved_ip` to allocate IPs in order. And every file named after +allocated IP contains ID of docker container network namespace. + +[1]: https://kubernetes.io/docs/concepts/cluster-administration/networking/ +[2]: https://kubernetes.io/docs/concepts/cluster-administration/network-plugins/ +[3]: https://github.com/containernetworking/cni/blob/master/SPEC.md +[4]: https://github.com/containernetworking/cni/blob/master/Documentation/bridge.md +[5]: https://github.com/containernetworking/cni/blob/master/Documentation/host-local.md diff --git a/networking-workshop/k8sprod.md b/networking-workshop/k8sprod.md new file mode 100644 index 00000000..f1926de4 --- /dev/null +++ b/networking-workshop/k8sprod.md @@ -0,0 +1,1014 @@ +# Kubernetes Production Patterns + +... and anti-patterns. + +We are going to explore helpful techniques to improve resiliency and high availability of Kubernetes deployments and will take a look at some common mistakes to avoid when working with Docker and Kubernetes. 
+ +## Installation + +First, follow [installation instructions](README.md#installation) + +### Anti-Pattern: Mixing Build And Runtime + +The first common anti-pattern when working with Docker images, or more specifically, when writing Dockerfiles for your own images, is mixing build and runtime environments in the same image. + +Let's consider this Dockerfile: + +```Dockerfile +FROM ubuntu:18.04 + +RUN apt-get update +RUN apt-get install gcc +RUN gcc hello.c -o /hello +``` + +It compiles and runs a simple "hello world" program: + +```bash +$ cd prod/build +$ docker build -t prod . +$ docker run prod +Hello World +``` + +There are a couple of problems with the resulting Docker image. + +**Size** + +```bash +$ docker images | grep prod +prod latest b2c197180350 14 minutes ago 201MB +``` + +That's almost 200 megabytes to host several kilobytes of a C program! We are bringing in package manager, C compiler and lots of other unnecessary tools that are not required to run this program. + +Which leads us to the second problem: + +**Security** + +We distribute the whole build toolchain. In addition to that, we ship the source code of the image: + +```bash +$ docker run --entrypoint=cat prod /build/hello.c +#include + +int main() +{ + printf("Hello World\n"); + return 0; +} +``` + +**Splitting Build And Runtime Environments** + +A better way to do this is to use a pattern called "buildbox". The idea behind it is that you build a separate "buildbox" image that provides the necessary build environment to compile/build the program and use another, much smaller, image to run the program. + +Let's take a look: + +```bash +$ cd prod/build-fix +$ docker build -f build.dockerfile -t buildbox . +``` + +**NOTE:** We have used `-f` flag to specify the Dockerfile we are going to use. By default Docker would look for a file named `Dockerfile` which we also have in this directory. + +Now we have a `buildbox` image that contains our build environment. We can use it to compile the C program now: + +```bash +$ docker run -v $(pwd):/build buildbox gcc /build/hello.c -o /build/hello +``` + +**NOTE:** If you have your local Docker environment configured to point to your local minikube cluster (via `eval $(minikube docker-env)` command), the command above will not work because it won't be able to mount the volume. Use your local Docker installation, you can open a new shell session for that. + +Let's explore what's just happened. Instead of building another image with the compiled binary (and the program's source code) inside it using `docker build` we mounted the source code directory in our buildbox container, compiled the program and had the container to output the resulting binary to the same volume. If we look at our local directory now, we'll see that the compiled binary is there: + +```bash +$ ls -lh +``` + +Now we can build a much smaller image to run our program: + +```Dockerfile +FROM quay.io/gravitational/debian-tall:0.0.1 + +ADD hello /hello +ENTRYPOINT ["/hello"] +``` + +```bash +$ docker build -f run.dockerfile -t prod:v2 . +$ docker run prod:v2 +Hello World +$ docker images | grep prod +prod v2 ef93cea87a7c 17 seconds ago 11.05 MB +prod latest b2c197180350 45 minutes ago 201 MB +``` + +**NOTE:** Please be aware that you should either plan on providing the needed "shared libraries" in the runtime image or "statically build" your binaries to have them include all needed libraries. 
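+
+For this tiny C program, one way to achieve the latter is to ask the compiler for a statically linked binary -- the same buildbox command as above, with `-static` added:
+
+```bash
+$ docker run -v $(pwd):/build buildbox gcc -static /build/hello.c -o /build/hello
+```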
+ +Docker supports the buildbox pattern natively starting from version `17.05`, by providing a feature called [multi-stage builds](https://docs.docker.com/develop/develop-images/multistage-build/). With multi-stage builds you can define multiple "stages" in a single Dockerfile, each of which starts with a new `FROM` clause, and selectively copy artifacts between the stages. This way you only write a single Dockerfile and end up with a single resulting (small) image. + +For example: + +```Dockerfile +# +# Build stage. +# +FROM ubuntu:18.04 + +RUN apt-get update +RUN apt-get install -y gcc +ADD hello.c /build/hello.c +RUN gcc /build/hello.c -o /build/hello + +# +# Run stage. +# +FROM quay.io/gravitational/debian-tall:0.0.1 + +COPY --from=0 /build/hello /hello +ENTRYPOINT ["/hello"] +``` + +Notice how we copy the resulting binary from the first stage of the build. Let's build v3 of our image: + +```bash +$ docker build -f multi.dockerfile -t prod:v3 . +$ docker run prod:v3 +``` + +If you query `docker images` now, you'll see that `v3` version of our image is same size as `v2`. + +### Anti Pattern: Zombies And Orphans + +**NOTE:** This example demonstration will only work on Linux. + +It is quite easy to leave orphaned processes running in the background. + +Let's launch a simple container: + +```bash +$ docker run busybox sleep 10000 +``` + +Now, let's open a separate terminal and locate the process: + +```bash +$ ps uax | grep sleep +sasha 14171 0.0 0.0 139736 17744 pts/18 Sl+ 13:25 0:00 docker run busybox sleep 10000 +root 14221 0.1 0.0 1188 4 ? Ss 13:25 0:00 sleep 10000 +``` + +As you see there are in fact two processes: `docker run` and `sleep 1000` running in a container. + +Let's send kill signal to the `docker run` (just as CI/CD job would do for long running processes): + +```bash +$ kill 14171 +``` + +However, `docker run` process has not exited, and `sleep` process is running! + +```bash +$ ps uax | grep sleep +sasha 14171 0.0 0.0 139736 17744 pts/18 Sl+ 13:25 0:00 docker run busybox sleep 10000 +root 14221 0.1 0.0 1188 4 ? Ss 13:25 0:00 sleep 10000 +``` + +Yelp engineers have a good answer for why this happens [here](https://github.com/Yelp/dumb-init): + +> The Linux kernel applies special signal handling to processes which run as PID 1. +> When processes are sent a signal on a normal Linux system, the kernel will first check for any custom handlers the process has registered for that signal, and otherwise fall back to default behavior (for example, killing the process on SIGTERM). + +> However, if the process receiving the signal is PID 1, it gets special treatment by the kernel; if it hasn't registered a handler for the signal, the kernel won't fall back to default behavior, and nothing happens. In other words, if your process doesn't explicitly handle these signals, sending it SIGTERM will have no effect at all. + +Let's enter our container and see for ourselves: + +```bash +$ docker ps +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +06703112d6ac busybox "sleep 10000" 5 minutes ago Up 5 minutes nervous_jennings +$ docker exec -ti 06703112d6ac /bin/sh +$ ps -ef +PID USER TIME COMMAND + 1 root 0:00 sleep 10000 + 12 root 0:00 /bin/sh + 18 root 0:00 ps -ef +``` + +Indeed, the `sleep` command is running as PID 1, and since it does not explicitly register any signal handlers, our TERM signal gets ignores. 
Let's kill the container: + +```bash +$ docker kill 06703112d6ac +``` + +To solve this (and other) issues, you need a simple init system that has proper signal handlers specified. Luckily, Yelp engineers built a simple and lightweight init system, `dumb-init`: + +```bash +$ docker run quay.io/gravitational/debian-tall /usr/bin/dumb-init /bin/sh -c "sleep 10000" +``` + +If you send SIGTERM signal to the `docker run` process now, it will handle shutdown properly. + +### Anti-Pattern: Direct Use Of Pods + +[Kubernetes Pod](https://kubernetes.io/docs/user-guide/pods/#what-is-a-pod) is a building block that by itself does not provide any durability guarantees. As Kubernetes docs say, a pod won't survive scheduling failures, node failures, or other evictions, for example due to lack of resources. + +For example, let's create a single nginx pod: + +```bash +$ cd prod/pod +$ kubectl create -f pod.yaml +$ kubectl get pods +NAME READY STATUS RESTARTS AGE +nginx 1/1 Running 0 18s +``` + +This pod will keep running, for now. It will also restart in case its container crashes, provided it has an appropriate restart policy. However, in the event a node goes down or starts running out of resources triggering evictions, the pod will be lost. Let's delete it now: + +```bash +$ kubectl delete pod/nginx +$ kubectl get pods +No resources found. +``` + +The pod is gone. + +Do not use pods directly in production. Instead, you should almost always use controllers that provide self-healing on the cluster scope - there are plenty to choose from: `Deployments`, `ReplicaSets`, `DaemonSets`, `StatefulSets` and so on. + +Even for singletons, use `Deployment` with replication factor 1, which will guarantee that pods will get rescheduled and survive eviction or node loss: + +```bash +$ kubectl create -f deploy.yaml +$ kubectl get pods +NAME READY STATUS RESTARTS AGE +nginx-65f88748fd-w2klm 1/1 Running 0 19s +``` + +If we delete the pod now, it will get rescheduled right back on: + +```bash +$ kubectl delete pod/nginx-65f88748fd-w2klm +pod "nginx-65f88748fd-w2klm" deleted +$ kubectl get pods +NAME READY STATUS RESTARTS AGE +nginx-65f88748fd-fd2sk 1/1 Running 0 4s +``` + +### Anti-Pattern: Using Background Processes + +**NOTE:** You need to have executed `eval $(minikube docker-env)` command for the following to work properly. + +```bash +$ cd prod/background +$ export registry=$(kubectl get svc/registry -ojsonpath="{.spec.clusterIP}") +$ docker build -t $registry:5000/background:0.0.1 . +$ docker push $registry:5000/background:0.0.1 +$ kubectl create -f crash.yaml +$ kubectl get pods +NAME READY STATUS RESTARTS AGE +crash 1/1 Running 0 5s +``` + +Our container was supposed to start a simple Python web server on port 5000. The container appears to be running, but let's check if the server is running there: + +```bash +$ kubectl exec -ti crash /bin/bash +root@crash:/# ps uax +USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND +root 1 0.0 0.0 21748 1596 ? Ss 00:17 0:00 /bin/bash /start.sh +root 6 0.0 0.0 5916 612 ? S 00:17 0:00 sleep 100000 +root 7 0.0 0.0 21924 2044 ? Ss 00:18 0:00 /bin/bash +root 11 0.0 0.0 19180 1296 ? R+ 00:18 0:00 ps uax +``` + +The server is not running because we made a mistake in our script, however the container itself is happily running. + +**Using Probes** + +The first obvious fix is to use a proper init system and monitor the status of the web service. 
+However, let's use this as an opportunity to use liveness probes: + +```yaml +apiVersion: v1 +kind: Pod +metadata: + name: fix + namespace: default +spec: + containers: + - command: ['/start.sh'] + image: localhost:5000/background:0.0.1 + name: server + imagePullPolicy: Always + livenessProbe: + httpGet: + path: / + port: 5000 + timeoutSeconds: 1 +``` + +```bash +$ kubectl create -f fix.yaml +``` + +Our Python HTTP server still crashes, however this time the liveness probe will fail and the container will get restarted. + +```bash +$ kubectl get pods +NAME READY STATUS RESTARTS AGE +crash 1/1 Running 0 11m +fix 1/1 Running 1 1m +``` + +An even better solution would be avoid using background processes inside containers. Instead, decouple services from each other by running them in separate containers (process per container) and if they need to run as a single "entity", colocate them in a single pod. + +This approach has many benefits, including easier resources monitoring, ease of use and efficiency resulting in more light-weight and reusable infrastructure. + +### Production Pattern: Logging + +When configuring logging for your application running inside a container, make sure the logs go to standard output: + +```bash +$ kubectl create -f logs/logs.yaml +$ kubectl logs logs +hello, world! +``` + +Kubernetes and Docker have a system of plugins to make sure logs sent to stdout and stderr will get collected, forwarded and rotated. + +**NOTE:** This is one of the patterns of [The Twelve Factor App](https://12factor.net/logs) and Kubernetes supports it out of the box! + +### Production Pattern: Immutable Containers + +Every time you write something to a container's filesystem, it activates the [copy-on-write strategy](https://docs.docker.com/engine/userguide/storagedriver/imagesandcontainers/#container-and-layers). This approach is what makes containers efficient. + +The way it works is, all layers in a Docker image are read-only. When a container starts, a thin writable layer is added on top of its other read-only layers. Any changes the container makes to the filesystem are stored there and files that do not change never get copied to that writable layer, which makes it as small as possible. + +When an existing file in a container is modified, the storage driver (`devicemapper`, `overlay` or others) performs a copy-on-write operation and copies that file to the writable layer. In case of active usage, it can put a lot of stress on a storage driver, especially in case of Devicemapper or BTRFS. + +For write-heavy applications it is recommended to not store data in the container but rather make sure that containers write data only to volumes which are independent of a running container and designed for I/O efficiency. + +For non-persistent data, Kubernetes provides a special volume type called `emptyDir`: + +```yaml +apiVersion: v1 +kind: Pod +metadata: + name: test-pd +spec: + containers: + - image: busybox + name: test-container + volumeMounts: + - mountPath: /tmp + name: tempdir + volumes: + - name: tempdir + emptyDir: {} +``` + +By default the volume is backed by whatever disk is backing the node, however note that it is cleared permanently if the pod leaves the node for whatever reason (it persists across container restarts within a pod though). + +For small files it may be beneficial to set `emptyDir.medium` field to `Memory` which will make Kubernetes use a RAM-backed filesystem, `tmpfs` instead. 
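+
+For example, the volume from the manifest above would then look like this:
+
+```yaml
+  volumes:
+  - name: tempdir
+    emptyDir:
+      medium: Memory
+```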
+ +### Anti-Pattern: Using `latest` Tag + +It is not recommended to use use `latest` tag in production as it creates ambiguity. For example, looking at tha "latest" tag, it is not possible to tell which version of the application is actually running. + +It is ok to use `latest` for development purposes, although make sure you set `imagePullPolicy` to `Always`, to make sure Kubernetes always pulls the latest version when creating a pod: + +```yaml +apiVersion: v1 +kind: Pod +metadata: + name: always + namespace: default +spec: + containers: + - command: ['/bin/sh', '-c', "echo hello, world!"] + image: busybox:latest + name: server + imagePullPolicy: Always +``` + +### Production Pattern: Pod Readiness + +Imagine a situation when your container takes some time to start. + +To simulate this, we are going to write a simple script: + +```bash +#!/bin/bash + +echo "Starting up" +sleep 30 +echo "Started up successfully" +python -m http.server 5000 +``` + +**NOTE:** You need to have executed `eval $(minikube docker-env)` command for the following to work properly. + +Push the image and start service and deployment: + +```bash +$ cd prod/delay +$ export registry=$(kubectl get svc/registry -ojsonpath="{.spec.clusterIP}") +$ docker build -t $registry:5000/delay:0.0.1 . +$ docker push $registry:5000/delay:0.0.1 +$ kubectl create -f service.yaml +$ kubectl create -f deployment.yaml +``` + +Enter curl container inside the cluster and make sure it all works: + +```bash +$ kubectl run -ti --rm cli --image=appropriate/curl --restart=Never --command /bin/sh +$ curl http://delay:5000 + +... +``` + +You will notice that there's a `connection refused error`, when you try to access it +for the first 30 seconds. + +Update deployment to simulate deploy: + +```bash +$ docker build -t $registry:5000/delay:0.0.2 . +$ docker push $registry:5000/delay:0.0.2 +$ kubectl replace -f deployment-update.yaml +``` + +In the next window, let's try to see if we got any service downtime: + +```bash +$ curl http://delay:5000 +curl: (7) Failed to connect to delay port 5000: Connection refused +``` + +We've got a production outage despite setting `maxUnavailable: 0` in our rolling update strategy! + +This happened because Kubernetes did not know about startup delay and readiness of the service. If we look at the list of pods, we'll see that the old pod was deleted immediately after the new one has been created thus leaving us w/o a functioning service for the next 30 seconds: + +```bash +$ kubectl get pods +``` + +Let's fix that by using readiness probe: + +```yaml +readinessProbe: + httpGet: + path: / + port: 5000 + timeoutSeconds: 1 + periodSeconds: 5 +``` + +Readiness probe indicates the readiness of the pod containers and Kubernetes will take this into account when doing a deployment: + +```bash +$ kubectl replace -f deployment-fix.yaml +``` + +This time, if we observe output from `kubectl get pods`, we'll see that there will be two pods running and the old pod will start terminating only when the second one becomes ready: + +```bash +$ kubectl get pods +NAME READY STATUS RESTARTS AGE +delay-5fb9c6fb8b-prw86 1/1 Running 0 2m15s +delay-f7f84dff9-m5hw7 0/1 Running 0 3s +``` + +And the `curl` command consistently works while the service is being redeployed. + +### Anti-Pattern: Unbound Quickly Failing Jobs + +Kubernetes provides a useful tool to schedule containers to perform one-time task: [jobs](https://kubernetes.io/docs/concepts/jobs/run-to-completion-finite-workloads/). 
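+For reference, here is a minimal sketch of a job that runs a single task to completion (the name and command are placeholders):
+
+```yaml
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: hello
+spec:
+  template:
+    spec:
+      restartPolicy: Never
+      containers:
+      - name: box
+        image: busybox
+        command: ["/bin/sh", "-c", "echo done"]   # exits with code 0, so the job completes
+```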
+ +However, there is a problem: + +```yaml +apiVersion: batch/v1 +kind: Job +metadata: + name: bad +spec: + template: + metadata: + name: bad + spec: + restartPolicy: Never + containers: + - name: box + image: busybox + command: ["/bin/sh", "-c", "exit 1"] +``` + +```bash +$ cd prod/jobs +$ kubectl create -f bad.yaml +``` + +You are going to observe the race to create hundreds of containers for the job retrying forever: + +```bash +$ kubectl describe jobs +Name: bad +Namespace: default +Image(s): busybox +Selector: controller-uid=18a6678e-11d1-11e7-8169-525400c83acf +Parallelism: 1 +Completions: 1 +Start Time: Sat, 25 Mar 2017 20:05:41 -0700 +Labels: controller-uid=18a6678e-11d1-11e7-8169-525400c83acf + job-name=bad +Pods Statuses: 1 Running / 0 Succeeded / 24 Failed +No volumes. +Events: + FirstSeen LastSeen Count From SubObjectPath Type Reason Message + --------- -------- ----- ---- ------------- -------- ------ ------- + 1m 1m 1 {job-controller } Normal SuccessfulCreate Created pod: bad-fws8g + 1m 1m 1 {job-controller } Normal SuccessfulCreate Created pod: bad-321pk + 1m 1m 1 {job-controller } Normal SuccessfulCreate Created pod: bad-2pxq1 + 1m 1m 1 {job-controller } Normal SuccessfulCreate Created pod: bad-kl2tj + 1m 1m 1 {job-controller } Normal SuccessfulCreate Created pod: bad-wfw8q + 1m 1m 1 {job-controller } Normal SuccessfulCreate Created pod: bad-lz0hq + 1m 1m 1 {job-controller } Normal SuccessfulCreate Created pod: bad-0dck0 + 1m 1m 1 {job-controller } Normal SuccessfulCreate Created pod: bad-0lm8k + 1m 1m 1 {job-controller } Normal SuccessfulCreate Created pod: bad-q6ctf + 1m 1s 16 {job-controller } Normal SuccessfulCreate (events with common reason combined) +``` + +Probably not the result you expected. Over time, the jobs will accumulate and the load on the nodes and Docker will be quite substantial, especially if the job is failing very quickly. + +Let's clean up the busy failing job first: + +```bash +$ kubectl delete jobs/bad +``` + +Now let's use `activeDeadlineSeconds` to limit amount of retries: + +```yaml +apiVersion: batch/v1 +kind: Job +metadata: + name: bound +spec: + activeDeadlineSeconds: 30 + template: + metadata: + name: bound + spec: + restartPolicy: Never + containers: + - name: box + image: busybox + command: ["/bin/sh", "-c", "exit 1"] +``` + +```bash +$ kubectl create -f bound.yaml +``` + +Now you will see that after 30 seconds, the job has failed and no more pods will be created: + +```bash + 11s 11s 1 {job-controller } Normal DeadlineExceeded Job was active longer than specified deadline +``` + +**NOTE:** Sometimes it makes sense to retry forever. In this case make sure to set a proper pod restart policy to protect from accidental DDOS on your cluster. + +### Production Pattern: Pod Quotas + +One of important Kubernetes features is resource management. Kubernetes allows you to configure CPU/RAM resource quotas for containers to ensure that no single container can starve the entire system. + +Suppose we have a container that tends to hog memory: + +```bash +$ cd prod/quotas +$ docker build -t $registry:5000/memhog:0.0.1 . +$ docker push $registry:5000/memhog:0.0.1 +$ kubectl create -f quota.yaml +``` + +The container consumes about 100 megabytes of memory but the limit we set on our pod allows only 20. Let's see how Kubernetes handled it: + +```bash +$ kubectl get pods/quota +quota 0/1 OOMKilled 1 4s +``` + +Kubernetes's OOM killer killed the container, so if the application running inside it leaks memory gradually, it will restart. 
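+A sketch of what such a pod spec might look like is shown below; only the 20Mi limit is taken from the description above, while the pod name, image reference and request value are assumptions (the actual `quota.yaml` may differ):
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: quota
+spec:
+  containers:
+  - name: memhog
+    image: localhost:5000/memhog:0.0.1
+    resources:
+      requests:
+        memory: "10Mi"    # assumed request value
+      limits:
+        memory: "20Mi"    # the container is OOM-killed when usage exceeds this limit
+```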
+ +Kubernetes also allows you to configure quotas per namespace and uses an intelligent scheduling algorithm to ensure that pods are distributed across the cluster nodes appropriately. For example, it won't schedule a pod on a node if that pod's quota request exceeds resources available on the node. + +Proper quotas configuration is mandatory for ensuring smooth sailing in production. Check these Kubernetes resources for more information: + +https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/ +https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/ + +### Anti-Pattern: Putting Configuration Inside Image + +Often an application needs a configuration file to run. It might be tempting to just put the configuration file alongside your program inside the container: + +```bash +$ cd prod/config +$ docker build -t $registry:5000/config:0.0.1 -f config.dockerfile . +$ docker push $registry:5000/config:0.0.1 +$ kubectl create -f pod.yaml +``` + +This approach has a number of drawbacks. For example, what if we want to update the configuration? There's no easy way to do that inside the running container. Another concern is what if configuration file contains some sensitive information such as passwords or API keys? + +Kubernetes provides an elegant way to deal with these issues by using ConfigMaps. A ConfigMap is a Kubernetes resource that can be mounted inside a running container (or multiple containers). Let's create a ConfigMap out of our configuration file: + +```bash +$ kubectl create configmap config --from-file=config.yaml +$ kubectl get configmaps/config -oyaml +``` + +We can see that Kubernetes converted our configuration file into a ConfigMap. Let's now rebuild our image to remove embedded configuration file and update the pod to use ConfigMap: + +```bash +$ docker build -t $registry:5000/config:0.0.1 -f config-fix.dockerfile . +$ docker push $registry:5000/config:0.0.1 +$ kubectl delete -f pod.yaml +$ kubectl create -f pod-fix.yaml +``` + +### Production Pattern: Circuit Breaker + +In this example we will explore a more generic production pattern that's not necessarily Kubernetes-specific but we'll be using our local Kubernetes cluster to play with it. The pattern is called "circuit breaker". + +Our web application is an imaginary web server for email. To render the page, our frontend has to make two requests to the backend: + +* Talk to the weather service to get current weather. +* Fetch current mail from the database. + +We will make the following assumptions: + +* The weather service is auxiliary and its downtime shouldn't affect the whole system. +* The mail service is critical and users should still be able to view mail if weather service is down. 
+ +Here is our frontend, weather and mail services written in Python: + +**Weather Service Backend** + +```python +from flask import Flask +app = Flask(__name__) + +@app.route("/") +def hello(): + return '''Pleasanton, CA +Saturday 8:00 PM +Partly Cloudy +12 C +Precipitation: 9% +Humidity: 74% +Wind: 14 km/h +''' + +if __name__ == "__main__": + app.run(host='0.0.0.0') +``` + +**Mail Service Backend** + +```python +from flask import Flask,jsonify +app = Flask(__name__) + +@app.route("/") +def hello(): + return jsonify([ + {"from": "", "subject": "lunch at noon tomorrow"}, + {"from": "", "subject": "compiler docs"}]) + +if __name__ == "__main__": + app.run(host='0.0.0.0') +``` + +**Frontend** + +```python +from flask import Flask +import requests +from datetime import datetime +app = Flask(__name__) + +@app.route("/") +def hello(): + weather = "weather unavailable" + try: + print "requesting weather..." + start = datetime.now() + r = requests.get('http://weather') + print "got weather in %s ..." % (datetime.now() - start) + if r.status_code == requests.codes.ok: + weather = r.text + except: + print "weather unavailable" + + print "requesting mail..." + r = requests.get('http://mail') + mail = r.json() + print "got mail in %s ..." % (datetime.now() - start) + + out = [] + for letter in mail: + out.append("
<li>From: %s Subject: %s</li>" % (letter['from'], letter['subject']))
+
+    return '''<html>
+<body>
+  <h3>Weather</h3>
+  <p>%s</p>
+  <h3>Email</h3>
+  <ul>
+    %s
+  </ul>
+</body>
+</html>
+''' % (weather, '<br/>'.join(out))
+
+if __name__ == "__main__":
+    app.run(host='0.0.0.0')
+```
+
+Let's create our deployments and services:
+
+```bash
+$ cd prod/cbreaker
+$ export registry=$(kubectl get svc/registry -ojsonpath="{.spec.clusterIP}")
+$ docker build -t $registry:5000/mail:0.0.1 .
+$ docker push $registry:5000/mail:0.0.1
+$ kubectl apply -f service.yaml
+deployment "frontend" configured
+deployment "weather" configured
+deployment "mail" configured
+service "frontend" configured
+service "mail" configured
+service "weather" configured
+```
+
+Check that everything is running smoothly:
+
+```bash
+$ kubectl run -ti --rm cli --image=appropriate/curl --restart=Never --command /bin/sh
+$ curl http://frontend
+<html>
+<body>
+  <h3>Weather</h3>
+  <p>Pleasanton, CA
+Saturday 8:00 PM
+Partly Cloudy
+12 C
+Precipitation: 9%
+Humidity: 74%
+Wind: 14 km/h
+</p>
+  <h3>Email</h3>
+  <ul>
+    <li>From: Subject: lunch at noon tomorrow</li>
+    <li>From: Subject: compiler docs</li>
+  </ul>
+</body>
+</html>

    + +``` + +Let's introduce weather service that crashes: + +```python +from flask import Flask +app = Flask(__name__) + +@app.route("/") +def hello(): + raise Exception("I am out of service") + +if __name__ == "__main__": + app.run(host='0.0.0.0') +``` + +Build and redeploy: + +```bash +$ docker build -t $registry:5000/weather-crash:0.0.1 -f weather-crash.dockerfile . +$ docker push $registry:5000/weather-crash:0.0.1 +$ kubectl apply -f weather-crash.yaml +deployment "weather" configured +``` + +Let's make sure that it is crashing: + +```bash +$ kubectl run -ti --rm cli --image=appropriate/curl --restart=Never --command /bin/sh +$ curl http://weather + +500 Internal Server Error +

+Internal Server Error
+
+The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
    +``` + +However our frontend should be all good: + +```bash +$ kubectl run -ti --rm cli --image=appropriate/curl --restart=Never --command /bin/sh +$ curl http://frontend + + +

+<html>
+<body>
+  <h3>Weather</h3>
+  <p>weather unavailable</p>
+  <h3>Email</h3>
+  <ul>
+    <li>From: Subject: lunch at noon tomorrow</li>
+    <li>From: Subject: compiler docs</li>
+  </ul>
+</body>
+</html>

    + +``` + +Everything is working as expected! There is one problem though, we have just observed the service is crashing quickly, let's see what happens +if our weather service is slow. This happens way more often in production, e.g. due to network or database overload. + +To simulate this failure we are going to introduce an artificial delay: + +```python +from flask import Flask +import time + +app = Flask(__name__) + +@app.route("/") +def hello(): + time.sleep(30) + raise Exception("System overloaded") + +if __name__ == "__main__": + app.run(host='0.0.0.0') +``` + +Build and redeploy: + +```bash +$ docker build -t $registry:5000/weather-crash-slow:0.0.1 -f weather-crash-slow.dockerfile . +$ docker push $registry:5000/weather-crash-slow:0.0.1 +$ kubectl apply -f weather-crash-slow.yaml +deployment "weather" configured +``` + +Just as expected, our weather service is timing out now: + +```bash +$ kubectl run -ti --rm cli --image=appropriate/curl --restart=Never --command /bin/sh +$ curl http://weather + +500 Internal Server Error +

+Internal Server Error
+
+The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
    +``` + +The problem though, is that every request to frontend takes 10 seconds as well: + +```bash +$ curl http://frontend +``` + +This is a much more common type of outage - users leave in frustration as the service is unavailable. + +To fix this issue we are going to introduce a special proxy with [circuit breaker](http://vulcand.github.io/proxy.html#circuit-breakers). + +![standby](http://vulcand.github.io/_images/CircuitStandby.png) + +Circuit breaker is a special middleware that is designed to provide a fail-over action in case the service has degraded. It is very helpful to prevent cascading failures - where the failure of the one service leads to failure of another. Circuit breaker observes requests statistics and checks the stats against a special error condition. + +![tripped](http://vulcand.github.io/_images/CircuitTripped.png) + +Here is our simple circuit breaker written in python: + +```python +from flask import Flask +import requests +from datetime import datetime, timedelta +from threading import Lock +import logging, sys + + +app = Flask(__name__) + +circuit_tripped_until = datetime.now() +mutex = Lock() + +def trip(): + global circuit_tripped_until + mutex.acquire() + try: + circuit_tripped_until = datetime.now() + timedelta(0,30) + app.logger.info("circuit tripped until %s" %(circuit_tripped_until)) + finally: + mutex.release() + +def is_tripped(): + global circuit_tripped_until + mutex.acquire() + try: + return datetime.now() < circuit_tripped_until + finally: + mutex.release() + + +@app.route("/") +def hello(): + weather = "weather unavailable" + try: + if is_tripped(): + return "circuit breaker: service unavailable (tripped)" + + r = requests.get('http://localhost:5000', timeout=1) + app.logger.info("requesting weather...") + start = datetime.now() + app.logger.info("got weather in %s ..." % (datetime.now() - start)) + if r.status_code == requests.codes.ok: + return r.text + else: + trip() + return "circuit breaker: service unavailable (tripping 1)" + except: + app.logger.info("exception: %s", sys.exc_info()[0]) + trip() + return "circuit breaker: service unavailable (tripping 2)" + +if __name__ == "__main__": + app.logger.addHandler(logging.StreamHandler(sys.stdout)) + app.logger.setLevel(logging.DEBUG) + app.run(host='0.0.0.0', port=6000) +``` + +Let's build and redeploy our circuit breaker: + +```bash +$ docker build -t $registry:5000/cbreaker:0.0.1 -f cbreaker.dockerfile . +$ docker push $registry:5000/cbreaker:0.0.1 +$ kubectl apply -f weather-cbreaker.yaml +deployment "weather" configured +$ kubectl apply -f weather-service.yaml +service "weather" configured +``` + +Circuit breaker runs as a separate container next to the weather service container in the same pod: + +```bash +$ cat weather-cbreaker.yaml +``` + +Note that we have reconfigured our service so requests are handled by the circuit breaker first which forwards requests to the weather service running in the same pod, and trips if the request fails. + +The circuit breaker will detect the service outage and the auxilliary weather service will not bring our mail service down anymore: + +```bash +$ kubectl run -ti --rm cli --image=appropriate/curl --restart=Never --command /bin/sh +$ curl http://frontend + + +

+<html>
+<body>
+  <h3>Weather</h3>
+  <p>circuit breaker: service unavailable (tripped)</p>
+  <h3>Email</h3>
+  <ul>
+    <li>From: Subject: lunch at noon tomorrow</li>
+    <li>From: Subject: compiler docs</li>
+  </ul>
+</body>
+</html>

    + +``` + +**NOTE:** There are some production level proxies that natively support circuit breaker pattern - such as [Vulcand](http://vulcand.github.io/), [Nginx Plus](https://www.nginx.com/products/) or [Envoy](https://lyft.github.io/envoy/) + +### Production Pattern: Sidecar For Rate And Connection Limiting + +In the previous example we used a pattern called a "sidecar container". A sidecar is a container colocated with other containers in the same pod, which adds additional logic to the service, such as error detection, TLS termination and other features. + +Here is an example of sidecar nginx proxy that adds rate and connection limits: + +```bash +$ cd prod/sidecar +$ docker build -t $registry:5000/sidecar:0.0.1 -f sidecar.dockerfile . +$ docker push $registry:5000/sidecar:0.0.1 +$ docker build -t $registry:5000/service:0.0.1 -f service.dockerfile . +$ docker push $registry:5000/service:0.0.1 +$ kubectl apply -f sidecar.yaml +deployment "sidecar" configured +``` + +Try to hit the service faster than one request per second and you will see the rate limiting in action: + +```bash +$ kubectl run -ti --rm cli --image=appropriate/curl --restart=Never --command /bin/sh +$ curl http://sidecar +``` + +For instance, [Istio](https://istio.io/docs/concepts/policies-and-telemetry/) is an example of platform that embodies this design. diff --git a/networking-workshop/k8ssecurity.md b/networking-workshop/k8ssecurity.md new file mode 100644 index 00000000..cfeea8b5 --- /dev/null +++ b/networking-workshop/k8ssecurity.md @@ -0,0 +1,521 @@ +# Kubernetes based Application Security Patterns + +... and anti-patterns. + +This workshop task will explore security concepts and the kubernetes primitives for aiding in secure application development. + +## Kubernetes security primitives +Kubernetes provides a number of security primitives, that allow for an application to indicate what access it should have to the system. + +### Authentication +Reference: https://kubernetes.io/docs/reference/access-authn-authz/authentication/ + +Two types of users: +- Normal Users +- Service Accounts + +#### Normal Accounts +Normal accounts in kubernetes are controlled by an external system. Kubernetes does not include it's own internal user management system, and is built around using external identity providers. + +#### Service Accounts +Service accounts are internal to kubernetes accounts, that are assigned to services (pods) to gain access to the kubernetes API. + +### Role-based application controls +Reference: https://kubernetes.io/docs/reference/access-authn-authz/rbac/ + +The RBAC API declares resource objects that can be used to describe authorization policies for a cluster and how to link those policies to specific users. + +#### Namespaces +Reference: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/ + +Namespaces can create "virtual" kubernetes clusters within the same physical cluster. + +Create a namespace: +``` +kubectl apply -f - </ +``` + +If we look at the logs of a particular container, we’ll see that these are in fact symlinks to the actual log files Docker keeps inside its data directory: + +```bash +planet$ ls -l /var/log/pods/01146b3e-3709-11ea-b8d5-080027f6e425/init/ +total 4 +lrwxrwxrwx 1 root root 161 Jan 14 20:04 0.log -> /ext/docker/containers/236f...-json.log +``` + +In addition to `/var/log/pods`, Kubernetes also sets up a `/var/log/containers` directory which has a flat structure and the logs of all containers running on the node. 
The log files are also symlinks that point to the respective files in `/var/log/pods`: + +```bash +planet$ ls -l /var/log/containers/ +total 180 +lrwxrwxrwx 1 root root 66 Jan 15 00:24 bandwagon-6c4b...-lqbll_kube-system_bandwagon-2641....log -> /var/log/pods/0b8e.../bandwagon/1.log +``` + +## Forwarder + +Log forwarder runs on every node of the cluster as a part of a DaemonSet: + +```bash +$ kubectl -nkube-system get ds,pods -lname=log-forwarder +``` + +This component uses [remote_syslog2](https://github.com/papertrail/remote_syslog2) to monitor files in the following directories: + +* `/var/log/containers/*.log` + +Like explained above, this directory contains logs for all containers running on the node. + +* `/var/lib/gravity/site/**/*.log` + +This directory contains Gravity-specific operation logs: + +```bash +$ ls -l /var/lib/gravity/site/*/*.log +``` + +The forwarder Docker image and its configuration can be found [here](https://github.com/gravitational/logging-app/tree/version/5.5.x/images/forwarder). + +## Collector + +Log collector is an rsyslogd server that’s running as a part of a Deployment: + +```bash +$ kubectl -nkube-system get deploy,pods -lrole=log-collector +``` + +The collector exposes rsyslog server via a Kubernetes Service where forwarders send entries from the files they monitor over tcp protocol: + +```bash +$ kubectl -nkube-system get services/log-collector +NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE +log-collector ClusterIP 10.100.81.83 514/UDP,514/TCP,8083/TCP 6h54m +``` + +The collector itself writes all logs into `/var/log/messages` which is mounted into its pod so we can inspect all the logs on the node where the collector is running, inside the planet container: + +```bash +planet$ less /var/log/messages +``` + +Keep in mind that since the collector runs on a single node at a time and writes to the local `/var/log/messages`, the logs may become scattered across multiple nodes if the pod gets rescheduled to another node. + +The collector Docker image and its configuration can be found [here](https://github.com/gravitational/logging-app/tree/version/5.5.x/images/collector). + +### wstail + +In addition to running rsyslog daemon, the log collector container also runs a program called “wstail”. It adds the following functionality: + +* Serves an HTTP API that allows to query all collected logs. The API is exposed via the same Kubernetes service and is used by Gravity Control Panel to provide search functionality. +* Is responsible for creating/deleting log forwarder configurations when they are created/deleted by users. + +## Custom Log Forwarders + +The rsyslog server that runs as a part of the log collector can be configured to ship logs to a remote destination, for example to another external rsyslog server for aggregation. + +To support this scenario, Gravity exposes a resource called `LogForwarder`. Let’s create a log forwarder that will be forwarding the logs to some server running on our node: + +```bash +$ cat < logforwarder.yaml +kind: logforwarder +version: v2 +metadata: + name: forwarder1 +spec: + address: 192.168.99.102:514 + protocol: udp +EOF +$ gravity resource create logforwarder.yaml +``` + +To see all currently configured log forwarders we can use the resource get command: + +```bash +$ gravity resource get logforwarders +Name Address Protocol +---- ------- -------- +forwarder1 192.168.99.102:514 udp +``` + +When the resource is created, Gravity does the following to set up the forwarding. 
+ +The `kube-system/log-forwarders` config map is updated by Gravity with information about the newly created or updated forwarder: + +```bash +$ kubectl -nkube-system get configmaps/log-forwarders -oyaml +``` + +Then it restarts the log-collector pod: + +```bash +$ kubectl -nkube-system get pods -lrole=log-collector +NAME READY STATUS RESTARTS AGE +log-collector-697d94486-2fgxp 1/1 Running 0 5s +``` + +Upon initialization, wstail process initializes the rsyslog daemon configuration based on the configured log forwarders and sets it up with the appropriate forwarding rules: + +```bash +$ kubectl -nkube-system exec log-collector-697d94486-2fgxp -- ls -l /etc/rsyslog.d +total 4 +-rw-r--r-- 1 root root 23 Jan 15 19:22 forwarder1 +$ kubectl -nkube-system exec log-collector-697d94486-2fgxp -- cat /etc/rsyslog.d/forwarder1 +*.* @192.168.99.102:514 +``` + +From here on, the rsyslog server running inside the log collector pod will be forwarding all logs it receives to the configured destinations using the rsyslog protocol. + +We can test this by capturing all traffic on this port using netcat: + +```bash +$ sudo nc -4ulk 514 +``` + +To deconfigure a log forwarder, we can just delete its Gravity resource and it will take care of updating the config map and rsyslog configuration: + +```bash +$ gravity resource rm logforwarders forwarder1 +``` + +## Troubleshooting + +Gravity does not provide a built-in way to update the rsyslogd configuration other than configuring log forwarding, but it is possible to enable debug mode on it in order to be able to troubleshoot. + +The rsyslog server supports reading debugging configuration from environment variables so in order to turn it on we can update the deployment specification: + +```bash +$ EDITOR=nano kubectl -nkube-system edit deploy/log-collector +``` + +And add the following environment variables to the collector container: + +```yaml +- env: + - name: RSYSLOG_DEBUGLOG + value: /var/log/rsyslog.txt + - name: RSYSLOG_DEBUG + value: Debug NoStdOut +``` + +Once the pod has restarted, the rsyslog server’s debug logs will go to the configured file inside the container: + +```bash +$ kubectl -nkube-system exec log-collector-68cc9dccc7-nvldg tail -- -f /var/log/rsyslog.txt +``` + +See more information about various debugging options available for ryslogd in its documentation. + +The other common types of issues related to log forwarding are various networking errors so standard network troubleshooting tools like tcpdump can be utilized to find problems in that area. diff --git a/networking-workshop/logging-6.x.md b/networking-workshop/logging-6.x.md new file mode 100644 index 00000000..94bb05d0 --- /dev/null +++ b/networking-workshop/logging-6.x.md @@ -0,0 +1,177 @@ +# Gravity Logging (for Gravity 6.0 and later) + +## Prerequisites + +Docker 101, Kubernetes 101, Gravity 101. + +## Introduction + +Gravity clusters come preconfigured with the logging infrastructure that collects the logs from all running containers, forwards them to a single destination and makes them available for viewing and querying via an API. + +In Gravity 6.0 and later the logging stack is based on [Logrange](https://www.logrange.io/) - an open-source streaming database for aggregating application logs and other machine-generated data from multiple sources. + +## Pods / Containers Logs Locations + +Before diving into Gravity’s logging infrastructure, let’s explore how it is set up in Kubernetes in general. 
+ +Kubernetes sets up symlinks in well-known locations for logs of all containers running in the cluster and groups them up in two directories on each node, by pod and by container. The directories where these logs go are `/var/log/pods` and `/var/log/containers` respectively. + +Note that these directories reside inside the planet container, not on the host. + +On each node, the logs of all containers running in the same pod will be grouped in that pod directory under `/var/log/pods`: + +```bash +planet$ ls -l /var/log/pods// +``` + +If we look at the logs of a particular container, we’ll see that these are in fact symlinks to the actual log files Docker keeps inside its data directory: + +```bash +planet$ ls -l /var/log/pods/01146b3e-3709-11ea-b8d5-080027f6e425/init/ +total 4 +lrwxrwxrwx 1 root root 161 Jan 14 20:04 0.log -> /ext/docker/containers/236f...-json.log +``` + +In addition to `/var/log/pods`, Kubernetes also sets up a `/var/log/containers` directory which has a flat structure and the logs of all containers running on the node. The log files are also symlinks that point to the respective files in `/var/log/pods`: + +```bash +planet$ ls -l /var/log/containers/ +total 180 +lrwxrwxrwx 1 root root 66 Jan 15 00:24 bandwagon-6c4b...-lqbll_kube-system_bandwagon-2641....log -> /var/log/pods/0b8e.../bandwagon/1.log +``` + +## Logrange Components + +Logrange consists of the following main components: collector, aggregator and forwarder. These components can be seen in the following diagram. + +![Logrange Architecture](img/logrange.png) + +### Collector + +Collector collects the logs from cluster nodes and sends them to the aggregator. It runs on each cluster node as a DaemonSet: + +```bash +$ kubectl -nkube-system get ds,po -lapp=lr-collector +``` + +If we take a peek into its configuration, we’ll see the paths it monitors for the logs: + +```bash +$ kubectl -nkube-system get cm/lr-collector -oyaml + ... + "IncludePaths": [ + "/var/log/containers/*.log", + "/var/log/containers/*/*.log", + "/var/lib/gravity/site/*/*.log" + ], +``` + +Like explained above, the first 2 directories contain the logs for all containers running on the node. The 3rd directory contains Gravity-specific operation logs. + +From the same configuration we can also see that container logs are marked with specific tags which are then used for searching: + +``` + "Meta": { + "Tags": { + "pod": "{pod}", + "ns": "{ns}", + "cname": "{cname}", + "cid": "{cid}" + } + } +``` + +### Aggregator + +Aggregator is a data storage plane that receives and aggregates log data coming from the collectors. It runs as a Deployment: + +```bash +$ kubectl -nkube-system get deploy,po -lapp=lr-aggregator +``` + +Its configuration is also stored in a ConfigMap: + +```bash +$ kubectl -nkube-system get configmaps/lr-aggregator -oyaml +``` + +From the configuration we can see that Logrange keeps its data under Gravity state directory, in `/var/lib/gravity/logrange/data/`. + +### Forwarder + +Forwarder extracts logs from the aggregator based on some criteria and sends them to a 3-rd party system. It also runs as a Deployment: + +```bash +$ kubectl -nkube-system get deploy,po -lapp=lr-forwarder +``` + +Forwarder is used for forwarding logs to other destinations when the user configured custom log forwarders which we’ll take a look at below. + +### Adapter + +Adapter is not a part of a standard Logrange deployment but rather a Gravity-specific component that provides some additional functionality. 
It runs as a Deployment: + +```bash +$ kubectl -nkube-system get deploy,pods -lapp=log-collector +``` + +The adapter adds the following functionality: + +* Serves an HTTP API that allows to query all collected logs. The API is exposed via a Kubernetes service and is used by Gravity Control Panel to provide search functionality. Internally, adapter uses Logrange’s [LQL](https://www.logrange.io/docs/lql.html) to query the data. +* It is also responsible for updating log forwarder configurations when they are created or modified by users. + +## Custom Log Forwarders + +Logrange forwarder can be configured to ship logs to a remote destination for aggregation, for example to an external rsyslog server, Splunk, etc. + +To support this scenario, Gravity exposes a resource called LogForwarder. Let’s create a log forwarder that will be forwarding the logs to some server running on our node: + +```bash +$ cat < logforwarder.yaml +kind: logforwarder +version: v2 +metadata: + name: forwarder1 +spec: + address: 192.168.99.102:514 + protocol: udp +EOF +$ gravity resource create logforwarder.yaml +``` + +To see all currently configured log forwarders we can use the resource get command: + +```bash +$ gravity resource get logforwarders +Name Address Protocol +---- ------- -------- +forwarder1 192.168.99.102:514 udp +``` + +When the resource is created, Gravity does the following to set up the forwarding. + +The `kube-system/log-forwarders` ConfigMap is updated by Gravity with information about the newly created or updated forwarder: + +```bash +$ kubectl -nkube-system get configmaps/log-forwarders -oyaml +``` + +Then, Logrange adapter picks up the change and updates the forwarder configuration: + +```bash +$ kubectl -nkube-system get configmaps/lr-forwarder -oyaml +``` + +After that, the forwarder starts sending the logs to the configured destination using the syslog protocol. 
+ +We can test this by capturing all traffic on this port using netcat: + +```bash +$ sudo nc -4ulk 514 +``` + +To deconfigure a log forwarder, we can just delete its Gravity resource and it will take care of removing its configuration: + +```bash +$ gravity resource rm logforwarders forwarder1 +``` diff --git a/networking-workshop/mattermost/postgres-service.yaml b/networking-workshop/mattermost/postgres-service.yaml new file mode 100644 index 00000000..3eb090e8 --- /dev/null +++ b/networking-workshop/mattermost/postgres-service.yaml @@ -0,0 +1,14 @@ +# service that routes to database +apiVersion: v1 +kind: Service +metadata: + name: postgres + labels: + app: mattermost + role: mattermost-database +spec: + type: NodePort + ports: + - port: 5432 + selector: + role: mattermost-database \ No newline at end of file diff --git a/networking-workshop/mattermost/postgres.yaml b/networking-workshop/mattermost/postgres.yaml new file mode 100644 index 00000000..a7738fd7 --- /dev/null +++ b/networking-workshop/mattermost/postgres.yaml @@ -0,0 +1,17 @@ +# mattermost postgres worker +apiVersion: v1 +kind: Pod +metadata: + name: mattermost-database + labels: + app: mattermost + role: mattermost-database +spec: + containers: + - name: mattermost-postgres + image: postgres:12.2 + ports: + - containerPort: 5432 + env: + - name: POSTGRES_HOST_AUTH_METHOD + value: "trust" diff --git a/networking-workshop/mattermost/worker-config/config.json b/networking-workshop/mattermost/worker-config/config.json new file mode 100644 index 00000000..dea38783 --- /dev/null +++ b/networking-workshop/mattermost/worker-config/config.json @@ -0,0 +1,95 @@ +{ + "ServiceSettings": { + "ListenAddress": ":80", + "MaximumLoginAttempts": 10, + "SegmentDeveloperKey": "", + "GoogleDeveloperKey": "", + "EnableOAuthServiceProvider": false, + "EnableIncomingWebhooks": false, + "EnableOutgoingWebhooks": false, + "EnablePostUsernameOverride": false, + "EnablePostIconOverride": false, + "EnableTesting": false, + "EnableSecurityFixAlert": true + }, + "TeamSettings": { + "SiteName": "Mattermost", + "MaxUsersPerTeam": 50, + "EnableTeamCreation": true, + "EnableUserCreation": true, + "RestrictCreationToDomains": "", + "RestrictTeamNames": true, + "EnableTeamListing": false + }, + "SqlSettings": { + "DriverName": "postgres", + "DataSource": "postgres://postgres:mattermost@postgres:5432/postgres?sslmode=disable", + "DataSourceReplicas": ["postgres://postgres:mattermost@postgres:5432/postgres?sslmode=disable"], + "MaxIdleConns": 10, + "MaxOpenConns": 10, + "Trace": false, + "AtRestEncryptKey": "7rAh6iwQCkV4cA1Gsg3fgGOXJAQ43QVg" + }, + "LogSettings": { + "EnableConsole": false, + "ConsoleLevel": "INFO", + "EnableFile": true, + "FileLevel": "INFO", + "FileFormat": "", + "FileLocation": "" + }, + "FileSettings": { + "DriverName": "local", + "Directory": "/var/mattermost/data/", + "EnablePublicLink": true, + "PublicLinkSalt": "A705AklYF8MFDOfcwh3I488G8vtLlVip", + "ThumbnailWidth": 120, + "ThumbnailHeight": 100, + "PreviewWidth": 1024, + "PreviewHeight": 0, + "ProfileWidth": 128, + "ProfileHeight": 128, + "InitialFont": "luximbi.ttf", + "AmazonS3AccessKeyId": "", + "AmazonS3SecretAccessKey": "", + "AmazonS3Bucket": "", + "AmazonS3Region": "" + }, + "EmailSettings": { + "EnableSignUpWithEmail": true, + "SendEmailNotifications": false, + "RequireEmailVerification": false, + "FeedbackName": "", + "FeedbackEmail": "", + "SMTPUsername": "", + "SMTPPassword": "", + "SMTPServer": "", + "SMTPPort": "", + "ConnectionSecurity": "", + "InviteSalt": 
"bjlSR4QqkXFBr7TP4oDzlfZmcNuH9YoS", + "PasswordResetSalt": "vZ4DcKyVVRlKHHJpexcuXzojkE5PZ5eL", + "ApplePushServer": "", + "ApplePushCertPublic": "", + "ApplePushCertPrivate": "" + }, + "RateLimitSettings": { + "EnableRateLimiter": true, + "PerSec": 10, + "MemoryStoreSize": 10000, + "VaryByRemoteAddr": true, + "VaryByHeader": "" + }, + "PrivacySettings": { + "ShowEmailAddress": true, + "ShowFullName": true + }, + "GitLabSettings": { + "Enable": false, + "Secret": "", + "Id": "", + "Scope": "", + "AuthEndpoint": "", + "TokenEndpoint": "", + "UserApiEndpoint": "" + } +} diff --git a/networking-workshop/mattermost/worker-service.yaml b/networking-workshop/mattermost/worker-service.yaml new file mode 100644 index 00000000..d077dd96 --- /dev/null +++ b/networking-workshop/mattermost/worker-service.yaml @@ -0,0 +1,15 @@ +# service for web worker +apiVersion: v1 +kind: Service +metadata: + name: mattermost + labels: + app: mattermost + role: mattermost-worker +spec: + type: NodePort + ports: + - port: 80 + name: http + selector: + role: mattermost-worker diff --git a/networking-workshop/mattermost/worker.yaml b/networking-workshop/mattermost/worker.yaml new file mode 100644 index 00000000..a30c13c7 --- /dev/null +++ b/networking-workshop/mattermost/worker.yaml @@ -0,0 +1,32 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + labels: + app: mattermost + role: mattermost-worker + name: mattermost-worker + namespace: default +spec: + replicas: 1 + selector: + matchLabels: + role: mattermost-worker + template: + metadata: + labels: + app: mattermost + role: mattermost-worker + spec: + containers: + - image: __REGISTRY_IP__/mattermost-worker:latest + name: mattermost-worker + ports: + - containerPort: 80 + protocol: TCP + volumeMounts: + - name: config-volume + mountPath: /var/mattermost/config + volumes: + - name: config-volume + configMap: + name: mattermost-v1 diff --git a/networking-workshop/mattermost/worker/Dockerfile b/networking-workshop/mattermost/worker/Dockerfile new file mode 100644 index 00000000..bdbec33f --- /dev/null +++ b/networking-workshop/mattermost/worker/Dockerfile @@ -0,0 +1,21 @@ +# Copyright (c) 2015 Spinpunch, Inc. All Rights Reserved. +# See License.txt for license information. +FROM ubuntu:18.04 + +# Copy over files +ADD https://releases.mattermost.com/5.21.0/mattermost-team-5.21.0-linux-amd64.tar.gz / +RUN cd /var && tar -zxvf /mattermost-team-5.21.0-linux-amd64.tar.gz && rm /mattermost-team-5.21.0-linux-amd64.tar.gz + +ADD docker-entry.sh /var/mattermost/bin/docker-entry.sh +RUN chmod +x /var/mattermost/bin/docker-entry.sh + +# Create default storage directory +RUN mkdir /var/mattermost/data + +# link log file to stdout +RUN ln -s /dev/stdout /var/mattermost/logs/mattermost.log + +ENTRYPOINT ["/var/mattermost/bin/docker-entry.sh"] + +# Expose port 80 +EXPOSE 80 diff --git a/networking-workshop/mattermost/worker/docker-entry.sh b/networking-workshop/mattermost/worker/docker-entry.sh new file mode 100644 index 00000000..4722bd63 --- /dev/null +++ b/networking-workshop/mattermost/worker/docker-entry.sh @@ -0,0 +1,4 @@ +#!/bin/bash + +cd /var/mattermost/bin +./mattermost --config=/var/mattermost/config/config.json diff --git a/networking-workshop/monitoring-5.x.md b/networking-workshop/monitoring-5.x.md new file mode 100644 index 00000000..dbb53239 --- /dev/null +++ b/networking-workshop/monitoring-5.x.md @@ -0,0 +1,922 @@ +# Gravity Monitoring & Alerts (for Gravity 5.5 and earlier) + +## Prerequisites + +Docker 101, Kubernetes 101, Gravity 101. 
+ +## Introduction + +_Note: This part of the training pertains to Gravity 5.5 and earlier._ + +Gravity Clusters come with a fully configured and customizable monitoring and alerting systems by default. The system consists of various components, which are automatically included into a Cluster Image that is built with a single command `tele build`. + +## Overview + +Before getting into Gravity’s monitoring and alerts capability in more detail, let’s first discuss the various components that are involved. + +There are 4 main components in the monitoring system: InfluxDB, Heapster, Grafana, and Kapacitor. + +### InfluxDB + +Is an open source time series database which is used for the main data store for monitoring time series data. Provides the Kubernetes service `influxdb.monitoring.svc.cluster.local.` + +### Heapster + +Monitors Kubernetes components in generating a collection of not only performance metrics about workloads, nodes, and pods, but also events generated by Clusters. The statistics captured are reported to InfluxDB. + +### Grafana + +Is an open source metrics suite which provides the dashboard in the Gravity monitoring and alerts system. The dashboard provides a visual to the information stored in InfluxDB, which is exposed as the service `grafana.monitoring.svc.cluster.local`. Credentials generated are placed into a secret `grafana` in the monitoring namespace + +Gravity is shipped with 2 pre-configured dashboards providing a visual of machine and pod-level overview of the installed cluster. Within the Gravity control panel, you can access the dashboard by navigating to the Monitoring page. + +By default, Grafana is running in anonymous read-only mode. Anyone who logs into Gravity can view but not modify the dashboards. + +### Kapacitor + +Is the data processing engine for InfluxDB, which streams data from InfluxDB and sends alerts to the end user exposed as the service `kapacitor.monitoring.svc.cluster.local.` + +## Metrics Overview + +All monitoring components are running in the “monitoring” namespace in Gravity. Let’s take a look at them: + +``` +$ kubectl -nmonitoring get pods +NAME READY STATUS RESTARTS AGE +grafana-8cb94d5dc-6dc2h 2/2 Running 0 10m +heapster-57fbfbbc7-9xtm6 1/1 Running 0 10m +influxdb-599c5f5c45-6hqmc 2/2 Running 1 10m +kapacitor-68f6d76878-8m26x 3/3 Running 0 10m +telegraf-75487b79bd-ptvzd 1/1 Running 0 10m +telegraf-node-master-x9v48 1/1 Running 0 10m +``` + +Most of the cluster metrics are collected by Heapster. Heapster runs as a part of a Deployment and collects metrics from the cluster nodes and persists them into the configured “sinks”. + +The Heapster pod collects metrics from kubelets running on the cluster nodes, which in turn queries the data from cAdvisors - a container resource usage collector integrated into kubelet that supports Docker containers natively. cAdvisor agent running on a node discovers all running containers and collects their CPU, memory, filesystem and network usage statistics. + +Both of these collectors operate on their own intervals - kubelet queries cAdvisor every 15 seconds, while Heapster scrapes metrics from all kubelets every minute. + +Heapster by itself does not store any data - instead, it ships all scraped metrics to the configured sinks. In Gravity clusters the sink is an InfluxDB database that is deployed as a part of the monitoring application. + +All metrics collected by Heapster are placed into the `k8s` database in InfluxDB. In InfluxDB the data is organized into "measurements". 
A measurement acts as a container for "fields" and a few other things. Applying a very rough analogy with relational databases, a measurement can be thought of as a "table" whereas the fields are "columns" of the table. In addition, each measurement can have tags attached to it which can be used to add various metadata to the data. + +Each metric is stored as a separate “series” in InfluxDB. A series in InfluxDB is the collection of data that share a retention policy, a measurement and a tag set. Heapster tags each metrics with different labels, such as host name, pod name, container name and others, which become “tags” on the stored series. Tags are indexed so queries on tags are fast. + +When troubleshooting problems with metrics, it is sometimes useful to look into the Heapster container logs where it can be seen if it experiences communication issues with InfluxDB service or has other issues: + +``` +$ kubectl -nmonitoring logs heapster-57fbfbbc7-9xtm6 +``` + +In addition, any other apps that collect metrics should also submit them into the same DB in order for proper retention policies to be enforced. + +## Exploring InfluxDB + +Like mentioned above, InfluxDB is exposed via a cluster-local Kubernetes service `influxdb.monitoring.svc.cluster.local` and serves its HTTP API on port `8086` so we can use it to explore the database from the CLI. + +Let's enter the Gravity master container to make sure the services are resolvable and to get access to additional CLI tools: + +```bash +$ sudo gravity shell +``` + +Let's ping the database to make sure it's up and running: + +```bash +$ curl -sl -I http://influxdb.monitoring.svc.cluster.local:8086/ping +// Should return 204 response. +``` + +InfluxDB API endpoint requires authentication so to make actual queries to the database we need to determine the credentials first. The generated credentials are kept in the `influxdb` secret in the monitoring namespace: + +```bash +$ kubectl -nmonitoring get secrets/influxdb -oyaml +``` + +Note that the credentials in the secret are base64-encoded so you'd need to decode them: + +```bash +$ echo | base64 -d +$ export PASS=xxx +``` + +Once the credentials have been decoded (the username is `root` and the password is generated during installation), they can be supplied via a cURL command. For example, let's see what databases we currently have: + +```bash +$ curl -s -u root:$PASS http://influxdb.monitoring.svc.cluster.local:8086/query --data-urlencode 'q=show databases' | jq +``` + +Now we can also see which measurements are currently being collected: + +```bash +$ curl -s -u root:$PASS http://influxdb.monitoring.svc.cluster.local:8086/query?db=k8s --data-urlencode 'q=show measurements' | jq +``` + +Finally, we can query specific metrics if we want to using InfluxDB's SQL-like query language: + +```bash +$ curl -s -u root:$PASS http://influxdb.monitoring.svc.cluster.local:8086/query?db=k8s --data-urlencode 'q=select * from uptime limit 10' | jq +``` + +Refer to the InfluxDB [API documentation](https://docs.influxdata.com/influxdb/v1.7/tools/api/#query-http-endpoint) if you want to learn more about querying the database. + +## Metric Retention Policy & Rollups + +Let's now talk about durations the measurements are stored for. During initial installation Gravity pre-configures InfluxDB with the following retention policies: + +* default = 24 hours - is used for high precision metrics. +* medium = 4 weeks - is used for medium precision metrics. 
+* long = 52 weeks - keeps metrics aggregated over even larger intervals. + +We can use the same InfluxDB API to see the retention policies configured in the database: + +```bash +$ curl -s -u root:$PASS http://influxdb.monitoring.svc.cluster.local:8086/query?db=k8s --data-urlencode 'q=show retention policies' | jq +``` + +All metrics sent to InfluxDB by Heapster are saved using the default retention policy which means that all the high-resolution metrics collected are kept intact for 24 hours. + +To provide historical overview some of the most commonly helpful metrics (such as CPU/memory usage, network transfer rates) are rolled up to lower resolutions and stored using the longer retention policies mentioned above. + +In order to provide such downsampled metrics, Gravity uses InfluxDB “continuous queries” which are programmed to run automatically and aggregate metrics over a certain interval. + +The Gravity monitoring system allows two types of rollup configurations for collecting metrics: + +* medium = aggregates data over 5 minute intervals +* long = aggregates data over 1 hour intervals + +Each of the two rollups mentioned above, continue to their respective retention policy following. For example the long rollup aggregates data over 1 hour interval and goes into the long retention policy. + +Preconfigured rollups that Gravity clusters come with are stored in the `rollups-default` config map in the monitoring namespace: + +```bash +$ kubectl -nmonitoring get configmaps/rollups-default -oyaml +``` + +The configuration of retention policies and rollups is handled by a “watcher” service that runs in a container as a part of the InfluxDB pod so all these configurations can be seen in its logs: + +``` +$ kubectl -nmonitoring logs influxdb-599c5f5c45-6hqmc watcher +``` + +## Custom Rollups + +In addition to the rollups pre-configured by Gravity, applications can downsample their own metrics (or create different rollups for standard metrics) by configuring their own rollups through ConfigMaps. + +Custom rollup ConfigMaps should be created in the `monitoring` namespace and assigned a `monitoring` label with value of `rollup`. + +An example ConfigMap is shown below with a Custom Metric Rollups: + +``` +apiVersion: v1 +kind: ConfigMap +metadata: + name: myrollups + namespace: monitoring + labels: + monitoring: rollup +data: + rollups: | + [ + { + "retention": "medium", + "measurement": "cpu/usage_rate", + "name": "cpu/usage_rate/medium", + "functions": [ + { + "function": "max", + "field": "value", + "alias": "value_max" + }, + { + "function": "mean", + "field": "value", + "alias": "value_mean" + } + ] + } + ] +``` + +The watcher process will detect the new ConfigMap and configure an appropriate continuous query for the new rollup: + +``` +$ kubectl -nmonitoring logs influxdb-599c5f5c45-6hqmc watcher +... +time="2020-01-24T05:40:13Z" level=info msg="Detected event ADDED for configmap \"myrollups\"" label="monitoring in (rollup)" watch=configmap +time="2020-01-24T05:40:13Z" level=info msg="New rollup." query="create continuous query \"cpu/usage_rate/medium\" on k8s begin select max(\"value\") as value_max, mean(\"value\") as value_mean into k8s.\"medium\".\"cpu/usage_rate/medium\" from k8s.\"default\".\"cpu/usage_rate\" group by *, time(5m) end" +``` + +## Custom Dashboards + +Along with the dashboards mentioned above, your applications can use their own Grafana dashboards by using ConfigMaps. 
+ +Similar to creating custom rollups, in order to use a custom dashboard, the ConfigMap should be created in the `monitoring` namespace, assigned a `monitoring` label with a value `dashboard`. + +Under the specified namespace, the ConfigMap will be recognized and loaded when installing the application. It is possible to add new ConfigMaps at a later time as the watcher will then pick it up and create it in Grafana. Similarly, if you delete the ConfigMap, the watcher will delete it from Grafana. + +Dashboard ConfigMaps may contain multiple keys with dashboards as key names are not relevant. + +An example ConfigMap is shown below: + +``` +apiVersion: v1 +kind: ConfigMap +metadata: + name: mydashboard + namespace: monitoring + labels: + monitoring: dashboard +data: + mydashboard: | + { ... dashboard JSON ... } +``` + +_Note: by default Grafana is run in read-only mode, a separate Grafana instance is required to create custom dashboards._ + +## Default Metrics + +The following are the default metrics captured by the Gravity Monitoring & Alerts system: + +### Heapster Metrics + +Below are a list of metrics captured by Heapster which are exported to the backend: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+| Group | Metric Name | Description |
+|-------|-------------|-------------|
+| cpu | limit | CPU hard limit in millicores. |
+| | node_capacity | CPU capacity of a node. |
+| | node_allocatable | CPU allocatable of a node. |
+| | node_reservation | Share of CPU that is reserved on the node allocatable. |
+| | node_utilization | CPU utilization as a share of node allocatable. |
+| | request | CPU request (the guaranteed amount of resources) in millicores. |
+| | usage | Cumulative amount of consumed CPU time on all cores in nanoseconds. |
+| | usage_rate | CPU usage on all cores in millicores. |
+| | load | CPU load in milliloads, i.e. runnable threads * 1000. |
+| ephemeral_storage | limit | Local ephemeral storage hard limit in bytes. |
+| | request | Local ephemeral storage request (the guaranteed amount of resources) in bytes. |
+| | usage | Total local ephemeral storage usage. |
+| | node_capacity | Local ephemeral storage capacity of a node. |
+| | node_allocatable | Local ephemeral storage allocatable of a node. |
+| | node_reservation | Share of local ephemeral storage that is reserved on the node allocatable. |
+| | node_utilization | Local ephemeral storage utilization as a share of ephemeral storage allocatable. |
+| filesystem | usage | Total number of bytes consumed on a filesystem. |
+| | limit | The total size of the filesystem in bytes. |
+| | available | The number of available bytes remaining in the filesystem. |
+| | inodes | The number of available inodes in the filesystem. |
+| | inodes_free | The number of free inodes remaining in the filesystem. |
+| disk | io_read_bytes | Number of bytes read from a disk partition. |
+| | io_write_bytes | Number of bytes written to a disk partition. |
+| | io_read_bytes_rate | Number of bytes read from a disk partition per second. |
+| | io_write_bytes_rate | Number of bytes written to a disk partition per second. |
+| memory | limit | Memory hard limit in bytes. |
+| | major_page_faults | Number of major page faults. |
+| | major_page_faults_rate | Number of major page faults per second. |
+| | node_capacity | Memory capacity of a node. |
+| | node_allocatable | Memory allocatable of a node. |
+| | node_reservation | Share of memory that is reserved on the node allocatable. |
+| | node_utilization | Memory utilization as a share of memory allocatable. |
+| | page_faults | Number of page faults. |
+| | page_faults_rate | Number of page faults per second. |
+| | request | Memory request (the guaranteed amount of resources) in bytes. |
+| | usage | Total memory usage. |
+| | cache | Cache memory usage. |
+| | rss | RSS memory usage. |
+| | working_set | Total working set usage. Working set is the memory being used and not easily dropped by the kernel. |
+| accelerator | memory_total | Memory capacity of an accelerator. |
+| | memory_used | Memory used of an accelerator. |
+| | duty_cycle | Duty cycle of an accelerator. |
+| | request | Number of accelerator devices requested by container. |
+| network | rx | Cumulative number of bytes received over the network. |
+| | rx_errors | Cumulative number of errors while receiving over the network. |
+| | rx_errors_rate | Number of errors while receiving over the network per second. |
+| | rx_rate | Number of bytes received over the network per second. |
+| | tx | Cumulative number of bytes sent over the network. |
+| | tx_errors | Cumulative number of errors while sending over the network. |
+| | tx_errors_rate | Number of errors while sending over the network per second. |
+| | tx_rate | Number of bytes sent over the network per second. |
+| uptime | - | Number of milliseconds since the container was started. |
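+
+Any of these measurements can be queried through the InfluxDB API demonstrated earlier. For example, a sketch of a query against the `cpu/usage_rate` measurement (assuming `$PASS` still holds the InfluxDB root password):
+
+```bash
+$ curl -s -u root:$PASS http://influxdb.monitoring.svc.cluster.local:8086/query?db=k8s \
+    --data-urlencode 'q=select * from "cpu/usage_rate" limit 5' | jq
+```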
    + +### Satellite + +[Satellite](https://github.com/gravitational/satellite) is an open-source tool prepared by Gravitational that collects health information related to the Kubernetes cluster. Satellite runs on each Gravity Cluster node and has various checks assessing the health of a Cluster. + +Satellite collects several metrics related to cluster health and exposes them over the Prometheus endpoint. Among the metrics collected by Satellite are: + +* Etcd related metrics: + * Current leader address + * Etcd cluster health +* Docker related metrics: + * Overall health of the Docker daemon +* Sysctl related metrics: + * Status of IPv4 forwarding + * Status of netfilter +* Systemd related metrics: + * State of various systemd units such as etcd, flannel, kube-*, etc. + + +### Telegraf + +The nodes also run [Telegraf](https://github.com/influxdata/telegraf/tree/master/plugins/inputs/system) - an agent for collecting, processing, aggregating, and writing metrics. Some system input plugins related to cpu and memory are captured as default metrics as well. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Metric Name | Description |
|---|---|
| load1 (float) | Warning threshold for load over 1 min |
| load15 (float) | Warning threshold for load over 15 mins |
| load5 (float) | Warning threshold for load over 5 mins |
| n_users (integer) | Number of users |
| n_cpus (integer) | Number of CPU cores |
| uptime (integer, seconds) | Number of seconds since the system was started |
    + +In addition to the default metrics, Telegraf also queries the Satellite Prometheus endpoint described above and ships all metrics to the same “k8s” database in InfluxDB. + +Telegraf configuration can be found [here](https://github.com/gravitational/monitoring-app/tree/version/5.5.x/images/telegraf/rootfs/etc/telegraf). The respective configuration files show which input plugins each Telegraf instance has enabled. + +## More about Kapacitor + +As mentioned Kapacitor is the alerting system that streams data from InfluxDB and handles alerts sent to users. Kapacitor can also be configured to send email alerts, or customized with other alerts. + +The following are alerts that Gravity Monitoring & Alerts system ships with by default: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Component | Alert | Description |
|---|---|---|
| CPU | High CPU usage | Warning at > 75% used, critical error at > 90% used |
| Memory | High Memory usage | Warning at > 80% used, critical error at > 90% used |
| Systemd | Individual | Error when unit not loaded/active |
| | Overall systemd health | Error when systemd detects a failed service |
| Filesystem | High disk space usage | Warning at > 80% used, critical error at > 90% used |
| | High inode usage | Warning at > 90% used, critical error at > 95% used |
| System | Uptime | Warning node uptime < 5 mins |
| | Kernel params | Error if param not set |
| Etcd | Etcd instance health | Error when etcd master down > 5 mins |
| | Etcd latency check | Warning when follower <-> leader latency > 500 ms, error when > 1 sec over period of 1 min |
| Docker | Docker daemon health | Error when docker daemon is down |
| InfluxDB | InfluxDB instance health | Error when InfluxDB is inaccessible |
| Kubernetes | Kubernetes node readiness | Error when the node is not ready |
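
Each of these alerts is loaded into Kapacitor as a task. If you want to confirm what is actually running in your cluster, a simple check (using the same `kubectl exec` pattern as the SMTP test below, with `$POD_ID` being the Kapacitor pod name) is:

```bash
# $POD_ID is the Kapacitor pod, e.g. taken from: kubectl -nmonitoring get pods
kubectl exec -n monitoring $POD_ID -c kapacitor -- kapacitor list tasks
```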
    + +### Kapacitor Email Configuration + +In order to configure email alerts via Kapacitor you will need to create Gravity resources of type `smtp `and `alerttarget`. + +An example of the configuration is shown below: + +``` +kind: smtp +version: v2 +metadata: + name: smtp +spec: + host: smtp.host + port: # 465 by default + username: + password: +--- +kind: alerttarget +version: v2 +metadata: + name: email-alerts +spec: + email: triage@example.com # Email address of the alert recipient +``` + +Creating these resources will accordingly update and reload Kapacitor configuration: + +``` +$ gravity resource create -f smtp.yaml +``` + +In order to view the current SMTP settings or alert target: + +``` +$ gravity resource get smtp +$ gravity resource get alerttarget +``` + +Only a single alert target can be configured. To remove the current alert target, you can execute the following kapacitor command inside the designated pod: + +``` +$ kapacitor delete alerttarget email-alerts +``` + +### Testing Kapacitor Email Configuration + +To test a Kapacitor SMTP configuration you can execute the following: + +``` +$ kubectl exec -n monitoring $POD_ID -c kapacitor -- /bin/bash -c "kapacitor service-tests smtp" +``` + +If the settings are set up appropriately, the recipient should receive an email with the subject “test subject”. + +### Kapacitor Custom Alerts + +Creating new alerts is as easy as using another Gravity resource of type `alert`. The alerts are written in [TICKscript](https://docs.influxdata.com/kapacitor/v1.2/tick/) and are automatically detected, loaded, and enabled for Gravity Monitoring and Alerts system. + +For demonstration purposes let’s define an alert that always fires: + +``` +kind: alert +version: v2 +metadata: + name: my-formula +spec: + formula: | + var period = 5m + var every = 1m + var warnRate = 2 + var warnReset = 1 + var usage_rate = stream + |from() + .measurement('cpu/usage_rate') + .groupBy('nodename') + .where(lambda: "type" == 'node') + |window() + .period(period) + .every(every) + var cpu_total = stream + |from() + .measurement('cpu/node_capacity') + .groupBy('nodename') + .where(lambda: "type" == 'node') + |window() + .period(period) + .every(every) + var percent_used = usage_rate + |join(cpu_total) + .as('usage_rate', 'total') + .tolerance(30s) + .streamName('percent_used') + |eval(lambda: (float("usage_rate.value") * 100.0) / float("total.value")) + .as('percent_usage') + |mean('percent_usage') + .as('avg_percent_used') + var trigger = percent_used + |alert() + .message('{{ .Level}} / Node {{ index .Tags "nodename" }} has high cpu usage: {{ index .Fields "avg_percent_used" }}%') + .warn(lambda: "avg_percent_used" > warnRate) + .warnReset(lambda: "avg_percent_used" < warnReset) + .stateChangesOnly() + .details(''' + {{ .Message }} +

    <p>Level: {{ .Level }}</p>
    <p>Nodename: {{ index .Tags "nodename" }}</p>
    <p>Usage: {{ index .Fields "avg_percent_used" | printf "%0.2f" }}%</p>

    + ''') + .email() + .log('/var/lib/kapacitor/logs/high_cpu.log') + .mode(0644) +``` + +And create it : + +``` +$ gravity resource create -f formula.yaml +``` + +Custom alerts are being monitored by another “watcher” type of service that runs inside the Kapacitor pod: + +``` +$ kubectl -nmonitoring logs kapacitor-68f6d76878-8m26x watcher +time="2020-01-24T06:18:10Z" level=info msg="Detected event ADDED for configmap \"my-formula\"" label="monitoring in (alert)" watch=configmap +``` + +We can confirm the alert is running checking the logs after a few seconds: + +``` +$ kubectl -nmonitoring exec -ti kapacitor-68f6d76878-8m26x -c kapacitor cat -- /var/lib/kapacitor/logs/high_cpu.log +{"id":"percent_used:nodename=10.0.2.15","message":"WARNING / Node 10.0.2.15 has high cpu usage: 15%","details":"\n\u003cb\u003eWARNING / Node 10.0.2.15 has high cpu usage: 15%\u003c/b\u003e\n\u003cp\u003eLevel: WARNING\u003c/p\u003e\n\u003cp\u003eNodename: 10.0.2.15\u003c/p\u003e\n\u003cp\u003eUsage: 15.00%\u003c/p\u003e\n","time":"2020-01-24T06:30:00Z","duration":0,"level":"WARNING","data":{"series":[{"name":"percent_used","tags":{"nodename":"10.0.2.15"},"columns":["time","avg_percent_used"],"values":[["2020-01-24T06:30:00Z",15]]}]},"previousLevel":"OK","recoverable":true} +``` + +To view all currently configured custom alerts you can run: + +``` +$ gravity resource get alert my-formula +``` + +In order to remove a specific alert you can execute the following kapacitor command inside the designated pod: + +``` +$ kapacitor delete alert my-formula +``` + +This concludes our monitoring training. diff --git a/networking-workshop/monitoring-6.x.md b/networking-workshop/monitoring-6.x.md new file mode 100644 index 00000000..9d11fd82 --- /dev/null +++ b/networking-workshop/monitoring-6.x.md @@ -0,0 +1,815 @@ +# Gravity Monitoring & Alerts (for Gravity 6.0 and later) + +## Prerequisites + +Docker 101, Kubernetes 101, Gravity 101. + +## Introduction + +_Note: This part of the training pertains to Gravity 6.0 and later. In Gravity 6.0 Gravitational replaced InfluxDB/Kapacitor monitoring stack with Prometheus/Alertmanager._ + +Gravity Clusters come with a fully configured and customizable monitoring and alerting systems by default. The system consists of various components, which are automatically included into a Cluster Image that is built with a single command `tele build`. + +## Overview + +Before getting into Gravity’s monitoring and alerts capability in more detail, let’s first discuss the various components that are involved. + +There are 4 main components in the monitoring system: Prometheus, Grafana, Alertmanager and Satellite. + +### Prometheus + +Is an open source Kubernetes native monitoring system and time-series database that collects hardware and OS metrics, as well as metrics about various k8s resources (deployments, nodes, and pods). Prometheus exposes the cluster-internal service `prometheus-k8s.monitoring.svc.cluster.local:9090`. + +### Grafana + +Is an open source metrics suite which provides the dashboard in the Gravity monitoring and alerts system. The dashboard provides a visual to the information stored in Prometheus, which is exposed as the service `grafana.monitoring.svc.cluster.local:3000`. Credentials generated are placed into a secret `grafana` in the monitoring namespace + +Gravity is shipped with 2 pre-configured dashboards providing a visual of machine and pod-level overview of the installed cluster. 
Within the Gravity control panel, you can access the dashboard by navigating to the Monitoring page. + +By default, Grafana is running in anonymous read-only mode. Anyone who logs into Gravity can view but not modify the dashboards. + +### Alertmanager + +Is a Prometheus component that handles alerts sent by client applications such as a Prometheus server. Alertmanager handles deduplicating, grouping and routing alerts to the correct receiver integration such as an email recipient. Alertmanager exposes the cluster-internal service `alertmanager-main.monitoring.svc.cluster.local:9093`. + +### Satellite + +[Satellite](https://github.com/gravitational/satellite) is an open-source tool prepared by Gravitational that collects health information related to the Kubernetes cluster. Satellite runs on each Gravity Cluster node and has various checks assessing the health of a Cluster. Any issues detected by Satellite are shown in the output of the gravity status command. + +## Metrics Overview + +All monitoring components are running in the “monitoring” namespace in Gravity. Let’s take a look at them: + +``` +$ kubectl -nmonitoring get pods +NAME READY STATUS RESTARTS AGE +alertmanager-main-0 3/3 Running 0 27m +alertmanager-main-1 3/3 Running 0 26m +alertmanager-main-2 3/3 Running 0 26m +grafana-6b645587d-chxxg 2/2 Running 0 27m +kube-state-metrics-69594c468-wcr4g 3/3 Running 0 27m +nethealth-4cjwh 1/1 Running 0 26m +node-exporter-hz972 2/2 Running 0 27m +prometheus-adapter-6586cf7b4f-hmwkf 1/1 Running 0 27m +prometheus-k8s-0 3/3 Running 1 26m +prometheus-k8s-1 0/3 Pending 0 26m +prometheus-operator-7bd7d57788-mf8xn 1/1 Running 0 27m +watcher-7b99cc55c-8qgms 1/1 Running 0 27m +``` + +Most of the cluster metrics are collected by Prometheus which uses the following in-cluster services: + +* [node-exporter](https://github.com/prometheus/node_exporter) (collects hardware and OS metrics) +* [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) (collects Kubernetes resource metrics - deployments, nodes, pods) + +kube-state-metrics collects metrics about various Kubernetes resources such as deployments, nodes and pods. It is a service that listens to the Kubernetes API server and generates metrics about the state of the objects. + +Further, kube-state-metrics exposes raw data that is unmodified from the Kubernetes API, which allows users to have all the data they require and perform heuristics as they see fit. In return, kubectl may not show the same values, as kubectl applies certain heuristics to display cleaner messages. + +Metrics from kube-state-metrics service are exported on the HTTP endpoint `/metrics` on the listening port (default 8080) and are designed to be consumed by Prometheus. + +![diagram](https://miro.medium.com/max/832/1*7thrW4Wa5y6b03PxtPlQzA.jpeg) + +(Source: https://medium.com/faun/production-grade-kubernetes-monitoring-using-prometheus-78144b835b60) + +All metrics collected by node-exporter and kube-state-metrics are stored as time series in Prometheus. See below for a list of metrics collected by Prometheus. Each metric is stored as a separate “series” in Prometheus. + +Prometheus allows users to differentiate on the things that are being measured. Label names should not be used in the metric name as that leads to some redundancy. 
+ +* `api_http_requests_total` - differentiate request types: `operation="create|update|delete"` + +When troubleshooting problems with metrics, it is sometimes useful to look into the specified container logs where it can be seen if it experiences communication issues with Prometheus service or has other issues: + +``` +$ kubectl -nmonitoring logs prometheus-adapter-6586cf7b4f-hmwkf +``` + +``` +$ kubectl -nmonitoring logs kube-state-metrics-69594c468-wcr4g kube-state-metrics +``` + +``` +$ kubectl -nmonitoring logs node-exporter-hz972 node-exporter +``` + +In addition, any other apps that collect metrics should also submit them into the same DB in order for proper retention policies to be enforced. + +## Exploring Prometheus + +Like mentioned above, Prometheus is exposed via a cluster-local Kubernetes service `prometheus-k8s.monitoring.svc.cluster.local:9090` and serves its HTTP API on port `9090` so we can use it to explore the database from the CLI. + +Also, as seen above we have the following Prometheus pods: + +``` +prometheus-adapter-6586cf7b4f-hmwkf +prometheus-k8s-0 +prometheus-operator-7bd7d57788-mf8xn +``` +Prometheus operator for Kubernetes allows easy monitoring definitions for kubernetes services and deployment and management of Prometheus instances. + +Prometheus adapter is an API extension for kubernetes that users prometheus queries to populate kubernetes resources and custom metrics APIs. + +Let's enter the Gravity master container to make sure the services are resolvable and to get access to additional CLI tools: + +```bash +$ sudo gravity shell +``` + +Let's ping the database to make sure it's up and running: + +```bash +$ curl -sl http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/status/config +// Should return "status":"success" within currently loaded configuration file. +``` + +A list of alerting and recording rules that are currently loaded is available by executing: + +```bash +$ curl http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/rules | jq +``` +Also we can see all metric points, by executing the following command: + +```bash +$ curl http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/query?query=up | jq +``` + +Finally, we can query Prometheus using it's SQL-like query language (PromQL) to for example evaluate metrics identified under the expression `up` at the specified time: + +```bash +$ curl 'http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/query?query=up&time=2020-03-13T20:10:51.781Z' | jq +``` + +Refer to the Prometheus [API documentation](https://prometheus.io/docs/prometheus/latest/querying/basics/) if you want to learn more about querying the database. + +## Metric Retention Policy + +### Time based retention + +By default Gravitational configures Prometheus with a time based retention policy of 30 days. + +## Custom Dashboards + +Along with the dashboards mentioned above, your applications can use their own Grafana dashboards by using ConfigMaps. + +In order to create a custom dashboard, the ConfigMap should be created in the `monitoring` namespace, assigned a `monitoring` label with a value `dashboard`. + +Under the specified namespace, the ConfigMap will be recognized and loaded when installing the application. It is possible to add new ConfigMaps at a later time as the watcher will then pick it up and create it in Grafana. Similarly, if you delete the ConfigMap, the watcher will delete it from Grafana. 
+ +Dashboard ConfigMaps may contain multiple keys with dashboards as key names are not relevant. + +An example ConfigMap is shown below: + +``` +apiVersion: v1 +kind: ConfigMap +metadata: + name: mydashboard + namespace: monitoring + labels: + monitoring: dashboard +data: + mydashboard: | + { ... dashboard JSON ... } +``` + +_Note: by default Grafana is run in read-only mode, a separate Grafana instance is required to create custom dashboards._ + +## Default Metrics + +The following are the default metrics captured by the Gravity Monitoring & Alerts system: + +### node-exporter Metrics + +Below are a list of metrics captured by node-exporter which are exported to the backend by based on OS: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Name | Description | OS |
|---|---|---|
| arp | Exposes ARP statistics from /proc/net/arp. | Linux |
| bcache | Exposes bcache statistics from /sys/fs/bcache/. | Linux |
| bonding | Exposes the number of configured and active slaves of Linux bonding interfaces. | Linux |
| boottime | Exposes system boot time derived from the kern.boottime sysctl. | Darwin, Dragonfly, FreeBSD, NetBSD, OpenBSD, Solaris |
| conntrack | Shows conntrack statistics (does nothing if no /proc/sys/net/netfilter/ present). | Linux |
| cpu | Exposes CPU statistics | Darwin, Dragonfly, FreeBSD, Linux, Solaris |
| cpufreq | Exposes CPU frequency statistics | Linux, Solaris |
| diskstats | Exposes disk I/O statistics. | Darwin, Linux, OpenBSD |
| edac | Exposes error detection and correction statistics. | Linux |
| entropy | Exposes available entropy. | Linux |
| exec | Exposes execution statistics. | Dragonfly, FreeBSD |
| filefd | Exposes file descriptor statistics from /proc/sys/fs/file-nr. | Linux |
| filesystem | Exposes filesystem statistics, such as disk space used. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD |
| hwmon | Expose hardware monitoring and sensor data from /sys/class/hwmon/. | Linux |
| infiniband | Exposes network statistics specific to InfiniBand and Intel OmniPath configurations. | Linux |
| ipvs | Exposes IPVS status from /proc/net/ip_vs and stats from /proc/net/ip_vs_stats. | Linux |
| loadavg | Exposes load average. | Darwin, Dragonfly, FreeBSD, Linux, NetBSD, OpenBSD, Solaris |
| mdadm | Exposes statistics about devices in /proc/mdstat (does nothing if no /proc/mdstat present). | Linux |
| meminfo | Exposes memory statistics. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD |
| netclass | Exposes network interface info from /sys/class/net/ | Linux |
| netdev | Exposes network interface statistics such as bytes transferred. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD |
| netstat | Exposes network statistics from /proc/net/netstat. This is the same information as netstat -s. | Linux |
| nfs | Exposes NFS client statistics from /proc/net/rpc/nfs. This is the same information as nfsstat -c. | Linux |
| nfsd | Exposes NFS kernel server statistics from /proc/net/rpc/nfsd. This is the same information as nfsstat -s. | Linux |
| pressure | Exposes pressure stall statistics from /proc/pressure/. | Linux (kernel 4.20+ and/or CONFIG_PSI) |
| rapl | Exposes various statistics from /sys/class/powercap. | Linux |
| schedstat | Exposes task scheduler statistics from /proc/schedstat. | Linux |
| sockstat | Exposes various statistics from /proc/net/sockstat. | Linux |
| softnet | Exposes statistics from /proc/net/softnet_stat. | Linux |
| stat | Exposes various statistics from /proc/stat. This includes boot time, forks and interrupts. | Linux |
| textfile | Exposes statistics read from local disk. The --collector.textfile.directory flag must be set. | any |
| thermal_zone | Exposes thermal zone & cooling device statistics from /sys/class/thermal. | Linux |
| time | Exposes the current system time. | any |
| timex | Exposes selected adjtimex(2) system call stats. | Linux |
| uname | Exposes system information as provided by the uname system call. | Darwin, FreeBSD, Linux, OpenBSD |
| vmstat | Exposes statistics from /proc/vmstat. | Linux |
| xfs | Exposes XFS runtime statistics. | Linux (kernel 4.4+) |
| zfs | Exposes ZFS performance statistics. | Linux, Solaris |
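
Everything these collectors gather shows up in Prometheus as `node_*` series. A quick way to confirm that node-exporter data is flowing is to query one of them through the Prometheus service used earlier on this page; `node_load1` (from the loadavg collector) is used here purely as an example:

```bash
# Run inside the master container (sudo gravity shell)
curl -s 'http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/query?query=node_load1' | jq
```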
    + +### kube-state-metrics + +A list of metrics captured by kube-state metrics can be found [here](https://github.com/kubernetes/kube-state-metrics/tree/master/docs). + +There are various groups of metrics for each set, some of these include: + +* ConfigMap Metrics +* Pod Metrics +* ReplicaSet Metrics +* Service Metrics + +Example list of [ConfigMap Metrics](https://github.com/kubernetes/kube-state-metrics/blob/master/docs/configmap-metrics.md) + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Metric name | Metric type | Labels/tags | Status |
|---|---|---|---|
| kube_configmap_info | Gauge | `configmap=<configmap-name>` `namespace=<configmap-namespace>` | STABLE |
| kube_configmap_created | Gauge | `configmap=<configmap-name>` `namespace=<configmap-namespace>` | STABLE |
| kube_configmap_metadata_resource_version | Gauge | `configmap=<configmap-name>` `namespace=<configmap-namespace>` | EXPERIMENTAL |
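
Because kube-state-metrics is scraped by Prometheus, these series can be checked the same way as any other metric. For example, a quick sketch that counts how many ConfigMaps are currently reported via `kube_configmap_info`:

```bash
# Run inside the master container (sudo gravity shell)
curl -s 'http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/query?query=kube_configmap_info' \
  | jq '.data.result | length'
```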
    + +### Satellite + +[Satellite](https://github.com/gravitational/satellite) is an open-source tool prepared by Gravitational that collects health information related to the Kubernetes cluster. Satellite runs on each Gravity Cluster node and has various checks assessing the health of a Cluster. + +Satellite collects several metrics related to cluster health and exposes them over the Prometheus endpoint. Among the metrics collected by Satellite are: + +* Etcd related metrics: + * Current leader address + * Etcd cluster health +* Docker related metrics: + * Overall health of the Docker daemon +* Sysctl related metrics: + * Status of IPv4 forwarding + * Status of netfilter +* Systemd related metrics: + * State of various systemd units such as etcd, flannel, kube-*, etc. + +## More about Alertmanager + +As mentioned Alertmanager is a Prometheus component that handles alerts sent by client applications such as the Prometheus server. Alertmanager handles deduplicating, grouping and routing alerts to the correct receiver integration such as an email recipient. + +The following are alerts that Gravity Monitoring & Alerts system ships with by default: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Component | Alert | Description |
|---|---|---|
| CPU | High CPU usage | Warning at > 75% used, critical error at > 90% used |
| Memory | High Memory usage | Warning at > 80% used, critical error at > 90% used |
| Systemd | Individual | Error when unit not loaded/active |
| | Overall systemd health | Error when systemd detects a failed service |
| Filesystem | High disk space usage | Warning at > 80% used, critical error at > 90% used |
| | High inode usage | Warning at > 90% used, critical error at > 95% used |
| System | Uptime | Warning node uptime < 5 mins |
| | Kernel params | Error if param not set |
| Etcd | Etcd instance health | Error when etcd master down > 5 mins |
| | Etcd latency check | Warning when follower <-> leader latency > 500 ms, error when > 1 sec over period of 1 min |
| Docker | Docker daemon health | Error when docker daemon is down |
| Kubernetes | Kubernetes node readiness | Error when the node is not ready |
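
These rules are evaluated by Prometheus, and anything that fires is routed through Alertmanager. To see which alerts are active at the moment you can query the cluster-internal Alertmanager service mentioned above; this sketch assumes the v2 API exposed by recent Alertmanager versions:

```bash
# Run inside the master container (sudo gravity shell)
curl -s http://alertmanager-main.monitoring.svc.cluster.local:9093/api/v2/alerts | jq '.[].labels.alertname'
```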
    + +### Alertmanager Email Configuration + +In order to configure email alerts via Alertmanager you will need to create Gravity resources of type `smtp `and `alerttarget`. + +An example of the configuration is shown below: + +``` +kind: smtp +version: v2 +metadata: + name: smtp +spec: + host: smtp.host + port: # 465 by default + username: + password: +--- +kind: alerttarget +version: v2 +metadata: + name: email-alerts +spec: + email: triage@example.com # Email address of the alert recipient +``` + +Creating these resources will accordingly update and reload Alertmanager configuration: + +``` +$ gravity resource create -f smtp.yaml +``` + +In order to view the current SMTP settings or alert target: + +``` +$ gravity resource get smtp +$ gravity resource get alerttarget +``` + +Only a single alert target can be configured. To remove the current alert target, you can execute the following command: + +``` +$ gravity resource rm alerttarget email-alerts +``` + +### Alertmanager Custom Alerts + +Creating new alerts is as easy as using another Gravity resource of type `alert`. Alerting rules are configured in Prometheus in the same way as recording rules and are automatically detected, loaded, and enabled for Gravity Monitoring and Alerts system. + +For demonstration purposes let’s define an alert that always fires: + +``` +kind: alert +version: v2 +metadata: + name: cpu1 +spec: + alert_name: CPU1 + group_name: test-group + formula: | + node:cluster_cpu_utilization:ratio * 100 > 1 + labels: + severity: info + annotations: + description: | + This is a test alert +``` + +And create it : + +``` +$ gravity resource create -f alert.yaml +``` + +Custom alerts are being monitored by another “watcher” type of service that runs in its own pod: + +``` +$ kubectl -nmonitoring logs watcher-7b99cc55c-8qgms +time="2020-03-14T01:12:02Z" level=info msg="Detected event ADDED for configmap cpu1." label="monitoring in (alert)" watch=configmap +``` + +We can confirm the alert is running by checking active alerts to see if the cluster has overcommitted CPU resource requests, as we set the cpu usage threshold to 1%. + +```bash +$ sudo gravity shell +``` + +```bash +$ curl http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/alerts | jq +``` + +We see the following output: + +```bash + { + "labels": { + "alertname": "CPU1", + "node": "abdu-dev-test0", + "severity": "info" + }, + "annotations": { + "description": "This is a test alert\n" + }, + "state": "firing", + "activeAt": "2020-03-14T01:12:20.102178408Z", + "value": 43.51506264996971 + } +``` + +To view all currently configured custom alerts you can run: + +``` +$ gravity resource get alert cpu1 +``` + +In order to remove a specific alert you can execute the following altermanager command inside the designated pod: + +``` +$ gravity resource rm alert cpu1 +``` + +This concludes our monitoring training. 
diff --git a/networking-workshop/my-nginx-configmap.yaml b/networking-workshop/my-nginx-configmap.yaml new file mode 100644 index 00000000..c6d25b5f --- /dev/null +++ b/networking-workshop/my-nginx-configmap.yaml @@ -0,0 +1,30 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + labels: + run: my-nginx + name: my-nginx + namespace: default +spec: + replicas: 2 + selector: + matchLabels: + run: my-nginx + template: + metadata: + labels: + run: my-nginx + spec: + containers: + - image: nginx:1.11.5 + name: my-nginx + ports: + - containerPort: 80 + protocol: TCP + volumeMounts: + - name: config-volume + mountPath: /etc/nginx/conf.d + volumes: + - name: config-volume + configMap: + name: my-nginx-v1 diff --git a/networking-workshop/my-nginx-new.yaml b/networking-workshop/my-nginx-new.yaml new file mode 100644 index 00000000..030ca163 --- /dev/null +++ b/networking-workshop/my-nginx-new.yaml @@ -0,0 +1,23 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + labels: + run: my-nginx + name: my-nginx + namespace: default +spec: + replicas: 2 + selector: + matchLabels: + run: my-nginx + template: + metadata: + labels: + run: my-nginx + spec: + containers: + - image: nginx:1.17.5 + name: my-nginx + ports: + - containerPort: 80 + protocol: TCP diff --git a/networking-workshop/my-nginx-typo.yaml b/networking-workshop/my-nginx-typo.yaml new file mode 100644 index 00000000..e0ef6448 --- /dev/null +++ b/networking-workshop/my-nginx-typo.yaml @@ -0,0 +1,23 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + labels: + run: my-nginx + name: my-nginx + namespace: default +spec: + replicas: 2 + selector: + matchLabels: + run: my-nginx + template: + metadata: + labels: + run: my-nginx + spec: + containers: + - image: nginx:999 # <-- TYPO: non-existent version + name: my-nginx + ports: + - containerPort: 80 + protocol: TCP diff --git a/networking-workshop/prod/background/Dockerfile b/networking-workshop/prod/background/Dockerfile new file mode 100644 index 00000000..7d98a238 --- /dev/null +++ b/networking-workshop/prod/background/Dockerfile @@ -0,0 +1,4 @@ +FROM library/python:3.3 + +ADD start.sh /start.sh +RUN chmod +x /start.sh diff --git a/networking-workshop/prod/background/crash.yaml b/networking-workshop/prod/background/crash.yaml new file mode 100644 index 00000000..068566e1 --- /dev/null +++ b/networking-workshop/prod/background/crash.yaml @@ -0,0 +1,11 @@ +apiVersion: v1 +kind: Pod +metadata: + name: crash + namespace: default +spec: + containers: + - command: ['/start.sh'] + image: localhost:5000/background:0.0.1 + name: pod + imagePullPolicy: Always diff --git a/networking-workshop/prod/background/fix.yaml b/networking-workshop/prod/background/fix.yaml new file mode 100644 index 00000000..243a3016 --- /dev/null +++ b/networking-workshop/prod/background/fix.yaml @@ -0,0 +1,16 @@ +apiVersion: v1 +kind: Pod +metadata: + name: fix + namespace: default +spec: + containers: + - command: ['/start.sh'] + image: localhost:5000/background:0.0.1 + name: server + imagePullPolicy: Always + livenessProbe: + httpGet: + path: / + port: 5000 + timeoutSeconds: 1 \ No newline at end of file diff --git a/networking-workshop/prod/background/start.sh b/networking-workshop/prod/background/start.sh new file mode 100644 index 00000000..2b617f70 --- /dev/null +++ b/networking-workshop/prod/background/start.sh @@ -0,0 +1,6 @@ +#!/bin/bash + +python -m http.serve 5000& + +echo "Everything is great!" 
+sleep 100000 diff --git a/networking-workshop/prod/build-fix/Dockerfile b/networking-workshop/prod/build-fix/Dockerfile new file mode 100644 index 00000000..2b5e6748 --- /dev/null +++ b/networking-workshop/prod/build-fix/Dockerfile @@ -0,0 +1,7 @@ +FROM ubuntu:18.04 + +RUN apt-get update +RUN apt-get install -y gcc +ADD hello.c /build/hello.c +RUN gcc /build/hello.c -o /build/hello +ENTRYPOINT ["/build/hello"] diff --git a/networking-workshop/prod/build-fix/build.dockerfile b/networking-workshop/prod/build-fix/build.dockerfile new file mode 100644 index 00000000..3eb2c24a --- /dev/null +++ b/networking-workshop/prod/build-fix/build.dockerfile @@ -0,0 +1,4 @@ +FROM ubuntu:18.04 + +RUN apt-get update +RUN apt-get install -y gcc diff --git a/networking-workshop/prod/build-fix/hello b/networking-workshop/prod/build-fix/hello new file mode 100755 index 00000000..d34db825 Binary files /dev/null and b/networking-workshop/prod/build-fix/hello differ diff --git a/networking-workshop/prod/build-fix/hello.c b/networking-workshop/prod/build-fix/hello.c new file mode 100644 index 00000000..de65ff7e --- /dev/null +++ b/networking-workshop/prod/build-fix/hello.c @@ -0,0 +1,7 @@ +#include + +int main() +{ + printf("Hello World\n"); + return 0; +} diff --git a/networking-workshop/prod/build-fix/multi.dockerfile b/networking-workshop/prod/build-fix/multi.dockerfile new file mode 100644 index 00000000..5eba72b2 --- /dev/null +++ b/networking-workshop/prod/build-fix/multi.dockerfile @@ -0,0 +1,17 @@ +# +# Build stage. +# +FROM ubuntu:18.04 + +RUN apt-get update +RUN apt-get install -y gcc +ADD hello.c /build/hello.c +RUN gcc /build/hello.c -o /build/hello + +# +# Run stage. +# +FROM quay.io/gravitational/debian-tall:0.0.1 + +COPY --from=0 /build/hello /hello +ENTRYPOINT ["/hello"] diff --git a/networking-workshop/prod/build-fix/run.dockerfile b/networking-workshop/prod/build-fix/run.dockerfile new file mode 100644 index 00000000..66e053db --- /dev/null +++ b/networking-workshop/prod/build-fix/run.dockerfile @@ -0,0 +1,4 @@ +FROM quay.io/gravitational/debian-tall:0.0.1 + +ADD hello /hello +ENTRYPOINT ["/hello"] \ No newline at end of file diff --git a/networking-workshop/prod/build/Dockerfile b/networking-workshop/prod/build/Dockerfile new file mode 100644 index 00000000..2b5e6748 --- /dev/null +++ b/networking-workshop/prod/build/Dockerfile @@ -0,0 +1,7 @@ +FROM ubuntu:18.04 + +RUN apt-get update +RUN apt-get install -y gcc +ADD hello.c /build/hello.c +RUN gcc /build/hello.c -o /build/hello +ENTRYPOINT ["/build/hello"] diff --git a/networking-workshop/prod/build/hello.c b/networking-workshop/prod/build/hello.c new file mode 100644 index 00000000..de65ff7e --- /dev/null +++ b/networking-workshop/prod/build/hello.c @@ -0,0 +1,7 @@ +#include + +int main() +{ + printf("Hello World\n"); + return 0; +} diff --git a/networking-workshop/prod/cbreaker/Dockerfile b/networking-workshop/prod/cbreaker/Dockerfile new file mode 100644 index 00000000..93d10ce0 --- /dev/null +++ b/networking-workshop/prod/cbreaker/Dockerfile @@ -0,0 +1,6 @@ +FROM library/python:2.7 + +RUN pip install flask requests +ADD weather.py /weather.py +ADD mail.py /mail.py +ADD frontend.py /frontend.py diff --git a/networking-workshop/prod/cbreaker/cbreaker.dockerfile b/networking-workshop/prod/cbreaker/cbreaker.dockerfile new file mode 100644 index 00000000..deeea6a7 --- /dev/null +++ b/networking-workshop/prod/cbreaker/cbreaker.dockerfile @@ -0,0 +1,4 @@ +FROM library/python:2.7 + +RUN pip install flask requests +ADD cbreaker.py /cbreaker.py diff 
--git a/networking-workshop/prod/cbreaker/cbreaker.py b/networking-workshop/prod/cbreaker/cbreaker.py new file mode 100644 index 00000000..adc4e17b --- /dev/null +++ b/networking-workshop/prod/cbreaker/cbreaker.py @@ -0,0 +1,55 @@ +from flask import Flask +import requests +from datetime import datetime, timedelta +from threading import Lock +import logging, sys + + +app = Flask(__name__) + +circuit_tripped_until = datetime.now() +mutex = Lock() + +def trip(): + global circuit_tripped_until + mutex.acquire() + try: + circuit_tripped_until = datetime.now() + timedelta(0,30) + app.logger.info("circuit tripped until %s" %(circuit_tripped_until)) + finally: + mutex.release() + +def is_tripped(): + global circuit_tripped_until + mutex.acquire() + try: + return datetime.now() < circuit_tripped_until + finally: + mutex.release() + + +@app.route("/") +def hello(): + weather = "weather unavailable" + try: + if is_tripped(): + return "circuit breaker: service unavailable (tripped)" + + r = requests.get('http://localhost:5000', timeout=1) + app.logger.info("requesting weather...") + start = datetime.now() + app.logger.info("got weather in %s ..." % (datetime.now() - start)) + if r.status_code == requests.codes.ok: + return r.text + else: + trip() + return "circuit brekear: service unavailable (tripping 1)" + except: + app.logger.info("exception: %s", sys.exc_info()[0]) + trip() + return "circuit brekear: service unavailable (tripping 2)" + +if __name__ == "__main__": + app.logger.addHandler(logging.StreamHandler(sys.stdout)) + app.logger.setLevel(logging.DEBUG) + app.run(host='0.0.0.0', port=6000) diff --git a/networking-workshop/prod/cbreaker/frontend.py b/networking-workshop/prod/cbreaker/frontend.py new file mode 100644 index 00000000..0adbf1e6 --- /dev/null +++ b/networking-workshop/prod/cbreaker/frontend.py @@ -0,0 +1,43 @@ +from __future__ import print_function +from flask import Flask +import requests +from datetime import datetime +app = Flask(__name__) + +@app.route("/") +def hello(): + weather = "weather unavailable" + try: + print("requesting weather...") + start = datetime.now() + r = requests.get('http://weather') + print("got weather in %s ..." % (datetime.now() - start)) + if r.status_code == requests.codes.ok: + weather = r.text + except: + print("weather unavailable") + + print("requesting mail...") + r = requests.get('http://mail') + mail = r.json() + print("got mail in %s ..." % (datetime.now() - start)) + + out = [] + for letter in mail: + out.append("
<li>From: %s Subject: %s</li>" % (letter['from'], letter['subject']))

    return '''<html>
<body>
  <h3>Weather</h3>
  <p>%s</p>
  <h3>Email</h3>
  <p>
    <ul>
      %s
    </ul>
  </p>
</body>
</html>
''' % (weather, '<br/>
    '.join(out)) + +if __name__ == "__main__": + app.run(host='0.0.0.0') diff --git a/networking-workshop/prod/cbreaker/mail.py b/networking-workshop/prod/cbreaker/mail.py new file mode 100644 index 00000000..ec8aa14a --- /dev/null +++ b/networking-workshop/prod/cbreaker/mail.py @@ -0,0 +1,11 @@ +from flask import Flask,jsonify +app = Flask(__name__) + +@app.route("/") +def hello(): + return jsonify([ + {"from": "", "subject": "lunch at noon tomorrow"}, + {"from": "", "subject": "compiler docs"}]) + +if __name__ == "__main__": + app.run(host='0.0.0.0') diff --git a/networking-workshop/prod/cbreaker/service.yaml b/networking-workshop/prod/cbreaker/service.yaml new file mode 100644 index 00000000..3d7eb7a9 --- /dev/null +++ b/networking-workshop/prod/cbreaker/service.yaml @@ -0,0 +1,111 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: frontend + namespace: default +spec: + replicas: 1 + selector: + matchLabels: + app: frontend + template: + metadata: + labels: + app: frontend + spec: + containers: + - command: ['python', '/frontend.py'] + image: localhost:5000/mail:0.0.1 + name: frontend + imagePullPolicy: Always + ports: + - containerPort: 5000 + protocol: TCP +--- +apiVersion: apps/v1 +kind: Deployment +metadata: + name: weather + namespace: default +spec: + replicas: 1 + selector: + matchLabels: + app: weather + template: + metadata: + labels: + app: weather + spec: + containers: + - command: ['python', '/weather.py'] + image: localhost:5000/mail:0.0.1 + name: frontend + imagePullPolicy: Always + ports: + - containerPort: 5000 + protocol: TCP +--- +apiVersion: apps/v1 +kind: Deployment +metadata: + name: mail + namespace: default +spec: + replicas: 1 + selector: + matchLabels: + app: mail + template: + metadata: + labels: + app: mail + spec: + containers: + - command: ['python', '/mail.py'] + image: localhost:5000/mail:0.0.1 + name: mail + imagePullPolicy: Always + ports: + - containerPort: 5000 + protocol: TCP +--- +--- +apiVersion: v1 +kind: Service +metadata: + name: frontend + labels: + app: frontend +spec: + ports: + - port: 80 + targetPort: 5000 + selector: + app: frontend +--- +apiVersion: v1 +kind: Service +metadata: + name: mail + labels: + app: mail +spec: + ports: + - port: 80 + targetPort: 5000 + selector: + app: mail +--- +apiVersion: v1 +kind: Service +metadata: + name: weather + labels: + app: weather +spec: + ports: + - port: 80 + targetPort: 5000 + selector: + app: weather diff --git a/networking-workshop/prod/cbreaker/weather-cbreaker.yaml b/networking-workshop/prod/cbreaker/weather-cbreaker.yaml new file mode 100644 index 00000000..029f7c42 --- /dev/null +++ b/networking-workshop/prod/cbreaker/weather-cbreaker.yaml @@ -0,0 +1,30 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: weather + namespace: default +spec: + replicas: 1 + selector: + matchLabels: + app: weather + template: + metadata: + labels: + app: weather + spec: + containers: + - command: ['python', '/weather.py'] + image: localhost:5000/weather-crash-slow:0.0.1 + name: weather + imagePullPolicy: Always + ports: + - containerPort: 5000 + protocol: TCP + - command: ['python', '/cbreaker.py'] + image: localhost:5000/cbreaker:0.0.1 + name: cbreaker + imagePullPolicy: Always + ports: + - containerPort: 6000 + protocol: TCP diff --git a/networking-workshop/prod/cbreaker/weather-crash-slow.dockerfile b/networking-workshop/prod/cbreaker/weather-crash-slow.dockerfile new file mode 100644 index 00000000..47130960 --- /dev/null +++ 
b/networking-workshop/prod/cbreaker/weather-crash-slow.dockerfile @@ -0,0 +1,4 @@ +FROM library/python:2.7 + +RUN pip install flask requests +ADD weather-crash-slow.py /weather.py diff --git a/networking-workshop/prod/cbreaker/weather-crash-slow.py b/networking-workshop/prod/cbreaker/weather-crash-slow.py new file mode 100644 index 00000000..efbb929d --- /dev/null +++ b/networking-workshop/prod/cbreaker/weather-crash-slow.py @@ -0,0 +1,12 @@ +from flask import Flask +import time + +app = Flask(__name__) + +@app.route("/") +def hello(): + time.sleep(30) + raise Exception("System overloaded") + +if __name__ == "__main__": + app.run(host='0.0.0.0') diff --git a/networking-workshop/prod/cbreaker/weather-crash-slow.yaml b/networking-workshop/prod/cbreaker/weather-crash-slow.yaml new file mode 100644 index 00000000..5d7e0efa --- /dev/null +++ b/networking-workshop/prod/cbreaker/weather-crash-slow.yaml @@ -0,0 +1,23 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: weather + namespace: default +spec: + replicas: 1 + selector: + matchLabels: + app: weather + template: + metadata: + labels: + app: weather + spec: + containers: + - command: ['python', '/weather.py'] + image: localhost:5000/weather-crash-slow:0.0.1 + name: frontend + imagePullPolicy: Always + ports: + - containerPort: 5000 + protocol: TCP diff --git a/networking-workshop/prod/cbreaker/weather-crash.dockerfile b/networking-workshop/prod/cbreaker/weather-crash.dockerfile new file mode 100644 index 00000000..f90e761a --- /dev/null +++ b/networking-workshop/prod/cbreaker/weather-crash.dockerfile @@ -0,0 +1,4 @@ +FROM library/python:2.7 + +RUN pip install flask requests +ADD weather-crash.py /weather.py diff --git a/networking-workshop/prod/cbreaker/weather-crash.py b/networking-workshop/prod/cbreaker/weather-crash.py new file mode 100644 index 00000000..c79ee702 --- /dev/null +++ b/networking-workshop/prod/cbreaker/weather-crash.py @@ -0,0 +1,9 @@ +from flask import Flask +app = Flask(__name__) + +@app.route("/") +def hello(): + raise Exception("I am out of service") + +if __name__ == "__main__": + app.run(host='0.0.0.0') diff --git a/networking-workshop/prod/cbreaker/weather-crash.yaml b/networking-workshop/prod/cbreaker/weather-crash.yaml new file mode 100644 index 00000000..6b5ac0be --- /dev/null +++ b/networking-workshop/prod/cbreaker/weather-crash.yaml @@ -0,0 +1,23 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: weather + namespace: default +spec: + replicas: 1 + selector: + matchLabels: + app: weather + template: + metadata: + labels: + app: weather + spec: + containers: + - command: ['python', '/weather.py'] + image: localhost:5000/weather-crash:0.0.1 + name: frontend + imagePullPolicy: Always + ports: + - containerPort: 5000 + protocol: TCP diff --git a/networking-workshop/prod/cbreaker/weather-service.yaml b/networking-workshop/prod/cbreaker/weather-service.yaml new file mode 100644 index 00000000..40d45744 --- /dev/null +++ b/networking-workshop/prod/cbreaker/weather-service.yaml @@ -0,0 +1,12 @@ +apiVersion: v1 +kind: Service +metadata: + name: weather + labels: + app: weather +spec: + ports: + - port: 80 + targetPort: 6000 + selector: + app: weather \ No newline at end of file diff --git a/networking-workshop/prod/cbreaker/weather.py b/networking-workshop/prod/cbreaker/weather.py new file mode 100644 index 00000000..9ed606d6 --- /dev/null +++ b/networking-workshop/prod/cbreaker/weather.py @@ -0,0 +1,16 @@ +from flask import Flask +app = Flask(__name__) + +@app.route("/") +def hello(): + return 
'''Pleasanton, CA +Saturday 8:00 PM +Partly Cloudy +12 C +Precipitation: 9% +Humidity: 74% +Wind: 14 km/h +''' + +if __name__ == "__main__": + app.run(host='0.0.0.0') diff --git a/networking-workshop/prod/config/config-fix.dockerfile b/networking-workshop/prod/config/config-fix.dockerfile new file mode 100644 index 00000000..4f38a86e --- /dev/null +++ b/networking-workshop/prod/config/config-fix.dockerfile @@ -0,0 +1,7 @@ +FROM golang:1.12-stretch +ADD config.go /build/config.go +RUN go build -o /build/config /build/config.go + +FROM quay.io/gravitational/debian-tall:0.0.1 +COPY --from=0 /build/config /config +ENTRYPOINT ["/usr/bin/dumb-init", "/config", "/opt/config/config.yaml"] diff --git a/networking-workshop/prod/config/config.dockerfile b/networking-workshop/prod/config/config.dockerfile new file mode 100644 index 00000000..fe1dd0d1 --- /dev/null +++ b/networking-workshop/prod/config/config.dockerfile @@ -0,0 +1,9 @@ +FROM golang:1.12-stretch +ADD config.go /build/config.go +RUN go build -o /build/config /build/config.go + +FROM quay.io/gravitational/debian-tall:0.0.1 +COPY --from=0 /build/config /config +RUN mkdir -p /opt/config +ADD config.yaml /opt/config/config.yaml +ENTRYPOINT ["/usr/bin/dumb-init", "/config", "/opt/config/config.yaml"] diff --git a/networking-workshop/prod/config/config.go b/networking-workshop/prod/config/config.go new file mode 100644 index 00000000..cb8f2fe5 --- /dev/null +++ b/networking-workshop/prod/config/config.go @@ -0,0 +1,20 @@ +package main + +import ( + "fmt" + "io/ioutil" + "os" + "time" +) + +func main() { + if len(os.Args) < 1 { + panic("Usage: ./config ") + } + bytes, err := ioutil.ReadFile(os.Args[1]) + if err != nil { + panic(fmt.Sprintf("Failed to read config file: %v", err)) + } + fmt.Printf("Started with config: %v\n", string(bytes)) + time.Sleep(time.Hour) +} diff --git a/networking-workshop/prod/config/config.yaml b/networking-workshop/prod/config/config.yaml new file mode 100644 index 00000000..b4f46b72 --- /dev/null +++ b/networking-workshop/prod/config/config.yaml @@ -0,0 +1 @@ +key: value diff --git a/networking-workshop/prod/config/pod-fix.yaml b/networking-workshop/prod/config/pod-fix.yaml new file mode 100644 index 00000000..28376e1a --- /dev/null +++ b/networking-workshop/prod/config/pod-fix.yaml @@ -0,0 +1,17 @@ +apiVersion: v1 +kind: Pod +metadata: + name: config + namespace: default +spec: + containers: + - name: config + image: localhost:5000/config:0.0.1 + imagePullPolicy: Always + volumeMounts: + - name: config + mountPath: /opt/config + volumes: + - name: config + configMap: + name: config diff --git a/networking-workshop/prod/config/pod.yaml b/networking-workshop/prod/config/pod.yaml new file mode 100644 index 00000000..fc4699e7 --- /dev/null +++ b/networking-workshop/prod/config/pod.yaml @@ -0,0 +1,10 @@ +apiVersion: v1 +kind: Pod +metadata: + name: config + namespace: default +spec: + containers: + - name: config + image: localhost:5000/config:0.0.1 + imagePullPolicy: Always diff --git a/networking-workshop/prod/delay/Dockerfile b/networking-workshop/prod/delay/Dockerfile new file mode 100644 index 00000000..7d98a238 --- /dev/null +++ b/networking-workshop/prod/delay/Dockerfile @@ -0,0 +1,4 @@ +FROM library/python:3.3 + +ADD start.sh /start.sh +RUN chmod +x /start.sh diff --git a/networking-workshop/prod/delay/deployment-fix.yaml b/networking-workshop/prod/delay/deployment-fix.yaml new file mode 100644 index 00000000..82bfe959 --- /dev/null +++ b/networking-workshop/prod/delay/deployment-fix.yaml @@ -0,0 +1,42 @@ 
+apiVersion: apps/v1 +kind: Deployment +metadata: + labels: + app: delay + name: delay + namespace: default +spec: + replicas: 1 + strategy: + type: RollingUpdate + rollingUpdate: + maxUnavailable: 0 + selector: + matchLabels: + app: delay + template: + metadata: + labels: + app: delay + spec: + containers: + - command: ['/start.sh'] + image: localhost:5000/delay:0.0.2 + name: server + imagePullPolicy: Always + ports: + - containerPort: 5000 + protocol: TCP + livenessProbe: + httpGet: + path: / + port: 5000 + timeoutSeconds: 1 + initialDelaySeconds: 31 + periodSeconds: 5 + readinessProbe: + httpGet: + path: / + port: 5000 + timeoutSeconds: 1 + periodSeconds: 5 diff --git a/networking-workshop/prod/delay/deployment-update.yaml b/networking-workshop/prod/delay/deployment-update.yaml new file mode 100644 index 00000000..43ad1c0b --- /dev/null +++ b/networking-workshop/prod/delay/deployment-update.yaml @@ -0,0 +1,36 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + labels: + app: delay + name: delay + namespace: default +spec: + replicas: 1 + strategy: + type: RollingUpdate + rollingUpdate: + maxUnavailable: 0 + selector: + matchLabels: + app: delay + template: + metadata: + labels: + app: delay + spec: + containers: + - command: ['/start.sh'] + image: localhost:5000/delay:0.0.2 + name: server + imagePullPolicy: Always + ports: + - containerPort: 5000 + protocol: TCP + livenessProbe: + httpGet: + path: / + port: 5000 + timeoutSeconds: 1 + initialDelaySeconds: 31 + periodSeconds: 5 diff --git a/networking-workshop/prod/delay/deployment.yaml b/networking-workshop/prod/delay/deployment.yaml new file mode 100644 index 00000000..e3a478e1 --- /dev/null +++ b/networking-workshop/prod/delay/deployment.yaml @@ -0,0 +1,36 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + labels: + app: delay + name: delay + namespace: default +spec: + replicas: 1 + strategy: + type: RollingUpdate + rollingUpdate: + maxUnavailable: 0 + selector: + matchLabels: + app: delay + template: + metadata: + labels: + app: delay + spec: + containers: + - command: ['/start.sh'] + image: localhost:5000/delay:0.0.1 + name: server + imagePullPolicy: Always + ports: + - containerPort: 5000 + protocol: TCP + livenessProbe: + httpGet: + path: / + port: 5000 + timeoutSeconds: 1 + initialDelaySeconds: 31 + periodSeconds: 5 diff --git a/networking-workshop/prod/delay/service.yaml b/networking-workshop/prod/delay/service.yaml new file mode 100644 index 00000000..cf03da67 --- /dev/null +++ b/networking-workshop/prod/delay/service.yaml @@ -0,0 +1,11 @@ +apiVersion: v1 +kind: Service +metadata: + name: delay + labels: + app: delay +spec: + ports: + - port: 5000 + selector: + app: delay diff --git a/networking-workshop/prod/delay/start.sh b/networking-workshop/prod/delay/start.sh new file mode 100644 index 00000000..178e0f14 --- /dev/null +++ b/networking-workshop/prod/delay/start.sh @@ -0,0 +1,6 @@ +#!/bin/bash + +echo "Starting up" +sleep 30 +echo "Started up successfully" +python -m http.server 5000 diff --git a/networking-workshop/prod/jobs/bad.yaml b/networking-workshop/prod/jobs/bad.yaml new file mode 100644 index 00000000..04197875 --- /dev/null +++ b/networking-workshop/prod/jobs/bad.yaml @@ -0,0 +1,15 @@ +apiVersion: batch/v1 +kind: Job +metadata: + name: bad +spec: + template: + metadata: + name: bad + spec: + restartPolicy: Never + containers: + - name: box + image: busybox + command: ["/bin/sh", "-c", "exit 1"] + \ No newline at end of file diff --git a/networking-workshop/prod/jobs/bound.yaml 
b/networking-workshop/prod/jobs/bound.yaml new file mode 100644 index 00000000..e7bf62ab --- /dev/null +++ b/networking-workshop/prod/jobs/bound.yaml @@ -0,0 +1,15 @@ +apiVersion: batch/v1 +kind: Job +metadata: + name: bound +spec: + activeDeadlineSeconds: 30 + template: + metadata: + name: bound + spec: + restartPolicy: Never + containers: + - name: box + image: busybox + command: ["/bin/sh", "-c", "exit 1"] diff --git a/networking-workshop/prod/logs/logs.yaml b/networking-workshop/prod/logs/logs.yaml new file mode 100644 index 00000000..46c4b891 --- /dev/null +++ b/networking-workshop/prod/logs/logs.yaml @@ -0,0 +1,10 @@ +apiVersion: v1 +kind: Pod +metadata: + name: logs + namespace: default +spec: + containers: + - command: ['/bin/sh', '-c', "echo hello, world!"] + image: busybox + name: server diff --git a/networking-workshop/prod/pod/deploy.yaml b/networking-workshop/prod/pod/deploy.yaml new file mode 100644 index 00000000..9a2a7029 --- /dev/null +++ b/networking-workshop/prod/pod/deploy.yaml @@ -0,0 +1,17 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: nginx +spec: + selector: + matchLabels: + app: nginx + replicas: 1 + template: + metadata: + labels: + app: nginx + spec: + containers: + - name: nginx + image: nginx diff --git a/networking-workshop/prod/pod/pod.yaml b/networking-workshop/prod/pod/pod.yaml new file mode 100644 index 00000000..dfc68e14 --- /dev/null +++ b/networking-workshop/prod/pod/pod.yaml @@ -0,0 +1,10 @@ +apiVersion: v1 +kind: Pod +metadata: + name: nginx + labels: + app: nginx +spec: + containers: + - name: nginx + image: nginx diff --git a/networking-workshop/prod/quotas/Dockerfile b/networking-workshop/prod/quotas/Dockerfile new file mode 100644 index 00000000..ddd7424b --- /dev/null +++ b/networking-workshop/prod/quotas/Dockerfile @@ -0,0 +1,7 @@ +FROM golang:1.12-stretch +ADD memhog.go /build/memhog.go +RUN go build -o /build/memhog /build/memhog.go + +FROM quay.io/gravitational/debian-tall:0.0.1 +COPY --from=0 /build/memhog /memhog +ENTRYPOINT ["/usr/bin/dumb-init", "/memhog"] diff --git a/networking-workshop/prod/quotas/memhog.go b/networking-workshop/prod/quotas/memhog.go new file mode 100644 index 00000000..868a2ce0 --- /dev/null +++ b/networking-workshop/prod/quotas/memhog.go @@ -0,0 +1,15 @@ +package main + +import "time" + +func main() { + var total [][]int + for i := 0; i < 10000; i++ { + var inner []int + for i := 0; i < 1000; i++ { + inner = append(inner, 0) + } + total = append(total, inner) + } + time.Sleep(time.Hour) +} diff --git a/networking-workshop/prod/quotas/quota.yaml b/networking-workshop/prod/quotas/quota.yaml new file mode 100644 index 00000000..22df0a64 --- /dev/null +++ b/networking-workshop/prod/quotas/quota.yaml @@ -0,0 +1,12 @@ +apiVersion: v1 +kind: Pod +metadata: + name: quota + namespace: default +spec: + containers: + - name: memhog + image: localhost:5000/memhog:0.0.1 + resources: + limits: + memory: "20Mi" diff --git a/networking-workshop/prod/sidecar/conf.d/default.conf b/networking-workshop/prod/sidecar/conf.d/default.conf new file mode 100644 index 00000000..19e21c78 --- /dev/null +++ b/networking-workshop/prod/sidecar/conf.d/default.conf @@ -0,0 +1,15 @@ +upstream backend { + server 127.0.0.1:5000; +} + +limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s; + +server { + listen 80; + server_name localhost; + + location / { + limit_req zone=one burst=5; + proxy_pass http://backend; + } +} diff --git a/networking-workshop/prod/sidecar/service.dockerfile b/networking-workshop/prod/sidecar/service.dockerfile 
new file mode 100644 index 00000000..986df5cf --- /dev/null +++ b/networking-workshop/prod/sidecar/service.dockerfile @@ -0,0 +1,5 @@ +FROM library/python:2.7 + +RUN pip install flask requests +ADD service.py /service.py +CMD ["python", "/service.py"] diff --git a/networking-workshop/prod/sidecar/service.py b/networking-workshop/prod/sidecar/service.py new file mode 100644 index 00000000..2e3e3b28 --- /dev/null +++ b/networking-workshop/prod/sidecar/service.py @@ -0,0 +1,11 @@ +from flask import Flask +import time + +app = Flask(__name__) + +@app.route("/") +def hello(): + return "hello, sidecar!" + +if __name__ == "__main__": + app.run() diff --git a/networking-workshop/prod/sidecar/sidecar.dockerfile b/networking-workshop/prod/sidecar/sidecar.dockerfile new file mode 100644 index 00000000..5db2eb67 --- /dev/null +++ b/networking-workshop/prod/sidecar/sidecar.dockerfile @@ -0,0 +1,4 @@ +FROM nginx:1.17.5 + +ADD conf.d /etc/nginx/conf.d +CMD ["nginx", "-g", "daemon off;"] diff --git a/networking-workshop/prod/sidecar/sidecar.yaml b/networking-workshop/prod/sidecar/sidecar.yaml new file mode 100644 index 00000000..e8af9eba --- /dev/null +++ b/networking-workshop/prod/sidecar/sidecar.yaml @@ -0,0 +1,38 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: sidecar + namespace: default +spec: + replicas: 1 + selector: + matchLabels: + app: sidecar + template: + metadata: + labels: + app: sidecar + spec: + containers: + - image: localhost:5000/service:0.0.1 + name: service + imagePullPolicy: Always + - image: localhost:5000/sidecar:0.0.1 + name: sidecar + imagePullPolicy: Always + ports: + - containerPort: 80 + protocol: TCP +--- +apiVersion: v1 +kind: Service +metadata: + name: sidecar + labels: + app: sidecar +spec: + ports: + - port: 80 + targetPort: 80 + selector: + app: sidecar diff --git a/networking-workshop/registry.yaml b/networking-workshop/registry.yaml new file mode 100644 index 00000000..d37ee995 --- /dev/null +++ b/networking-workshop/registry.yaml @@ -0,0 +1,36 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + labels: + app: registry + name: registry + namespace: default +spec: + replicas: 1 + selector: + matchLabels: + app: registry + template: + metadata: + labels: + app: registry + spec: + hostNetwork: true + containers: + - image: registry:2 + name: server +--- +apiVersion: v1 +kind: Service +metadata: + name: registry + namespace: default + labels: + app: registry +spec: + type: ClusterIP + ports: + - name: registry + port: 5000 + selector: + app: registry diff --git a/networking-workshop/upgrade/lab0.sh b/networking-workshop/upgrade/lab0.sh new file mode 100755 index 00000000..89b81efd --- /dev/null +++ b/networking-workshop/upgrade/lab0.sh @@ -0,0 +1,21 @@ +#!/bin/bash +echo " +OBJECTIVE +--------- + +Prepare the lab environment by installing a three-node cluster. + + +INSTRUCTIONS +------------ + +On the first of the three nodes execute the install command from the v1 installer: + + sudo ./gravity install --cluster=test --cloud-provider=generic --token=qwe123 --flavor=three + +On the other two nodes execute the join command: + + sudo ./gravity join --token=qwe123 --role=node + +Verify the cluster is up and running after installation. +" diff --git a/networking-workshop/upgrade/lab1.sh b/networking-workshop/upgrade/lab1.sh new file mode 100755 index 00000000..d4dd680e --- /dev/null +++ b/networking-workshop/upgrade/lab1.sh @@ -0,0 +1,27 @@ +#!/bin/bash +echo " +OBJECTIVE +--------- + +Perform a manual upgrade of the installed cluster. 
+ + +INSTRUCTIONS +------------ + +From the upgrade tarball, upload the new version to the cluster: + + sudo ./upload + +Trigger the operation in manual mode: + + sudo ./gravity upgrade --manual + +Inspect the operation plan and step through the manual upgrade until completion. + +After a successful upgrade, uninstall the cluster by running the following command on all three nodes: + + sudo gravity system uninstall --confirm + +Then reset the cluster to the original v1 state from lab0. +" diff --git a/networking-workshop/upgrade/lab2-fix.sh b/networking-workshop/upgrade/lab2-fix.sh new file mode 100755 index 00000000..82fc6744 --- /dev/null +++ b/networking-workshop/upgrade/lab2-fix.sh @@ -0,0 +1,4 @@ +#!/bin/bash +echo "ROLLING BACK SCENARIO #2..." +sudo rm /var/lib/gravity/planet/share/dummy +echo "SCENARIO #2 ROLLED BACK" diff --git a/networking-workshop/upgrade/lab2.sh b/networking-workshop/upgrade/lab2.sh new file mode 100755 index 00000000..23caf862 --- /dev/null +++ b/networking-workshop/upgrade/lab2.sh @@ -0,0 +1,32 @@ +#!/bin/bash + +if [ -z "$1" ]; then + echo "Please pass a path to the unpacked upgrade tarball, for example: $0 /home/ubuntu/v2" + exit 1 +fi + +echo "ACTIVATING SCENARIO #2... +" + +path="$1" +sudo $path/upload +sudo mkdir -p /var/lib/gravity/site/update +sudo fallocate -l1T /var/lib/gravity/planet/share/dummy >/dev/null 2>&1 + +echo " +SCENARIO #2 ACTIVATED + +OBJECTIVE +--------- + +You will encounter an issue when trying to launch an upgrade. Your goal is +to fix the issue and successfully launch an upgrade in manual mode: + + cd $1 + sudo ./gravity upgrade --manual + +Once the upgrade has successfully launched in manual mode: + + * Manually step through the phases until the system-upgrade phase. + * Perform a rollback. +" diff --git a/networking-workshop/upgrade/lab3-fix.sh b/networking-workshop/upgrade/lab3-fix.sh new file mode 100755 index 00000000..a791d6e5 --- /dev/null +++ b/networking-workshop/upgrade/lab3-fix.sh @@ -0,0 +1,7 @@ +#!/bin/bash +echo "ROLLING BACK SCENARIO #3..." + +service=$(sudo systemctl --all | grep teleport | awk '{print $1}') +sudo systemctl start $service + +echo "SCENARIO #3 ROLLED BACK" diff --git a/networking-workshop/upgrade/lab3.sh b/networking-workshop/upgrade/lab3.sh new file mode 100755 index 00000000..899346b0 --- /dev/null +++ b/networking-workshop/upgrade/lab3.sh @@ -0,0 +1,33 @@ +#!/bin/bash + +if [ -z "$1" ]; then + echo "Please pass a path to the unpacked upgrade tarball, for example: $0 /home/ubuntu/v2" + exit 1 +fi + +echo "ACTIVATING SCENARIO #3... +" + +path="$1" +sudo $path/upload +sudo mkdir -p /var/lib/gravity/site/update +service=$(sudo systemctl --all | grep teleport | awk '{print $1}') +sudo systemctl stop $service + +echo " +SCENARIO #3 ACTIVATED. + +OBJECTIVE +--------- + +You will encounter an issue when trying to launch an upgrade. Your goal is +to fix the issue and successfully launch an upgrade in manual mode: + + cd $1 + sudo ./gravity upgrade --manual + +Once the upgrade has successfully launched in manual mode, roll back the +operation so the cluster returns to the active state: + + sudo ./gravity rollback +" diff --git a/networking-workshop/upgrade/lab4.sh b/networking-workshop/upgrade/lab4.sh new file mode 100755 index 00000000..00faef3d --- /dev/null +++ b/networking-workshop/upgrade/lab4.sh @@ -0,0 +1,30 @@ +#!/bin/bash + +if [ -z "$1" ]; then + echo "Please pass a path to the unpacked upgrade tarball, for example: $0 /home/ubuntu/v2" + exit 1 +fi + +echo "ACTIVATING SCENARIO #4...
+" + +path="$1" +sudo $path/upload +sudo $path/gravity upgrade --manual +sudo $path/gravity agent shutdown >/dev/null 2>&1 + +echo " +SCENARIO #4 ACTIVATED. + +OBJECTIVE +--------- + +The upgrade has been launched in the manual mode. + +Your goal is to inspect the upgrade plan and step through a few phases. + +Once you have executed a few upgrade phases, rollback the cluster to the original state. + + cd $1 + sudo ./gravity rollback +" diff --git a/networking-workshop/upgrade/lab5.sh b/networking-workshop/upgrade/lab5.sh new file mode 100755 index 00000000..775dd0ca --- /dev/null +++ b/networking-workshop/upgrade/lab5.sh @@ -0,0 +1,42 @@ +#!/bin/bash + +if [ -z "$1" ]; then + echo "Please pass a path to the unpacked upgrade tarball, for example: $0 /home/ubuntu/v2" + exit 1 +fi + +echo "ACTIVATING SCENARIO #5... +" + +path="$1" +sudo $path/upload +sudo $path/gravity upgrade --manual +sudo $path/gravity plan execute --phase=/init >/dev/null 2>&1 +sudo $path/gravity plan execute --phase=/checks >/dev/null 2>&1 +sudo $path/gravity plan execute --phase=/pre-update >/dev/null 2>&1 +sudo $path/gravity plan execute --phase=/bootstrap >/dev/null 2>&1 +sudo $path/gravity status-reset --confirm >/dev/null 2>&1 + +echo " +SCENARIO #5 ACTIVATED. + +OBJECTIVE +--------- + +The upgrade process and requirements have now been tampered. + +A manual upgrade has been already started, you can check the plan with + +sudo ${path}/gravity plan + +and + +sudo ${path}/gravity status + +Your goal is to inspect the upgrade plan and status and fix current issues. + +Once you have executed a few upgrade phases, rollback the cluster to the original state. + + cd $1 + sudo ./gravity rollback +" diff --git a/networking-workshop/upgrade/lab6-fix.sh b/networking-workshop/upgrade/lab6-fix.sh new file mode 100755 index 00000000..9567922a --- /dev/null +++ b/networking-workshop/upgrade/lab6-fix.sh @@ -0,0 +1,8 @@ +#!/bin/bash + +echo "ROLLING BACK SCENARIO #6..." + +sudo rm /etc/resolv.conf >/dev/null 2>&1 +sudo mv /etc/resolv.conf{_original,} >/dev/null 2>&1 + +echo "SCENARIO #6 ROLLED BACK" diff --git a/networking-workshop/upgrade/lab6.sh b/networking-workshop/upgrade/lab6.sh new file mode 100755 index 00000000..cfbee33e --- /dev/null +++ b/networking-workshop/upgrade/lab6.sh @@ -0,0 +1,25 @@ +#!/bin/bash + +echo "ACTIVATING SCENARIO #6... +" + +sudo mv /etc/resolv.conf{,_original} >/dev/null 2>&1 +sudo touch /etc/resolv.conf >/dev/null 2>&1 + +echo " +SCENARIO #6 ACTIVATED. + +OBJECTIVE +--------- + +The upgrade process and requirements have now been tampered. + +You should now start with the usual upgrade process. + +Your goal is to inspect the upgrade process as it proceeds and fix issues. + +Once you have executed a few upgrade phases, rollback the cluster to the original state. + + cd $1 + sudo ./gravity rollback +" diff --git a/networking-workshop/upgrade/lab7-fix.sh b/networking-workshop/upgrade/lab7-fix.sh new file mode 100755 index 00000000..e549ccaa --- /dev/null +++ b/networking-workshop/upgrade/lab7-fix.sh @@ -0,0 +1,15 @@ +#!/bin/bash + +echo "ROLLING BACK SCENARIO #7..." 
+ +if which chronyd > /dev/null 2>&1 +then + sudo systemctl enable --now chronyd >/dev/null 2>&1 +elif which ntpd > /dev/null 2>&1 +then + sudo systemctl enable --now ntp >/dev/null 2>&1 +else + echo "Please re-enable the system time correction daemon (NTP or similar)" +fi + +echo "SCENARIO #7 ROLLED BACK" diff --git a/networking-workshop/upgrade/lab7.sh b/networking-workshop/upgrade/lab7.sh new file mode 100755 index 00000000..44cfa517 --- /dev/null +++ b/networking-workshop/upgrade/lab7.sh @@ -0,0 +1,24 @@ +#!/bin/bash + +echo "ACTIVATING SCENARIO #7... +" + +sudo systemctl disable --now chronyd >/dev/null 2>&1 +sudo systemctl disable --now ntp >/dev/null 2>&1 +sudo date --set '2000-01-01 00:00:00 UTC' >/dev/null 2>&1 + +echo " +SCENARIO #7 ACTIVATED. + +OBJECTIVE +--------- + +The system and requirements have now been tampered with. + +Your goal is to run the usual upgrade process and fix the current issues. + +Once you have executed a few upgrade phases, roll back the cluster to the original state. + + cd <upgrade tarball directory> + sudo ./gravity rollback +" diff --git a/networking-workshop/upgrade/v1/app.yaml b/networking-workshop/upgrade/v1/app.yaml new file mode 100644 index 00000000..9931a232 --- /dev/null +++ b/networking-workshop/upgrade/v1/app.yaml @@ -0,0 +1,9 @@ +apiVersion: cluster.gravitational.io/v2 +kind: Cluster +baseImage: gravity:7.0.12 +metadata: + name: upgrade-demo + resourceVersion: 1.0.0 +hooks: + install: + job: file://install.yaml diff --git a/networking-workshop/upgrade/v1/charts/alpine/Chart.yaml b/networking-workshop/upgrade/v1/charts/alpine/Chart.yaml new file mode 100644 index 00000000..7775284a --- /dev/null +++ b/networking-workshop/upgrade/v1/charts/alpine/Chart.yaml @@ -0,0 +1,3 @@ +name: alpine +description: An Alpine 3.3 Linux deployment +version: 0.0.1 diff --git a/networking-workshop/upgrade/v1/charts/alpine/templates/deployment.yaml b/networking-workshop/upgrade/v1/charts/alpine/templates/deployment.yaml new file mode 100644 index 00000000..6304d457 --- /dev/null +++ b/networking-workshop/upgrade/v1/charts/alpine/templates/deployment.yaml @@ -0,0 +1,24 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: alpine + labels: + app: alpine +spec: + replicas: 1 + selector: + matchLabels: + app: alpine + strategy: + type: Recreate + template: + metadata: + labels: + app: alpine + spec: + containers: + - name: alpine + image: "{{ .Values.registry }}alpine:{{ .Values.version }}" + command: ["/bin/sleep", "90000"] + securityContext: + runAsNonRoot: false diff --git a/networking-workshop/upgrade/v1/charts/alpine/values.yaml b/networking-workshop/upgrade/v1/charts/alpine/values.yaml new file mode 100644 index 00000000..3d2b2a44 --- /dev/null +++ b/networking-workshop/upgrade/v1/charts/alpine/values.yaml @@ -0,0 +1,2 @@ +version: 3.3 +registry: diff --git a/networking-workshop/upgrade/v1/install.yaml b/networking-workshop/upgrade/v1/install.yaml new file mode 100644 index 00000000..ccf73529 --- /dev/null +++ b/networking-workshop/upgrade/v1/install.yaml @@ -0,0 +1,24 @@ +apiVersion: batch/v1 +kind: Job +metadata: + name: install +spec: + template: + metadata: + name: install + namespace: default + spec: + restartPolicy: OnFailure + containers: + - name: install + image: quay.io/gravitational/debian-tall:stretch + command: + - /usr/local/bin/helm + - install + - /var/lib/gravity/resources/charts/alpine + - --set + - registry=registry.local:5000/ + - --name + - example + - --namespace + - default diff --git a/networking-workshop/upgrade/v2/app.yaml
b/networking-workshop/upgrade/v2/app.yaml new file mode 100644 index 00000000..615abd27 --- /dev/null +++ b/networking-workshop/upgrade/v2/app.yaml @@ -0,0 +1,11 @@ +apiVersion: cluster.gravitational.io/v2 +kind: Cluster +baseImage: gravity:7.0.30 +metadata: + name: upgrade-demo + resourceVersion: 2.0.0 +hooks: + install: + job: file://install.yaml + update: + job: file://upgrade.yaml diff --git a/networking-workshop/upgrade/v2/charts/alpine/Chart.yaml b/networking-workshop/upgrade/v2/charts/alpine/Chart.yaml new file mode 100644 index 00000000..66bf2775 --- /dev/null +++ b/networking-workshop/upgrade/v2/charts/alpine/Chart.yaml @@ -0,0 +1,3 @@ +name: alpine +description: An Alpine 3.4 Linux deployment +version: 0.0.2 diff --git a/networking-workshop/upgrade/v2/charts/alpine/templates/deployment.yaml b/networking-workshop/upgrade/v2/charts/alpine/templates/deployment.yaml new file mode 100644 index 00000000..6304d457 --- /dev/null +++ b/networking-workshop/upgrade/v2/charts/alpine/templates/deployment.yaml @@ -0,0 +1,24 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: alpine + labels: + app: alpine +spec: + replicas: 1 + selector: + matchLabels: + app: alpine + strategy: + type: Recreate + template: + metadata: + labels: + app: alpine + spec: + containers: + - name: alpine + image: "{{ .Values.registry }}alpine:{{ .Values.version }}" + command: ["/bin/sleep", "90000"] + securityContext: + runAsNonRoot: false diff --git a/networking-workshop/upgrade/v2/charts/alpine/values.yaml b/networking-workshop/upgrade/v2/charts/alpine/values.yaml new file mode 100644 index 00000000..ac975868 --- /dev/null +++ b/networking-workshop/upgrade/v2/charts/alpine/values.yaml @@ -0,0 +1,2 @@ +version: 3.4 +registry: diff --git a/networking-workshop/upgrade/v2/install.yaml b/networking-workshop/upgrade/v2/install.yaml new file mode 100644 index 00000000..ccf73529 --- /dev/null +++ b/networking-workshop/upgrade/v2/install.yaml @@ -0,0 +1,24 @@ +apiVersion: batch/v1 +kind: Job +metadata: + name: install +spec: + template: + metadata: + name: install + namespace: default + spec: + restartPolicy: OnFailure + containers: + - name: install + image: quay.io/gravitational/debian-tall:stretch + command: + - /usr/local/bin/helm + - install + - /var/lib/gravity/resources/charts/alpine + - --set + - registry=registry.local:5000/ + - --name + - example + - --namespace + - default diff --git a/networking-workshop/upgrade/v2/upgrade.yaml b/networking-workshop/upgrade/v2/upgrade.yaml new file mode 100644 index 00000000..4f818421 --- /dev/null +++ b/networking-workshop/upgrade/v2/upgrade.yaml @@ -0,0 +1,22 @@ +apiVersion: batch/v1 +kind: Job +metadata: + name: upgrade +spec: + template: + metadata: + name: upgrade + spec: + restartPolicy: OnFailure + containers: + - name: upgrade + image: quay.io/gravitational/debian-tall:stretch + command: + - /usr/local/bin/helm + - upgrade + - --set + - registry=registry.local:5000/ + - example + - /var/lib/gravity/resources/charts/alpine + - --namespace + - default