Overview
In the previous post in this series, we established a highly available multi-master Kubernetes cluster using kubespray. At this point, we have a cluster to run our work on but no work currently running on it.
In this post we’ll explore how work can be created on the cluster and how you can tweak Kubernetes according to your application’s needs. It assumes a certain level of Docker knowledge and will not go into the creation of Docker images. Instead, this entry will rely solely on already available, public, OCI-compliant images.
Kubernetes is sometimes a self-referential beast, so I’ll define a few terms up front in their basic form. These definitions will give you a fundamental understanding of what each object is; subsequent sections will then redefine them in greater detail:
Pod
: A collection of one or more containers that are related to one another. Often referred to as the smallest unit of work in Kubernetes, a pod should be a self-sufficient component of an overall application. If the application is designed in a “cloud native” fashion, then creating more pods should generally produce more capacity for the overall application.

Deployment
: An abstract API object in Kubernetes that manages the automated creation, distribution, and deletion of the actual application pods. For instance, you may create a deployment for applicationX that creates two separate pod instances for the application; under heavy load you may then scale your deployment to three, which causes Kubernetes to create a new pod.

Service
: An API object whose job is to take traffic from some source (internal or external) and proxy it to the appropriate pods. Services can be internal (ClusterIP, etc.) or external (NodePort, LoadBalancer, etc.). When a new service is created, routing and firewall rules are added to the hosts so that traffic is routed first to the right node and from that node to an actually running pod.
Pods
Defining “Pods” and Illustrating How to Construct Them…
Pods are groups of one or more containers that in Kubernetes represent the “smallest unit of work.” By default, containers in a Kubernetes pod share a network namespace but not much else. This isolation can be selectively eroded between containers in the same pod, though. For example, you can map a particular directory to both containers, or you can enable `shareProcessNamespace`, which allows any process running anywhere in the pod to see all other processes even if they’re in a different container.
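Jumping slightly ahead to the manifest syntax covered in the next section, a minimal sketch of a pod with process-namespace sharing enabled might look like this (the pod and container names here are just illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-pid-demo
spec:
  shareProcessNamespace: true   # pod-level flag; all containers share one PID namespace
  containers:
  - name: web
    image: nginx:latest
  - name: debug
    image: busybox:latest
    args: ["sleep", "3600"]     # idle container from which to inspect the other's processes
```

With that flag set, running `ps` inside the `debug` container would also show the nginx processes from the `web` container.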
What’s meant by “smallest unit of work” is that the pod should function as an independent unit, with containers that provide all the functionality required for the application to run. If we run two instances of this pod without modification, we should end up with two separate instances of the application, each able to safely and independently take in new work and process it as required.
For example, let’s say you have a PHP application that must serve HTTPS to an F5 LTM load balancer. A typical stack for that kind of application might look something like:
- Web server for processing incoming HTTP requests
- PHP-FPM instance for running the actual application
- Database for storing persistent data.
Now if we return to our “smallest unit of work” criterion, starting multiple instances of the web server and PHP-FPM bundled together will indeed add capacity to the web application. The database, however, can’t be bundled with each individual pod instance. If we were to bundle the database with the first two components and then scale up to two or three pod instances, we would run into problems where a request changes something in the database for one pod but the other pods don’t know about it. You could add clustering to the embedded database, but that would likely add far too much overhead for just this one application.
For this reason, in the above stack the web server and PHP-FPM instance would go into one pod, whereas the database would go into another. If it later becomes desirable to scale database load, you can re-factor and re-deploy the database component of the application to include the required clustering.
This brings us to a problem that we’ll discuss at length later in the “Services” section but bears mentioning now. If the database is running in another pod, we can’t share filesystems with it, and we don’t know what IP address weave has assigned to that pod. So how do we connect to the database?
The answer is: DNS. Kubernetes runs a basic internal DNS system for discovering services by the names they’re associated with. In the example of the database, we would first establish a database deployment which spins up a number of pods running the database service, and then point a `ClusterIP` Kubernetes service (one that’s only accessible internally to the cluster) at these pods. Once the service is created, we can use DNS in our web application’s pod to locate a valid instance of the database service via the IP address associated with that service.
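Services are covered in full later in this post, but as a sketch, a hypothetical `ClusterIP` service for a database listening on tcp/5432 might look like the following. The name `database` and the `app: database` selector are illustrative; the selector matches labels set on the database pods (labels are also covered later in this post):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: database          # becomes the hostname "database" in the cluster's internal DNS
spec:
  type: ClusterIP         # the default type; reachable only from inside the cluster
  ports:
  - port: 5432            # port the service exposes and forwards to matching pods
  selector:
    app: database         # hypothetical label carried by the database pods
```

Pods in the same namespace could then reach the database at the hostname `database` (or `database.<namespace>.svc.cluster.local` from other namespaces).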
Creating a Basic Pod…
OK, so now that we know in greater detail what a “pod” is and what kinds of things get grouped into one, let’s start creating basic pods. First, let’s create a single-container pod to demonstrate the general syntax, and then I’ll break it down.
Populate a blank file called `test-pod.yml` with the following:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: web
    image: nginx:latest
```
Breaking down the above:
- The `apiVersion` specifies which portion of the Kubernetes REST API contains the definition for the object we’re attempting to define. Occasionally you’ll have to pull in nominally beta resources (such as Deployments); however, since pods were defined in version one of the API, that’s where we instruct Kubernetes to look when interpreting the fields that follow.
- `kind` is set to `Pod`, declaring what type of object we’re trying to create.
- `metadata` is a required section that in this case only contains the `name` field for specifying the name of the pod we’re creating. Pod names must be unique within their namespace, and we’ll even use them later on to target particular application instances. These two items (a `name` field inside a top-level `metadata` section) are a common pattern, so it’s probably worth memorizing this bit.
- The `spec` section contains the pod’s specification, or put another way, the actual definition of the pod itself.
  - Since we only have the one container, we go straight to specifying which `containers` are in this pod, which in our case is just the one.
  - We’ve named this container `web`, but it could be any valid string of characters.
  - We’ve specified the Docker `image` to use for this container as the regular nginx image from Docker Hub.
And there you have it. Once this YAML is saved we can use `kubectl create -f` to create the object and then use `curl` to retrieve the application output once the pod enters a `Running` state according to `kubectl get pods`:
```
root@kube-control01:~/src/test-pod# kubectl create -f test-pod.yml
pod/test-pod created
root@kube-control01:~/src/test-pod# kubectl get pods -o wide
NAME       READY   STATUS    RESTARTS   AGE   IP          NODE            NOMINATED NODE
test-pod   1/1     Running   0          3m    10.42.0.1   kube-worker01   <none>
root@kube-control01:~/src/test-pod# curl -s http://10.42.0.1 | head
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
```
Managing Deployed Pods…
OK, so we now have our `test-pod` instance deployed; how do we interact with it? For example, let’s say we are running into some hard-to-reproduce application errors and need to get into the actual running environment to troubleshoot.
To do so you can use `kubectl exec -it <podName> <pathToShell>`, and `kubectl` will automatically locate which node is running that particular pod and communicate with the `kubelet` running on that worker node to tunnel the input/output of a `docker exec -it` against the default container in the pod. In our `test-pod` example from before:
```
root@kube-control01:~/src/test-pod# kubectl exec -it test-pod /bin/bash
root@test-pod:/# ps ax
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:00 nginx: master process nginx -g daemon off;
    7 ?        S      0:00 nginx: worker process
 2111 pts/0    Ss     0:00 /bin/bash
 2117 pts/0    R+     0:00 ps ax
root@test-pod:/# netstat -tlpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      1/nginx: master pro
```
As you can see from the above, your shell runs inside the confined container, allowing you to run whatever commands you need to resolve the problem.
Sharing Directories Between Containers in a Pod…
As mentioned before, containers within the same pod can share particular directories if doing so facilitates their interoperation.
To get this behavior, we update `test-pod.yml` to now read like:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  volumes:
  - name: shared-directory
    emptyDir: {}
  containers:
  - name: web
    image: nginx:latest
    volumeMounts:
    - name: shared-directory
      mountPath: /var/tmp/shared
  - name: sleep
    image: busybox:latest
    args:
    - "sleep"
    - "99999999999"
    volumeMounts:
    - name: shared-directory
      mountPath: /var/tmp/shared
```
This YAML changes a few things from our previous example:
- We’ve expanded the `containers` section to now define two containers:
  - We have the same nginx container as before.
  - We have a new container named `sleep`, which just runs the `sleep` command for approximately three millennia or until stopped.
- The `spec` section now contains the volumes in use by this pod in addition to just the container specification that we had before:
  - The `volumes` section specifies the volumes a pod intends to use/need. In our case we’ve instructed Kubernetes to create a temporary empty directory on the host (i.e. `emptyDir`) and to call it `shared-directory`. This directory will be removed, and all data lost, when the pod is destroyed.
  - We then create `volumeMounts` sections in the spec of each container that will have access to the directory:
    - The direct descendant of this field is a list of the different volume mounts.
    - At a minimum, Kubernetes needs to know the name of the volume you’re talking about and where in the container’s filesystem hierarchy you want it to show up.
To see it in action let’s delete and re-create the pod:
```
tests@workhorse-pc:~$ kubectl delete -f test-pod.yml
pod "test-pod" deleted
tests@workhorse-pc:~$ kubectl create -f test-pod.yml
pod/test-pod created
```
Now that we’re at the newer version of the pod, you should be able to enter the web container via `kubectl exec -it` and create a file underneath `/var/tmp/shared` that both containers can see.
For example:
```
tests@home-pc:~$ kubectl exec -it test-pod -c web /bin/bash
root@test-pod:/# echo Hello From Beyond The Wall > /var/tmp/shared/test
root@test-pod:/# exit
tests@home-pc:~$ kubectl exec -it test-pod -c sleep /bin/sh
/ # cat /var/tmp/shared/test
Hello From Beyond The Wall
```
This isn’t a practical use of storage, but storage is itself a larger topic and will be covered in more detail in later installments.
Passing Environment Variables to Containers…
Occasionally you may find it useful to convey variable information to a process running in a confined container by way of the environment variables it can see. Luckily, passing hard-coded environment data is pretty easy and just requires adding an `env` section to the container specification for the pod. For example, take the following pod manifest:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: webapp
spec:
  containers:
  - name: web
    image: nginx:latest
    env:
    - name: fromKubernetes
      value: "this value"
```
After creating the above pod, the given environment variable should be available from inside the container:
```
root@kube-control01:~/src/test-pod# kubectl exec -it webapp /bin/bash
root@webapp:/# echo $fromKubernetes
this value
```
There are more robust ways of getting variable data into a container (such as `ConfigMap` and `Secret` objects, or the downward API), but the above demonstrates the core concept of modifying the environment and sets you off in the right direction. The more robust methods will be described in a later installment about configuration management.
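As a quick preview of the `ConfigMap` approach, a container can pull an environment variable’s value out of a ConfigMap with `valueFrom`. This sketch assumes a hypothetical ConfigMap named `app-config` containing a `log_level` key already exists in the same namespace:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: webapp
spec:
  containers:
  - name: web
    image: nginx:latest
    env:
    - name: LOG_LEVEL
      valueFrom:
        configMapKeyRef:
          name: app-config   # hypothetical ConfigMap name
          key: log_level     # key within that ConfigMap to read the value from
```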
Pod Labels…
When you have a large number of pods deployed in your cluster, you’ll eventually want to locate applications quickly according to whatever criteria you deem identifying. To do this you must add labels to the `metadata` section of each pod. Take these two pod manifests for example:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: webapp
  labels:
    app: nginx
    tier: web
spec:
  containers:
  - name: web
    image: nginx:latest
---
apiVersion: v1
kind: Pod
metadata:
  name: api-server
  labels:
    app: nginx
    tier: api
spec:
  containers:
  - name: web
    image: nginx:latest
```
In the above, we’ve added a `labels` section to the `metadata` section of each pod. This new section is a simple series of (typically short) key/value pairs that describe what this pod does. Which pods get which labels is completely up to you, but certain conventions exist already, such as the `app` label tending towards the specific application running (as opposed to just `nginx` as I have above) and `tier` being used to describe stack placement (such as how I use `web` to indicate it takes user requests and `api` to indicate a backend application API service).
Once created, we can use the `-l` option to `kubectl` to filter for just the types of pods we’re interested in looking at:
```
root@kube-control01:~/src/test-pod# kubectl get pods -o wide -l app=nginx
NAME         READY   STATUS    RESTARTS   AGE   IP          NODE            NOMINATED NODE
api-server   1/1     Running   0          37s   10.42.0.2   kube-worker01   <none>
webapp       1/1     Running   0          37s   10.42.0.1   kube-worker01   <none>
root@kube-control01:~/src/test-pod# kubectl get pods -o wide -l app=nginx,tier=web
NAME     READY   STATUS    RESTARTS   AGE   IP          NODE            NOMINATED NODE
webapp   1/1     Running   0          50s   10.42.0.1   kube-worker01   <none>
root@kube-control01:~/src/test-pod# kubectl get pods -o wide -l app=nginx,tier=api
NAME         READY   STATUS    RESTARTS   AGE   IP          NODE            NOMINATED NODE
api-server   1/1     Running   0          56s   10.42.0.2   kube-worker01   <none>
```
Pod Annotations…
In addition to short labels, you can also apply (typically longer-form) annotations to objects. These fields can be used to provide higher-level, human-readable descriptions, or they can contain non-identifying data that is still important to retain on the API object itself.
Take for example this YAML spec:
```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    note: |
      this is a really long bit of text. This is a useful
      place to put a description. But anyways, yeah. Nginx or something
  name: test-pod
spec:
  containers:
  - name: web
    image: nginx:latest
```
After re-creating the pod we can now see our annotation in the output of `describe`:
```
tests@kube-control01:~$ kubectl describe pod test-pod | head
Name:               test-pod
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               kube-compute02/192.168.122.64
Start Time:         Mon, 06 Jul 2020 11:54:47 -0400
Labels:             <none>
Annotations:        note=this is a really long bit of text. This is a useful
                    place to put a description. But anyways, yeah. Nginx or something
```
Services
Services are API objects which connect clients and peers to the network resources hosted on the cluster. There are a variety of ways to do this, but for this article I will be concentrating mainly on the ClusterIP and NodePort service types.
Exploring the Different Service Types…
Starting out, you only need to be aware of three Kubernetes service types:

NodePort
: Dynamically allocates a random available port between `30000-32767`; any node in the cluster that receives traffic destined for this port will opaquely proxy the connection to an appropriate pod. The pod need not be running on the given node for this to work.

LoadBalancer
: A cloud-dependent service type that depends on a provisioner being available inside Kubernetes to dynamically allocate a new `NodePort` ingress and then automatically create/configure a cloud-provider-native load balancer object to use that port. If successful, your cloud load balancer will redirect outside traffic to the correct pods, with the external ingress listed under `LoadBalancer Ingress` in the output of `kubectl describe service <serviceName>`. This service type is dramatically easier, if your environment supports it.

ClusterIP
: Allocates an IP address on the overlay network that will be proxied to the correct pods, creating an entry in Kubernetes’s internal DNS for discoverability. This is useful for application stacks (such as database clusters) that are only of use to pods running on the cluster. Without a service you could still access a single pod by its IP on the cluster, but without the DNS entry it would be much harder to discover, and there would be no native load balancing should multiple pods actually provide the given service.
The decision of which service type to go with depends on the particular application providing the service, its intended user base, and your environment. For example, as mentioned, internal-only services such as log aggregation or database clusters could use `ClusterIP`, whereas services that need to accept traffic from outside the cluster would use `NodePort` (in environments where the given load balancer has no accepted provisioner) or `LoadBalancer` (if you’re using a known cloud provider such as Google Cloud, AWS, OpenStack, or Azure).
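For comparison with the `NodePort` example coming up, a minimal `LoadBalancer` manifest might look like the following sketch. It assumes your cluster has a working cloud provider integration; the name `nginx-lb` is hypothetical, and the selector reuses the pod labels introduced earlier:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb          # hypothetical service name
spec:
  type: LoadBalancer      # asks the cloud provisioner to create an external load balancer
  ports:
  - port: 80              # port the external load balancer will expose
  selector:
    app: nginx
    tier: web
```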
Regardless of which service type you go with, the end result is that traffic will be routed automatically to the correct pods. Kubernetes (specifically the `kube-proxy` executable) does this using the `labels` mechanism described earlier in the “Pods” section. When you define a service, `kube-proxy` locates the applicable pods using the API server and generates the right set of `iptables` rules to proxy the connection (in kernel space) over the internal overlay network to the correct backend IP address and port.
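If you’re curious, you can peek at these generated rules on any node. A rough example follows; it assumes `kube-proxy` is running in its default `iptables` mode, and the exact chain contents will vary per cluster:

```
# on any Kubernetes node, as root: list the service dispatch rules in the
# nat table, filtering on the comments kube-proxy attaches to each rule
iptables -t nat -L KUBE-SERVICES -n | grep nginx
```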
Exposing a Service as NodePort…
Since we need to get traffic to our nginx web server and can’t assume a load balancer provisioner will work, let’s create a NodePort service. Let’s assume we’ve created a pod similar to this:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: webapp
  labels:
    app: nginx
    tier: web
spec:
  containers:
  - name: web
    image: nginx:latest
```
In the above, we’ve labeled this pod as both an `nginx` application (indicating a web server) and `web` tier (indicating it takes user requests directly). Suppose this is enough to accurately and reliably isolate which pods should receive the traffic. An acceptable `NodePort` definition might be:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: NodePort
  ports:
  - port: 80
    nodePort: 30080
  selector:
    app: nginx
    tier: web
```
This service is also called `nginx` and uses the two pod labels given at the bottom (`app=nginx` and `tier=web`) to locate the pods running the application it’s exposing. Once it locates the target pod(s), it uses the `port` field (together with `targetPort`, which defaults to the same value) to determine which TCP port on the identified pod(s) to route traffic towards. Since we’re creating a `NodePort` service, we also hard-code a node port of `tcp/30080` instead of letting Kubernetes pick one for us (which is the default behavior for `NodePort`).
Once created you can use `kubectl describe service` to see if it was able to locate a pod. You can verify this by seeing if it has a value in the `Endpoints` field for the given service:
```
root@kube-control01:~/src/test-pod# kubectl describe service nginx
Name:                     nginx
Namespace:                default
Labels:                   <none>
Annotations:              <none>
Selector:                 app=nginx,tier=web
Type:                     NodePort
IP:                       10.101.51.14
Port:                     <unset>  80/TCP
TargetPort:               80/TCP
NodePort:                 <unset>  30080/TCP
Endpoints:                10.42.0.1:80
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
```
Breaking down the important bits of this output:
- `Selector` reminds us of the criteria required for a pod to receive traffic from this service.
- `Type` reminds us that it’s a `NodePort` service.
- `IP` is the IP address allocated to the service in Kubernetes’s internal DNS. In this case the registered hostname is `nginx` (due to the service name), which resolves to an IP on the overlay network of `10.101.51.14`.
- `Port` is the port the service listens on, with `TargetPort` being the port on the pod containers that traffic will ultimately be directed towards (here they’re the same).
- `NodePort` is the port on the Kubernetes workers that directs traffic to this pod.
If your output looks similar to the above then your `nginx` pod should be accessible on `tcp/30080` on every node in the cluster. For example:
```
tests@home-pc:~$ curl http://kube-compute01:30080 | head
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   612  100   612    0     0   199k      0 --:--:-- --:--:-- --:--:--  199k
```
Deployments
In our final section, we’ll tie all of the above together to create a simple deployment of nginx. As described earlier, deployments are higher-level objects that essentially describe a desired state for a set of pods. This includes a full pod specification embedded into the deployment object.
Creating a Basic Deployment…
A basic deployment might look something like this:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: "test-deployment"
  labels:
    type: "custom"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: "nginx"
  template:
    metadata:
      labels:
        app: "nginx"
    spec:
      containers:
      - name: "web"
        image: "nginx:latest"
```
You can use `kubectl explain` to fully explore all of the above, but the important new bits are:
- Our API version has changed from just `v1` to `apps/v1` (as determined by `kubectl explain deployments`).
- Our deployment itself has one set of labels, in our case just `type=custom`, and is itself named simply `test-deployment`.
- Our `spec` section defines the actual type of deployment we want:
  - We want three `replicas`, so the deployment will try to have three running instances at any given point in time (we’ll try scaling this below).
  - We use `matchLabels` both because it’s a required attribute and because it will catch other instances of this application component created outside of this deployment and not try to spin up more pods than needed.
  - Finally, we include the embedded `spec` section, which includes all the attributes you would set on the resulting pods if you were creating them directly. Either `kubectl explain pod.spec` or `kubectl explain deployment.spec.template.spec` should give you a list of other acceptable attributes.
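Once the deployment exists, changing the desired replica count is all it takes to scale. As a sketch, you can do this either imperatively or by editing the manifest (the file name here is just whatever you saved the YAML above as):

```
# imperative: bump the replica count directly
kubectl scale deployment test-deployment --replicas=5

# declarative: change "replicas: 3" in the manifest, then re-apply it
kubectl apply -f test-deployment.yml
```

Either way, Kubernetes notices the gap between desired and actual state and creates (or deletes) pods to close it.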
Adding Health Checks…
When deploying an application, it’s important to remember that many components require some amount of time to initialize, and that they need some sort of regular check to verify they’re still functioning as required (so workload can stop being directed at defective components). The first step down this road is to add a health check to our deployments.
To do this we need to add a `livenessProbe` to the pod specification, resulting in a Deployment object that looks like this:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: "test-deployment"
  labels:
    type: "custom"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: "nginx"
  template:
    metadata:
      labels:
        app: "nginx"
    spec:
      containers:
      - name: "web"
        image: "nginx:latest"
        livenessProbe:
          httpGet:
            path: "/"
            port: 80
```
The only new part in the above is the `livenessProbe`:
- This probe consists of Kubernetes performing an HTTP GET request on the given path (`/`) against whichever process is listening on port 80 in the container’s network namespace.
- Without overriding any of the defaults, this check is performed every 10 seconds, and after three failures it will mark the container/pod as unhealthy and attempt a restart:
  - You can set `failureThreshold` to change how many failed probes are required to determine a container is unhealthy.
  - You can set `periodSeconds` to change how frequently the probes are sent.
  - Optionally, to preserve log integrity, you can set other attributes such as the “Host” and “User-Agent” request headers used to perform the probe. You can then filter these probes out of your logging (see the sketch below).
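Pulling those knobs together, a tuned probe might look something like the following sketch (the header value and numbers are illustrative, not recommendations); it would replace the `livenessProbe` section in the deployment above:

```yaml
livenessProbe:
  httpGet:
    path: "/"
    port: 80
    httpHeaders:
    - name: User-Agent
      value: kube-liveness-probe   # distinct UA so probe hits can be filtered from logs
  periodSeconds: 5                 # probe every 5 seconds instead of the default 10
  failureThreshold: 5              # tolerate 5 consecutive failures before restarting
```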
And indeed, after creating the above Deployment and artificially creating a failure in one of the pods, if I check `kubectl get events` I can see the container/pod in question failing a liveness probe when nginx stops listening on port 80:
```
[....snip....]
9s    Warning   Unhealthy   pod/test-deployment-86564c9c46-tl7r5   Liveness probe failed: Get http://10.40.48.1:80/: dial tcp 10.40.48.1:80: connect: connection refused
9s    Normal    Killing     pod/test-deployment-86564c9c46-tl7r5   Container web failed liveness probe, will be restarted
[....snip....]
```
But this only performs on-going monitoring. What about stopping a new container from receiving work while it’s still starting up? Luckily, the process is exactly the same as the on-going `livenessProbe` method described above; the only difference is that it’s done using `startupProbe`, whose probe specification is exactly the same as the `livenessProbe` specification. That means we can just copy the entire `livenessProbe` section, duplicate it, rename the duplicate to `startupProbe`, and have it work as-is.
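As a sketch (the numbers are illustrative, not recommendations), giving a slow-starting container up to roughly five minutes to come up before ordinary liveness checking takes over might look like this excerpt from the container spec:

```yaml
startupProbe:
  httpGet:
    path: "/"
    port: 80
  periodSeconds: 10        # probe every 10 seconds while the container is starting
  failureThreshold: 30     # 30 failures x 10 seconds = up to ~5 minutes to start
livenessProbe:
  httpGet:
    path: "/"
    port: 80
```

Until the `startupProbe` succeeds, the other probes are held off, so a slow initialization won’t be mistaken for a dead container.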
Towards the end of this series I will return to deployments and will go into more detail about managing them to meet complex workload requirements. For now though, the above should be enough to get you started.
Further Reading
- (rascaldev) Exploring Kubernetes, Part 1: Installation
- (Official Documentation) Deployments | Kubernetes