Overview
In the previous post in this series, we established a highly available multi-master Kubernetes cluster using kubespray. At this point, we have a cluster to run our work on but no work currently running on it.
In this post we’ll explore how work can be created on the cluster and how you can tweak Kubernetes according to your application’s needs. It assumes a certain level of Docker knowledge and will not go into the creation of Docker images. Instead, this entry will rely solely on already available, public, OCI-compliant images.
Kubernetes is sometimes a self-referential beast, so I’ll define a few terms up front in their basic form. These definitions will give you a fundamental understanding of what each object is; subsequent sections will then redefine them in greater detail:
Pod
: A collection of one or more containers that are related to one another. Often referred to as the smallest unit of work in Kubernetes, a pod should be a self-sufficient component of an overall application. If the application is designed in a “cloud native” fashion, then creating more pods should generally produce more capacity for the overall application.

Deployment
: An abstract API object in Kubernetes that manages the automated creation, distribution, and deletion of the actual application pods. For instance, you may create a deployment for applicationX that creates two separate pod instances for the application; under heavy load you may then scale your deployment to three, which causes Kubernetes to create a new pod.

Service
: An API object whose job is to take traffic from some source (internal or external) and proxy it to the appropriate pods. Services can be internal (ClusterIP, etc.) or external (NodePort, LoadBalancer, etc.). When a new service is created, routing and firewall rules are added to the hosts so that traffic is routed first to the right node and from that node to an actually running pod.
Pods
Defining “Pods” and Illustrating How to Construct Them…
Pods are groups of one or more containers that in Kubernetes represent the “smallest unit of work.” By default, containers in a Kubernetes pod share a network namespace but not much else. This isolation can be selectively eroded between containers in the same pod, though. For example, you can map a particular directory to both containers, or you can enable `shareProcessNamespace`, which allows any process running anywhere in the pod to see all other processes even if they’re in a different container.
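Jumping slightly ahead to the manifest syntax covered in the next section, a minimal sketch of a pod with process-namespace sharing enabled might look like this (the pod and container names here are just illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-pid-demo
spec:
  shareProcessNamespace: true   # pod-level flag; all containers share one PID namespace
  containers:
  - name: web
    image: nginx:latest
  - name: debug
    image: busybox:latest
    args: ["sleep", "3600"]     # idle container from which to inspect the other's processes
```

With that flag set, running `ps` inside the `debug` container would also show the nginx processes from the `web` container.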
What’s meant by “smallest unit of work” is that the pod should function as an independent unit, with containers that provide all the functionality required for the application to run. If we run two instances of this pod without modification, we should end up with two separate instances of the application, each able to safely and independently take in new work and process it as required.
For example, let’s say you have a PHP application that must serve HTTPS to an F5 LTM load balancer. A typical stack for that kind of application might look something like:
- Web server for processing incoming HTTP requests
- PHP-FPM instance for running the actual application
- Database for storing persistent data.
Now if we return to our “smallest unit of work” criterion, starting multiple instances of the web server and PHP-FPM bundled together will indeed add capacity to the web application. The database, however, can’t be bundled with each individual pod instance. If we were to bundle the database with the first two components and then scale up to two or three pod instances, we would run into problems where a request changes something in the database for one pod but the other pods don’t know about it. You could add clustering to the embedded database, but that would likely add far too much overhead for just this one application.
For this reason, in the above stack the web server and PHP-FPM instance would go into one pod, whereas the database would go into another. If it later becomes desirable to scale database load, you can re-factor and re-deploy the database component of the application to include the required clustering.
This brings us to a problem that we’ll discuss at length later in the “Services” section but bears mentioning now. If the database is running in another pod, we can’t share filesystems with it, and we don’t know what IP address weave has assigned to that pod. So how do we connect to the database?
The answer is: DNS. Kubernetes runs a basic internal DNS system for discovering services by the names they’re associated with. In the example of the database, we would first establish a database deployment which spins up a number of pods running the database service, and then point a `ClusterIP` Kubernetes service (one that’s only accessible internally to the cluster) at these pods. Once the service is created, we can use DNS in our web application’s pod to locate a valid instance of the database service via the IP address associated with that service.
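Services are covered in full later in this post, but as a sketch, a hypothetical `ClusterIP` service for a database listening on tcp/5432 might look like the following. The name `database` and the `app: database` selector are illustrative; the selector matches labels set on the database pods (labels are also covered later in this post):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: database          # becomes the hostname "database" in the cluster's internal DNS
spec:
  type: ClusterIP         # the default type; reachable only from inside the cluster
  ports:
  - port: 5432            # port the service exposes and forwards to matching pods
  selector:
    app: database         # hypothetical label carried by the database pods
```

Pods in the same namespace could then reach the database at the hostname `database` (or `database.<namespace>.svc.cluster.local` from other namespaces).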
Creating a Basic Pod…
OK, so now that we know in greater detail what a “pod” is and what kinds of things get grouped into one, let’s start creating basic pods. First, let’s create a single-container pod to demonstrate the general syntax, and then I’ll break it down.
Populate a blank file called `test-pod.yml` with the following:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: web
    image: nginx:latest
```
Breaking down the above:
- The `apiVersion` specifies which portion of the Kubernetes REST API contains the definition for the object we’re attempting to define. Occasionally you’ll have to pull in nominally beta resources (such as Deployments); however, since pods were defined in version one of the API, that’s where we instruct Kubernetes to look when interpreting the fields that follow.
- `kind` is set to `Pod`, declaring what type of object we’re trying to create.
- `metadata` is a required section that in this case only contains the `name` field for specifying the name of the pod we’re creating. Pod names must be unique within their namespace, and we’ll even use them later on to target particular application instances. These two items (a `name` field inside a top-level `metadata` section) are a common pattern, so it’s probably worth memorizing this bit.
- The `spec` section contains the pod’s specification, or put another way, the actual definition of the pod itself.
  - Since we only have the one container, we go straight to specifying which `containers` are in this pod, which in our case is just the one.
  - We’ve named this container `web`, but it could be any valid string of characters.
  - We’ve specified the Docker `image` to use for this container as the regular nginx image from Docker Hub.
And there you have it. Once this YAML is saved we can use `kubectl create -f` to create the object and then use `curl` to retrieve the application output once the pod enters a `Running` state according to `kubectl get pods`:
```
root@kube-control01:~/src/test-pod# kubectl create -f test-pod.yml
pod/test-pod created
root@kube-control01:~/src/test-pod# kubectl get pods -o wide
NAME       READY   STATUS    RESTARTS   AGE   IP          NODE            NOMINATED NODE
test-pod   1/1     Running   0          3m    10.42.0.1   kube-worker01   <none>
root@kube-control01:~/src/test-pod# curl -s http://10.42.0.1 | head
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
```
Managing Deployed Pods…
OK, so we now have our `test-pod` instance deployed; how do we interact with it? For example, let’s say we are running into some hard-to-reproduce application errors and need to get into the actual running environment to troubleshoot.
To do so you can use `kubectl exec -it <podName> <pathToShell>`, and `kubectl` will automatically locate which node is running that particular pod and communicate with the `kubelet` running on that worker node to tunnel the input/output of a `docker exec -it` against the default container in the pod. In our `test-pod` example from before:
```
root@kube-control01:~/src/test-pod# kubectl exec -it test-pod /bin/bash
root@test-pod:/# ps ax
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:00 nginx: master process nginx -g daemon off;
    7 ?        S      0:00 nginx: worker process
 2111 pts/0    Ss     0:00 /bin/bash
 2117 pts/0    R+     0:00 ps ax
root@test-pod:/# netstat -tlpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      1/nginx: master pro
```
As you can see from the above, your shell runs inside the confined container, allowing you to run whatever commands you need to resolve the problem.
Sharing Directories Between Containers in a Pod…
As mentioned before, containers within the same pod can share particular directories if doing so facilitates their interoperation.
To get this behavior, we update `test-pod.yml` to now read like:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  volumes:
  - name: shared-directory
    emptyDir: {}
  containers:
  - name: web
    image: nginx:latest
    volumeMounts:
    - name: shared-directory
      mountPath: /var/tmp/shared
  - name: sleep
    image: busybox:latest
    args:
    - "sleep"
    - "99999999999"
    volumeMounts:
    - name: shared-directory
      mountPath: /var/tmp/shared
```
This YAML changes a few things from our previous example:
- We’ve expanded the `containers` section to now define two containers:
  - We have the same nginx container as before.
  - We have a new container named `sleep`, which just runs the `sleep` command for approximately three millennia or until stopped.
- The `spec` section now contains the volumes in use by this pod in addition to just the container specification that we had before:
  - The `volumes` section specifies the volumes a pod intends to use/need. In our case we’ve instructed Kubernetes to create a temporary empty directory on the host (i.e. `emptyDir`) and to call it `shared-directory`. This directory will be removed, and all data lost, when the pod is destroyed.
  - We then create `volumeMounts` sections in the spec of each container that will have access to the directory:
    - The direct descendant of this field is a list of the different volume mounts.
    - At a minimum, Kubernetes needs to know the name of the volume you’re talking about and where in the container’s filesystem hierarchy you want it to show up.
To see it in action let’s delete and re-create the pod:
```
tests@workhorse-pc:~$ kubectl delete -f test-pod.yml
pod "test-pod" deleted
tests@workhorse-pc:~$ kubectl create -f test-pod.yml
pod/test-pod created
```
Now that we’re at the newer version of the pod, you should be able to enter the web container via `kubectl exec -it` and create a file underneath `/var/tmp/shared` that both containers can see.
For example:
```
tests@home-pc:~$ kubectl exec -it test-pod -c web /bin/bash
root@test-pod:/# echo Hello From Beyond The Wall > /var/tmp/shared/test
root@test-pod:/# exit
tests@home-pc:~$ kubectl exec -it test-pod -c sleep /bin/sh
/ # cat /var/tmp/shared/test
Hello From Beyond The Wall
```
This isn’t a practical use of storage, but storage is itself a larger topic and will be covered in more detail in later installments.
Passing Environment Variables to Containers…
Occasionally you may find it useful to convey variable information to a process running in a confined container by way of the environment variables it can see. Luckily, passing hard-coded environment data is pretty easy and just requires adding an `env` section to the container specification for the pod. For example, take the following pod manifest:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: webapp
spec:
  containers:
  - name: web
    image: nginx:latest
    env:
    - name: fromKubernetes
      value: "this value"
```
After creating the above pod, the given environment variable should be available from inside the container:
```
root@kube-control01:~/src/test-pod# kubectl exec -it webapp /bin/bash
root@webapp:/# echo $fromKubernetes
this value
```
There are more robust ways of getting variable data into a container (such as `ConfigMap` and `Secret` objects, or the downward API), but the above demonstrates the core concept of modifying the environment and sets you off in the right direction. The more robust methods will be described in a later installment about configuration management.
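As a quick preview of the `ConfigMap` approach, a container can pull an environment variable’s value out of a ConfigMap with `valueFrom`. This sketch assumes a hypothetical ConfigMap named `app-config` containing a `log_level` key already exists in the same namespace:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: webapp
spec:
  containers:
  - name: web
    image: nginx:latest
    env:
    - name: LOG_LEVEL
      valueFrom:
        configMapKeyRef:
          name: app-config   # hypothetical ConfigMap name
          key: log_level     # key within that ConfigMap to read the value from
```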
Pod Labels…
When you have a large number of pods deployed in your cluster, you’ll eventually want to locate applications quickly according to whatever criteria you deem identifying. To do this you must add labels to the `metadata` section of each pod. Take these two pod manifests for example:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: webapp
  labels:
    app: nginx
    tier: web
spec:
  containers:
  - name: web
    image: nginx:latest
---
apiVersion: v1
kind: Pod
metadata:
  name: api-server
  labels:
    app: nginx
    tier: api
spec:
  containers:
  - name: web
    image: nginx:latest
```
In the above, we’ve added a `labels` section to the `metadata` section of each pod. This new section is a simple series of (typically short) key/value pairs that describe what this pod does. Which pods get which labels is completely up to you, but certain conventions exist already, such as the `app` label tending towards the specific application running (as opposed to just `nginx` as I have above) and `tier` being used to describe stack placement (such as how I use `web` to indicate it takes user requests and `api` to indicate a backend application API service).
Once created, we can use the `-l` option to `kubectl` to filter for just the types of pods we’re interested in looking at:
```
root@kube-control01:~/src/test-pod# kubectl get pods -o wide -l app=nginx
NAME         READY   STATUS    RESTARTS   AGE   IP          NODE            NOMINATED NODE
api-server   1/1     Running   0          37s   10.42.0.2   kube-worker01   <none>
webapp       1/1     Running   0          37s   10.42.0.1   kube-worker01   <none>
root@kube-control01:~/src/test-pod# kubectl get pods -o wide -l app=nginx,tier=web
NAME     READY   STATUS    RESTARTS   AGE   IP          NODE            NOMINATED NODE
webapp   1/1     Running   0          50s   10.42.0.1   kube-worker01   <none>
root@kube-control01:~/src/test-pod# kubectl get pods -o wide -l app=nginx,tier=api
NAME         READY   STATUS    RESTARTS   AGE   IP          NODE            NOMINATED NODE
api-server   1/1     Running   0          56s   10.42.0.2   kube-worker01   <none>
```
Pod Annotations…
In addition to short labels, you can also apply (typically longer-form) annotations to objects. These fields can be used to provide higher-level, human-readable descriptions, or they can contain non-identifying data that is still important to retain on the API object itself.
Take for example this YAML spec:
```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    note: |
      this is a really long bit of text. This is a useful
      place to put a description. But anyways, yeah. Nginx or something
  name: test-pod
spec:
  containers:
  - name: web
    image: nginx:latest
```
After re-creating the pod we can now see our annotation in the output of `describe`:
```
tests@kube-control01:~$ kubectl describe pod test-pod | head
Name:               test-pod
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               kube-compute02/192.168.122.64
Start Time:         Mon, 06 Jul 2020 11:54:47 -0400
Labels:             <none>
Annotations:        note=this is a really long bit of text. This is a useful
                    place to put a description. But anyways, yeah. Nginx or something
```
Services
Services are API objects which connect clients and peers to the network resources hosted on the cluster. There are a variety of ways to do this, but for this article I will be concentrating mainly on the ClusterIP and NodePort service types.
Exploring the Different Service Types…
Starting out, you only need to be aware of three Kubernetes service types:

NodePort
: Dynamically allocates a random available port between `30000-32767`; any node in the cluster that receives traffic destined for this port will opaquely proxy the connection to an appropriate pod. The pod need not be running on the given node for this to work.

LoadBalancer
: A cloud-dependent service type that depends on a provisioner being available inside Kubernetes to dynamically allocate a new `NodePort` ingress and then automatically create/configure a cloud-provider-native load balancer object to use that port. If successful, your cloud load balancer will redirect outside traffic to the correct pods, with the external ingress listed under `LoadBalancer Ingress` in the output of `kubectl describe service <serviceName>`. This service type is dramatically easier, if your environment supports it.

ClusterIP
: Allocates an IP address on the overlay network that will be proxied to the correct pods, creating an entry in Kubernetes’s internal DNS for discoverability. This is useful for application stacks (such as database clusters) that are only of use to pods running on the cluster. Without a service you could still access a single pod by its IP on the cluster, but without the DNS entry it would be much harder to discover, and there would be no native load balancing should multiple pods actually provide the given service.
The decision of which service type to go with depends on the particular application providing the service, its intended user base, and your environment. For example, as mentioned, internal-only services such as log aggregation or database clusters could use `ClusterIP`, whereas services that need to accept traffic from outside the cluster would use `NodePort` (in environments where the given load balancer has no accepted provisioner) or `LoadBalancer` (if you’re using a known cloud provider such as Google Cloud, AWS, OpenStack, or Azure).
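For comparison with the `NodePort` example coming up, a minimal `LoadBalancer` manifest might look like the following sketch. It assumes your cluster has a working cloud provider integration; the name `nginx-lb` is hypothetical, and the selector reuses the pod labels introduced earlier:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb          # hypothetical service name
spec:
  type: LoadBalancer      # asks the cloud provisioner to create an external load balancer
  ports:
  - port: 80              # port the external load balancer will expose
  selector:
    app: nginx
    tier: web
```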
Regardless of which service type you go with, the end result is that traffic will be routed automatically to the correct pods. Kubernetes (specifically the `kube-proxy` executable) does this using the `labels` mechanism described earlier in the “Pods” section. When you define a service, `kube-proxy` locates the applicable pods using the API server and generates the right set of `iptables` rules to proxy the connection (in kernel space) over the internal overlay network to the correct backend IP address and port.
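If you’re curious, you can peek at these generated rules on any node. A rough example follows; it assumes `kube-proxy` is running in its default `iptables` mode, and the exact chain contents will vary per cluster:

```
# on any Kubernetes node, as root: list the service dispatch rules in the
# nat table, filtering on the comments kube-proxy attaches to each rule
iptables -t nat -L KUBE-SERVICES -n | grep nginx
```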
Exposing a Service as NodePort…
Since we need to get traffic to our nginx web server and can’t assume a load balancer provisioner will work, let’s create a NodePort service. Let’s assume we’ve created a pod similar to this:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: webapp
  labels:
    app: nginx
    tier: web
spec:
  containers:
  - name: web
    image: nginx:latest
```
In the above, we’ve labeled this pod as both an `nginx` application (indicating a web server) and `web` tier (indicating it takes user requests directly). Suppose this is enough to accurately and reliably isolate which pods should receive the traffic. An acceptable `NodePort` definition might be:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: NodePort
  ports:
  - port: 80
    nodePort: 30080
  selector:
    app: nginx
    tier: web
```
This service is also called `nginx` and uses the two pod labels given at the bottom (`app=nginx` and `tier=web`) to locate the pods running the application it’s exposing. Once it locates the target pod(s), it uses the `port` field (together with `targetPort`, which defaults to the same value) to determine which TCP port on the identified pod(s) to route traffic towards. Since we’re creating a `NodePort` service, we also hard-code a node port of `tcp/30080` instead of letting Kubernetes pick one for us (which is the default behavior for `NodePort`).
Once created you can use `kubectl describe service` to see if it was able to locate a pod. You can verify this by seeing if it has a value in the `Endpoints` field for the given service:
```
root@kube-control01:~/src/test-pod# kubectl describe service nginx
Name:                     nginx
Namespace:                default
Labels:                   <none>
Annotations:              <none>
Selector:                 app=nginx,tier=web
Type:                     NodePort
IP:                       10.101.51.14
Port:                     <unset>  80/TCP
TargetPort:               80/TCP
NodePort:                 <unset>  30080/TCP
Endpoints:                10.42.0.1:80
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
```
Breaking down the important bits of this output:
- `Selector` reminds us of the criteria required for a pod to receive traffic from this service.
- `Type` reminds us that it’s a `NodePort` service.
- `IP` is the IP address allocated to the service in Kubernetes’s internal DNS. In this case the registered hostname is `nginx` (due to the service name), which resolves to an IP on the overlay network of `10.101.51.14`.
- `Port` is the port the service listens on, with `TargetPort` being the port on the pod containers that traffic will ultimately be directed towards (here they’re the same).
- `NodePort` is the port on the Kubernetes workers that directs traffic to this pod.
If your output looks similar to the above then your `nginx` pod should be accessible on `tcp/30080` on every node in the cluster. For example:
```
tests@home-pc:~$ curl http://kube-compute01:30080 | head
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   612  100   612    0     0   199k      0 --:--:-- --:--:-- --:--:--  199k
```
Deployments
In our final section, we’ll tie all of the above together to create a simple deployment of nginx. As described earlier, deployments are higher-level objects that essentially describe a desired state for a set of pods. This includes a full pod specification embedded into the deployment object.
Creating a Basic Deployment…
A basic deployment might look something like this:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: "test-deployment"
  labels:
    type: "custom"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: "nginx"
  template:
    metadata:
      labels:
        app: "nginx"
    spec:
      containers:
      - name: "web"
        image: "nginx:latest"
```
You can use `kubectl explain` to fully explore all of the above, but the important new bits are:
- Our API version has changed from just `v1` to `apps/v1` (as determined by `kubectl explain deployments`).
- Our deployment itself has one set of labels, in our case just `type=custom`, and is itself named simply `test-deployment`.
- Our `spec` section defines the actual type of deployment we want:
  - We want three `replicas`, so the deployment will try to have three running instances at any given point in time (we’ll try scaling this below).
  - We use `matchLabels` both because it’s a required attribute and because it will catch other instances of this application component created outside of this deployment and not try to spin up more pods than needed.
  - Finally, we include the embedded `spec` section, which includes all the attributes you would set on the resulting pods if you were creating them directly. Either `kubectl explain pod.spec` or `kubectl explain deployment.spec.template.spec` should give you a list of other acceptable attributes.
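Once the deployment exists, changing the desired replica count is all it takes to scale. As a sketch, you can do this either imperatively or by editing the manifest (the file name here is just whatever you saved the YAML above as):

```
# imperative: bump the replica count directly
kubectl scale deployment test-deployment --replicas=5

# declarative: change "replicas: 3" in the manifest, then re-apply it
kubectl apply -f test-deployment.yml
```

Either way, Kubernetes notices the gap between desired and actual state and creates (or deletes) pods to close it.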
Adding Health Checks…
When deploying an application, it’s important to remember that many components require some amount of time to initialize, and that they need some sort of regular check to verify they’re still functioning as required (so workload can stop being directed at defective components). The first step down this road is to add a health check to our deployments.
To do this we need to add a `livenessProbe` to the pod specification, resulting in a Deployment object that looks like this:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: "test-deployment"
  labels:
    type: "custom"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: "nginx"
  template:
    metadata:
      labels:
        app: "nginx"
    spec:
      containers:
      - name: "web"
        image: "nginx:latest"
        livenessProbe:
          httpGet:
            path: "/"
            port: 80
```
The only new part in the above is the `livenessProbe`:
- This probe consists of Kubernetes performing an HTTP GET request on the given path (`/`) against whichever process is listening on port 80 in the container’s network namespace.
- Without overriding any of the defaults, this check is performed every 10 seconds, and after three failures it will mark the container/pod as unhealthy and attempt a restart:
  - You can set `failureThreshold` to change how many failed probes are required to determine a container is unhealthy.
  - You can set `periodSeconds` to change how frequently the probes are sent.
  - Optionally, to preserve log integrity, you can set other attributes such as the “Host” and “User-Agent” request headers used to perform the probe. You can then filter these probes out of your logging (see the sketch below).
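Pulling those knobs together, a tuned probe might look something like the following sketch (the header value and numbers are illustrative, not recommendations); it would replace the `livenessProbe` section in the deployment above:

```yaml
livenessProbe:
  httpGet:
    path: "/"
    port: 80
    httpHeaders:
    - name: User-Agent
      value: kube-liveness-probe   # distinct UA so probe hits can be filtered from logs
  periodSeconds: 5                 # probe every 5 seconds instead of the default 10
  failureThreshold: 5              # tolerate 5 consecutive failures before restarting
```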
And indeed, after creating the above Deployment and artificially creating a failure in one of the pods, if I check `kubectl get events` I can see the container/pod in question failing a liveness probe when nginx stops listening on port 80:
```
[....snip....]
9s    Warning   Unhealthy   pod/test-deployment-86564c9c46-tl7r5   Liveness probe failed: Get http://10.40.48.1:80/: dial tcp 10.40.48.1:80: connect: connection refused
9s    Normal    Killing     pod/test-deployment-86564c9c46-tl7r5   Container web failed liveness probe, will be restarted
[....snip....]
```
But this only performs on-going monitoring. What about stopping a new container from receiving work while it’s still starting up? Luckily, the process is exactly the same as the on-going `livenessProbe` method described above; the only difference is that it’s done using `startupProbe`, whose probe specification is exactly the same as the `livenessProbe` specification. That means we can just copy the entire `livenessProbe` section, duplicate it, rename the duplicate to `startupProbe`, and have it work as-is.
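As a sketch (the numbers are illustrative, not recommendations), giving a slow-starting container up to roughly five minutes to come up before ordinary liveness checking takes over might look like this excerpt from the container spec:

```yaml
startupProbe:
  httpGet:
    path: "/"
    port: 80
  periodSeconds: 10        # probe every 10 seconds while the container is starting
  failureThreshold: 30     # 30 failures x 10 seconds = up to ~5 minutes to start
livenessProbe:
  httpGet:
    path: "/"
    port: 80
```

Until the `startupProbe` succeeds, the other probes are held off, so a slow initialization won’t be mistaken for a dead container.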
Towards the end of this series I will return to deployments and will go into more detail about managing them to meet complex workload requirements. For now though, the above should be enough to get you started.
Further Reading
- (rascaldev) Exploring Kubernetes, Part 1: Installation
- (Official Documentation) Deployments | Kubernetes