[Prometheus][prometheus] is a popular monitoring tool based on time series data. One of the strengths of Prometheus is its deep integration with [Kubernetes][kubernetes]. Kubernetes components provide Prometheus metrics out of the box, and Prometheus’s service discovery integrates well with dynamic deployments in Kubernetes.
There are multiple ways how to set up Prometheus in a Kubernetes cluster. There’s an official [Prometheus Docker image][promdock], so you could use that and create the Kubernetes YAML files from scratch (which according to Joe Beda is [not totally crazy][crazy]). There is also a [helm chart][helmchart]. And there is the [Prometheus Operator][promop], which is built on top of the CoreOS [operator framework][operator].
This blog post shows how to get the [Prometheus Operator][promop] up and running in a Kubernetes cluster set up with [kubeadm][kubeadm]. We use [Ansible][ansible] to automate the deployment.
[Kubeadm][kubeadm] is a basic toolkit that helps you bootstrap a simple [Kubernetes][kubernetes] cluster. It is intended as a [basis for higher-level deployment tools][kubeadm-scope], like [Ansible][ansible] playbooks. A typical Kubernetes cluster set-up with
kubeadm consists of a single Kubernetes master, which is the machine coordinating the cluster, and multiple Kubernetes nodes, which are the machines running the actual workload.
Dealing with node failure is simple: When a node fails, the master will detect the failure and re-schedule the workload to other nodes. To get back to the desired number of nodes, you can simply create a new node and add it to the cluster. In order to add a new node to an existing cluster, you first create a token on the master with
kubeadm token create, then you use that token on the new node to join the cluster with
Dealing with master failure is more complicated. Good news is: Master failure is not as bad as it sounds. The cluster and all workloads will continue running with exactly the same configuration as before the failure. Applications running in the Kubernetes cluster will still be usable. However, it will not be possible to create new deployments or to recover from node failures without the master.
This post shows how to backup and restore a Kubernetes master in a