Debugging

This section covers common errors that you might run into while developing the operator.

Errors

Pull Errors

I had a problem where the containers I wanted to deploy seemed to time out during the image pull. In the pod logs I would see:

Context deadline exceeded

In this case, if you manually pull the image first with minikube ssh (and then don’t ask for it to be re-pulled), you should be able to get around this:

$ minikube ssh docker pull ghcr.io/rse-ops/lammps:flux-sched-focal-v0.24.0
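If the pull is flaky rather than consistently failing, it can help to retry it a few times. The following is a minimal sketch, not part of the project: the function name `pull_with_retry` and its attempt and delay arguments are hypothetical, and it assumes the same `minikube ssh docker pull` command shown above.

```shell
# Hypothetical helper: retry an image pull inside the minikube VM.
# Usage: pull_with_retry IMAGE [ATTEMPTS] [DELAY_SECONDS]
pull_with_retry() {
    image="$1"
    attempts="${2:-3}"
    delay="${3:-5}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Same command as the manual pull above
        if minikube ssh docker pull "$image"; then
            echo "pulled $image on attempt $i"
            return 0
        fi
        echo "pull attempt $i of $attempts failed" >&2
        i=$((i + 1))
        # Wait before the next attempt, but not after the last one
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

For example, `pull_with_retry ghcr.io/rse-ops/lammps:flux-sched-focal-v0.24.0 5 10` would try up to five times, waiting ten seconds between attempts.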

Service Already on Port 8080

One time I messed something up and my metrics server was still running on port 8080 (so I couldn’t start it again). I killed it like this:

$ kill $(lsof -t -i:8080)
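Note that the one-liner above calls `kill` with no arguments when nothing is listening, which prints a usage error. A slightly more defensive variant might look like the sketch below; the function name `free_port` is hypothetical, and it assumes `lsof` is installed.

```shell
# Hypothetical helper: stop whatever is listening on a port, if anything.
# Usage: free_port [PORT]   (defaults to 8080)
free_port() {
    port="${1:-8080}"
    # lsof -t prints only PIDs; it exits non-zero when nothing matches
    pids=$(lsof -t -i:"$port" 2>/dev/null || true)
    if [ -z "$pids" ]; then
        echo "nothing is listening on port $port"
        return 0
    fi
    # $pids is left unquoted on purpose so each PID becomes its own word
    echo "stopping PIDs on port $port:" $pids
    kill $pids
}
```

Running `free_port` with no argument checks port 8080, matching the case described above.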

Configs Not Taking

If changes to your resources (config/samples) don’t seem to be taking effect, remember that you need to re-apply them:

$ bin/kustomize build config/samples | kubectl apply -f -
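To check whether the cluster actually differs from what is on disk before applying, `kubectl diff` can be put in front of the apply. This is only a sketch: the function name `apply_samples` and the `KUSTOMIZE` override variable are hypothetical, and the default path `bin/kustomize` is taken from the command above.

```shell
# Hypothetical helper: show pending changes, then apply them.
# Usage: apply_samples [DIRECTORY]   (defaults to config/samples)
apply_samples() {
    dir="${1:-config/samples}"
    # KUSTOMIZE is an assumed override; the default matches the command above
    manifests=$("${KUSTOMIZE:-bin/kustomize}" build "$dir") || return 1
    # kubectl diff exits 0 when nothing differs, 1 when there are differences,
    # and >1 on error (in which case the apply below will surface the error)
    if printf '%s\n' "$manifests" | kubectl diff -f -; then
        echo "no changes to apply from $dir"
        return 0
    fi
    printf '%s\n' "$manifests" | kubectl apply -f -
}
```

If the function reports no changes even though you edited a sample, the edit was probably made in a file that the kustomization in that directory does not include.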

Troubleshooting

Here is a general debugging strategy. To view the pods (in the metrics-operator namespace):

$ kubectl --namespace metrics-operator get pods

I found this helpful for debugging the stateful set, e.g., when it was stuck on ContainerCreating:

$ kubectl describe -n metrics-operator pods
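When describe produces a lot of output, it can be quicker to list only the pods that are not running, followed by the namespace events in time order. The sketch below wraps those two commands; the function name `inspect_pods` is hypothetical, and the default namespace is the one used in the commands above.

```shell
# Hypothetical helper: summarize why pods in a namespace may be stuck.
# Usage: inspect_pods [NAMESPACE]   (defaults to metrics-operator)
inspect_pods() {
    ns="${1:-metrics-operator}"
    # Pods whose phase is not Running; a pod stuck on ContainerCreating is
    # in the Pending phase, and completed pods will also be listed here
    kubectl get pods --namespace "$ns" --field-selector='status.phase!=Running'
    # Events sorted oldest to newest; image pull failures show up here
    kubectl get events --namespace "$ns" --sort-by=.lastTimestamp
}
```

The most recent events appear at the bottom of the output, which is usually where the reason for a stuck pod is reported.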

If you need to clean things up (first ensuring that this is the only pod and service you have running, since --all deletes everything of that type in the current namespace), I’ve found it easiest to do:

$ kubectl delete pod --all
$ kubectl delete job --all

# Service shorthand
$ kubectl delete svc --all

# ConfigMap shorthand
$ kubectl delete cm --all
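The same cleanup can be wrapped in one function that targets an explicit namespace, which lowers the chance of deleting resources from whatever namespace the current context points at. This is a sketch under that assumption; the function name `cleanup_namespace` is hypothetical.

```shell
# Hypothetical helper: delete all pods, jobs, services, and config maps
# in one explicitly named namespace.
# Usage: cleanup_namespace NAMESPACE
cleanup_namespace() {
    if [ -z "$1" ]; then
        echo "usage: cleanup_namespace NAMESPACE" >&2
        return 1
    fi
    ns="$1"
    # Same resource types, in the same order, as the commands above
    for kind in pod job svc cm; do
        kubectl delete "$kind" --all --namespace "$ns"
    done
}
```

Requiring the namespace argument, rather than defaulting it, is a deliberate choice here so that the function never runs against a namespace you did not name.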

Also take a look at the project Makefile, which has some commands pre-set for you, and feel free to add other combinations that might be useful.


Last update: Nov 27, 2023