Daily DevOps Interview Questions Day #55
Daily Interview Questions for SRE and DevOps engineers
Question for the day:
Since the last question was so popular, we’ll do another production troubleshooting scenario today. The goal of this exercise is to see how you get to the root of a problem. Let me know if you’d be interested in more of this question/answer/response format in the future.
Scenario
You're on-call. The image-processor-service pods in prod are stuck in Pending. The cluster is EKS, using Karpenter to dynamically provision nodes. The app rolled out fine in staging.
You run kubectl describe pod and see:
Warning FailedScheduling 30s (x5 over 2m) default-scheduler 0/2 nodes are available: 2 node(s) had taint {workload-type: app}, that the pod didn't tolerate.
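For readers less familiar with how the scheduler arrives at that event, here is a minimal Python sketch of the taint/toleration matching rule, modeled loosely on the upstream Kubernetes logic. The Taint and Toleration classes and the tolerates / first_untolerated_taint helpers are hypothetical illustrations, not a Kubernetes client API. The NoSchedule effect on the node taint is also an assumption, because the event text only shows {workload-type: app} and not the effect.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""             # empty key combined with "Exists" matches any taint key
    operator: str = "Equal"   # "Equal" (the default) or "Exists"
    value: str = ""
    effect: str = ""          # empty effect matches any taint effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Sketch of the toleration-matching rule: effect, then key, then operator/value."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator in ("", "Equal"):
        return tol.value == taint.value
    if tol.operator == "Exists":
        return True
    return False


def first_untolerated_taint(
    taints: Sequence[Taint], tolerations: Sequence[Toleration]
) -> Optional[Taint]:
    """Return the first hard taint no toleration covers (the scheduler's filter idea)."""
    for taint in taints:
        if taint.effect not in ("NoSchedule", "NoExecute"):
            continue  # PreferNoSchedule is only a soft preference, not a filter
        if not any(tolerates(t, taint) for t in tolerations):
            return taint
    return None


# Hypothetical node taint reconstructed from the event; the effect is assumed.
node_taints = [Taint("workload-type", "app", "NoSchedule")]

# No tolerations on the pod: the taint blocks scheduling.
print(first_untolerated_taint(node_taints, []))
# prints Taint(key='workload-type', value='app', effect='NoSchedule')

# An exact Equal match, or an Exists match on the key, lets the pod through.
print(first_untolerated_taint(
    node_taints, [Toleration(key="workload-type", value="app", effect="NoSchedule")]))
# prints None
print(first_untolerated_taint(
    node_taints, [Toleration(key="workload-type", operator="Exists")]))
# prints None
```

Note that a near-miss still fails: a toleration with key workload-type but value "apps" does not match, which is one reason to compare the pod spec and the node taints character by character.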
What would be the first question you ask? As we go along, I’ll respond to each question with logs, diagrams, etc.