Daily DevOps Interview Questions Day #54
Daily Interview Questions for SRE and DevOps engineers
Question for the day:
We’re going to walk through a production troubleshooting scenario today. The goal of this exercise is to see how you get to the root cause of a problem. Let me know if this Question/Answer/Response format is something you’d be interested in seeing in the future.
Scenario
You're on-call and get paged: the service “user-profiles” is seeing high latency.
It backs the homepage and the mobile app. The p95/p99 latency SLOs are being breached, and the support team is reporting slow loads and timeouts.
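Because the page is about p95/p99 rather than averages, it helps to be precise about what those numbers mean. Below is a minimal sketch using the nearest-rank percentile definition. This is only one of several definitions: Prometheus's `histogram_quantile`, for example, interpolates within histogram buckets, so its numbers can differ. The function names and the SLO targets in the example are hypothetical.

```python
import math


def percentile(samples_ms, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    # Multiply before dividing so the rank is computed exactly for integer inputs.
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def slo_breaches(samples_ms, targets_ms):
    """Return {percentile: observed_ms} for every percentile whose latency exceeds its target."""
    breaches = {}
    for p, target in targets_ms.items():
        observed = percentile(samples_ms, p)
        if observed > target:
            breaches[p] = observed
    return breaches


# Hypothetical traffic: 90 fast requests and 10 very slow ones.
samples = [20] * 90 + [1500] * 10
print(percentile(samples, 50))                        # median
print(slo_breaches(samples, {95: 300, 99: 800}))      # hypothetical SLO targets in ms
```

With 90 requests at 20 ms and 10 at 1500 ms, the median is still 20 ms, yet both p95 and p99 are 1500 ms. This is how a service can breach tail-latency SLOs while its median still looks healthy.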
The backend is simple. Here are the parameters:
Service runs in Kubernetes on AWS, and depends on Redis (cache) and Postgres (DB).
The Kubernetes deployment has autoscaling enabled.
No recent deploys.
CPU and memory look normal at first glance.
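The scenario says autoscaling is enabled, but CPU and memory look normal. Assume (this is not stated in the scenario) that the autoscaler is a standard Kubernetes HorizontalPodAutoscaler targeting average CPU utilization. Its documented scaling rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The controller skips scaling when that ratio is within a tolerance of 1.0 (0.1 by default), and the result is clamped to minReplicas/maxReplicas. The sketch below reimplements only that rule. The function name and parameters are hypothetical, and it ignores the stabilization windows and pod-readiness handling that the real controller applies.

```python
import math


def desired_replicas(current_replicas, current_util, target_util,
                     min_replicas, max_replicas, tolerance=0.1):
    """Sketch of the HPA replica calculation for a single utilization metric (e.g. CPU %)."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to target: the HPA leaves the replica count unchanged.
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Never go below minReplicas or above maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


# 4 pods averaging 35% CPU against a hypothetical 70% target.
print(desired_replicas(4, 35, 70, min_replicas=2, max_replicas=10))
```

For example, `desired_replicas(4, 35, 70, 2, 10)` returns 2. If pods spend most of their time waiting on Redis or Postgres rather than computing, CPU can stay well under target. A CPU-based autoscaler would then not add capacity and could even shrink the deployment while latency is high.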
What is the first question you would ask? As we go along, each question will be answered with logs, diagrams, etc.
Answers
There are many ways to approach this, but I think the first question most people would ask is:
Q: Can I get an architecture breakdown or diagram of how everything works?
A: Sure.
Essentially, the flow is:
User makes a request to load profile (e.g., homepage, app).
Service checks Redis for cached user profile:
Cache hit → return result immediately.
Cache miss → query Postgres.
Postgres returns user data.
Service writes result to Redis for future cache hits.
Return response to user.
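The read path above is the cache-aside (lazy-loading) pattern. The following is a minimal sketch that uses hypothetical in-memory `FakeRedis` and `FakePostgres` classes in place of real clients. Their method names and signatures are illustrative only, not the redis-py or Postgres driver APIs. The sketch also includes hit/miss counters and a back-of-the-envelope expected-latency helper, which is likewise a hypothetical function.

```python
import json
import time


class FakeRedis:
    """Hypothetical in-memory stand-in for a Redis cache with per-key TTL (no eviction policy)."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired entries behave like a miss
            return None
        return value

    def set(self, key, value, ttl_s):
        self._data[key] = (value, time.monotonic() + ttl_s)


class FakePostgres:
    """Hypothetical stand-in for the users table; counts queries so misses are visible."""

    def __init__(self, rows):
        self._rows = rows
        self.queries = 0

    def fetch_user(self, user_id):
        self.queries += 1
        return self._rows.get(user_id)


class ProfileService:
    def __init__(self, cache, db, ttl_s=300):
        self.cache, self.db, self.ttl_s = cache, db, ttl_s
        self.hits = 0
        self.misses = 0

    def get_profile(self, user_id):
        key = f"user-profile:{user_id}"  # hypothetical key format
        cached = self.cache.get(key)
        if cached is not None:  # cache hit: return immediately
            self.hits += 1
            return json.loads(cached)
        self.misses += 1  # cache miss: query Postgres
        row = self.db.fetch_user(user_id)
        if row is not None:  # write back so the next read is a hit
            self.cache.set(key, json.dumps(row), self.ttl_s)
        return row


def expected_latency_ms(hit_ratio, cache_ms, db_ms, write_ms):
    """Hits pay one cache read; misses pay the cache read, the DB query, and the cache write."""
    miss_ms = cache_ms + db_ms + write_ms
    return hit_ratio * cache_ms + (1 - hit_ratio) * miss_ms


db = FakePostgres({42: {"id": 42, "name": "Ada"}})
svc = ProfileService(FakeRedis(), db)
svc.get_profile(42)  # miss: goes to Postgres, then populates the cache
svc.get_profile(42)  # hit: served from the cache
print(svc.hits, svc.misses, db.queries)
```

Every miss pays for the cache lookup, the Postgres query, and the write-back, so expected latency grows quickly as the hit ratio drops. This sketch also does not cache "user not found" results, so repeated lookups for missing users always reach Postgres.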
Now that you understand the architecture, what is the next question?