Daily DevOps Interview Questions Day #58
Daily Interview Questions for SRE and DevOps engineers
Question for the day:
We’re going to go through a production troubleshooting scenario today. The goal of this exercise is to see how you get to the root of a problem, not necessarily how you solve it.
Scenario
Your company runs a real-time ride-hailing platform (like Uber/Lyft). During peak hours, you see sporadic trip-booking failures and increased latency in the booking-service API. You are the on-call SRE, and the product team has just reported that 10% of users have seen failures in the last 15 minutes.
Your task:
Identify what could be causing the issue.
Design a high-level architecture and mitigation plan to make the booking system resilient to sudden traffic surges and dependency slowness (e.g., payment service or maps API).
Talk through SLOs, dashboards, and how you’d respond to this incident.
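One way to frame the SLO part of the discussion is to turn the reported failure rate into an error-budget burn rate. The snippet below is a minimal sketch, not a prescribed answer: the 99.9% availability target, the 30-day window, and the function names `burn_rate` and `budget_consumed` are all assumptions made for illustration, and it treats "10% of users failing" as roughly "10% of requests failing" at steady traffic.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how many times faster than allowed the error budget is being spent."""
    budget = 1.0 - slo_target  # allowed failure ratio, e.g. 0.001 for a 99.9% SLO
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def budget_consumed(error_ratio: float, slo_target: float,
                    elapsed_minutes: float, window_days: int = 30) -> float:
    """Return the fraction of the window's error budget used by the incident so far.

    Assumes traffic is roughly constant over the SLO window (a simplification).
    """
    window_minutes = window_days * 24 * 60
    return burn_rate(error_ratio, slo_target) * elapsed_minutes / window_minutes


if __name__ == "__main__":
    # Scenario numbers: 10% failures for 15 minutes.
    # Assumed (not from the scenario): 99.9% SLO over a 30-day window.
    rate = burn_rate(0.10, 0.999)
    used = budget_consumed(0.10, 0.999, elapsed_minutes=15)
    print(f"burn rate: {rate:.0f}x, budget used so far: {used:.1%}")
```

Under these assumed numbers the burn rate comes out at roughly 100x, which is the kind of figure that justifies a page rather than a ticket. In an interview, stating the assumed SLO out loud matters more than the exact arithmetic.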
First Question:
You just got paged with this alert: