The first minutes of an incident are for establishing facts. Confirm the alert from a second, independent location, identify the affected user journey, and determine whether the issue is total downtime or a partial failure.
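The triage questions above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the names `Scope`, `classify`, and `confirmed` are hypothetical, and it assumes the pass/fail results of each check have already been gathered per probe location.

```python
from enum import Enum
from typing import Mapping, Sequence


class Scope(Enum):
    HEALTHY = "healthy"
    PARTIAL = "partial failure"
    TOTAL = "total downtime"


def classify(results: Mapping[str, Sequence[bool]]) -> Scope:
    """Classify check outcomes gathered from one or more locations.

    `results` maps a location name (hypothetical labels such as "eu" or
    "us") to the outcomes of the checks run there; True means success.
    """
    outcomes = [ok for checks in results.values() for ok in checks]
    if not outcomes:
        raise ValueError("no probe results to classify")
    if all(outcomes):
        return Scope.HEALTHY
    if not any(outcomes):
        return Scope.TOTAL
    return Scope.PARTIAL


def confirmed(results: Mapping[str, Sequence[bool]], min_locations: int = 2) -> bool:
    """Return True when at least `min_locations` locations saw a failure."""
    failing = [loc for loc, checks in results.items() if not all(checks)]
    return len(failing) >= min_locations
```

A failure seen from only one location may be a problem with that probe or its network path, which is why `confirmed` defaults to requiring two.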
Build evidence around the experience
Assign one person to investigate and another to communicate. This prevents a crowded response channel from slowing down technical work. Record the incident start time and the first known symptom.
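One way to capture these facts is a small incident record. The sketch below is illustrative only: the `Incident` class and its field names are assumptions, and it expects timezone-aware timestamps so that entries sort consistently.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple


@dataclass
class Incident:
    started_at: datetime          # when the incident began (timezone-aware)
    first_symptom: str            # the first thing anyone observed
    investigator: str             # does the technical work
    communicator: str             # handles updates to everyone else
    timeline: List[Tuple[datetime, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The two roles are only useful if different people hold them.
        if self.investigator == self.communicator:
            raise ValueError("investigator and communicator must be different people")
        self.note(self.started_at, f"First symptom: {self.first_symptom}")

    def note(self, at: datetime, text: str) -> None:
        """Add a timeline entry and keep the timeline in time order."""
        self.timeline.append((at, text))
        self.timeline.sort(key=lambda entry: entry[0])

    def render(self) -> str:
        """Return the timeline as plain text, one entry per line."""
        return "\n".join(f"{at.isoformat()} {text}" for at, text in self.timeline)
```

Because entries are kept in time order as they are added, the same record can be used as the timeline for the review after recovery.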
Check recent deployments, infrastructure events, certificate status, DNS, and third-party dependencies. When the cause is still uncertain, prefer a reversible mitigation, such as rolling back the most recent deployment, over a change that is hard to undo.
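Two of these checks, DNS and certificate status, can be scripted with the Python standard library alone. The following is a rough sketch under stated assumptions: the function names are hypothetical, the service is assumed to speak TLS on port 443, and the timeout is an arbitrary choice.

```python
import socket
import ssl
import time
from typing import Optional, Set


def resolve(host: str, port: int = 443) -> Set[str]:
    """Return the set of IP addresses the host currently resolves to."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the certificate's notAfter field, e.g. 'Jan  5 09:34:43 2018 GMT'.

    The default context verifies the certificate, so an already expired or
    mismatched certificate raises ssl.SSLCertVerificationError here, which
    is itself a useful finding.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_remaining(not_after: str, now: Optional[float] = None) -> float:
    """Days until the certificate expires; negative if it already has."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400
```

For a given hostname, `resolve(host)` shows whether DNS still returns the expected addresses, and `days_remaining(fetch_not_after(host))` shows how close the certificate is to expiry. Both are read-only checks, so they are safe to run while the cause is still unknown.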
Keep the response actionable
After recovery, preserve the timeline. A short, factual review should produce one or two concrete improvements to monitoring, deployment safety, or operational documentation.