The first ten minutes of a website incident

The first minutes of an incident are for establishing facts. Confirm the alert from a second location, identify the affected journey, and determine whether the issue is total downtime or a partial failure.
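One way to make the "total or partial" question concrete is to tabulate check results by location and journey. The sketch below is illustrative only: `ProbeResult`, `classify`, and the labels it returns are hypothetical names, and it assumes you already collect pass/fail results from at least two vantage points.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProbeResult:
    location: str  # where the check ran from (hypothetical label, e.g. "eu")
    journey: str   # user journey exercised, e.g. "checkout"
    ok: bool       # True if the journey completed successfully


def classify(results: List[ProbeResult]) -> str:
    """Summarize probe results as a rough incident classification."""
    if not results:
        return "unknown"
    failures = [r for r in results if not r.ok]
    if not failures:
        return "healthy"
    # A failure seen from only one location is not yet confirmed.
    if len({r.location for r in failures}) < 2:
        return "unconfirmed"
    # Every probe failing suggests total downtime; otherwise it is partial.
    return "total" if len(failures) == len(results) else "partial"
```

The "unconfirmed" label mirrors the advice above: a single failing vantage point may be a problem with the monitor rather than the site.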

Build evidence around the experience

Assign one person to investigate and another to handle communication, so that a crowded response channel does not slow down technical work. Record the incident start time and the first known symptom.
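A small record type can hold the roles, the start time, and the first symptom from the outset. This is a minimal sketch, not a prescribed tool: `IncidentRecord`, its fields, and its methods are hypothetical, and it assumes all timestamps are timezone-aware UTC.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass
class IncidentRecord:
    investigator: str
    communicator: str
    first_symptom: str
    started_at: datetime  # assumed timezone-aware (UTC)
    timeline: List[Tuple[datetime, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The two roles should be held by different people.
        if self.investigator == self.communicator:
            raise ValueError("investigator and communicator must differ")
        self.note(f"First symptom: {self.first_symptom}", at=self.started_at)

    def note(self, text: str, at: Optional[datetime] = None) -> None:
        """Append a timestamped entry; defaults to the current UTC time."""
        when = at if at is not None else datetime.now(timezone.utc)
        self.timeline.append((when, text))

    def render(self) -> str:
        """Return the timeline as one line per entry, oldest first."""
        entries = sorted(self.timeline, key=lambda entry: entry[0])
        return "\n".join(f"{when.isoformat()} {text}" for when, text in entries)
```

Calling `note()` whenever something is tried or observed means the timeline already exists when the incident ends, instead of being reconstructed from memory.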

Check the most likely causes first: recent deployments, infrastructure events, certificate status, DNS, and third-party dependencies. While the cause is still uncertain, prefer mitigations that can be undone, such as rolling back the latest deployment.
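The certificate and DNS checks in particular are quick to script with the Python standard library. The helpers below are a sketch under assumptions: the function names are hypothetical, port 443 is assumed, and the hostname is whatever site you are investigating.

```python
import socket
import ssl
import time
from typing import List, Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time (negative if expired)."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400


def resolve(host: str, port: int = 443) -> List[str]:
    """Return the sorted set of addresses the local resolver gives for host."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Fetch the notAfter field of the certificate the server presents.

    With the default context, an expired or invalid certificate makes the
    handshake raise ssl.SSLCertVerificationError, which is itself a finding.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

In use, `days_until_expiry(fetch_not_after(host))` gives the remaining certificate lifetime, and `resolve(host)` shows what the local resolver currently returns. Both reflect only the machine they run on, so repeat them from a second location before drawing conclusions.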

Keep the response actionable

After recovery, preserve the timeline. A short, factual review should produce one or two concrete improvements to monitoring, deployment safety, or operational documentation.
