How to Run a Post-Mortem
Thomas McClean · Engineering Manager · 7 min read
  • Retrospectives
  • Continuous improvement
  • Team management
  • Leadership
  • Meetings

A blameless post-mortem turns a failure into learning. Here is how to run the session, identify root causes, and make sure the actions actually stick.

Something went wrong. An incident took down production for two hours. A project shipped three weeks late. A client was promised something nobody could deliver. What happens next matters enormously. Most teams do one of two things: they quietly move on without examining what happened, or they hold a blame-laden debrief where someone walks away feeling responsible for the whole thing. Neither approach makes the team better. A well-run post-mortem does what neither of those can: it surfaces the real causes, and it turns the failure into something the team can learn from.

A post-mortem is not a tribunal. It is a structured investigation into how a system - people, processes, and tools together - produced an outcome nobody wanted. The goal is understanding, not punishment.

What a post-mortem is and is not

The term comes from medicine, where a post-mortem examination establishes cause of death. In engineering and management, it has evolved to mean any structured review of a significant failure or incident. The purpose is the same: understand what happened and why, so the same outcome can be avoided in future. For more on structuring retrospective-style conversations, see our guide on why retrospectives matter.

What distinguishes a post-mortem from a regular retrospective is focus and depth. A retro covers everything - what went well, what did not, and what to change. A post-mortem zooms in on one specific failure or incident and examines it thoroughly: the timeline, the contributing factors, the decision points where things could have gone differently, and the systemic conditions that made the failure possible.

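Retrospective vs post-mortem

  • Scope. Retrospective: broad, covering the whole sprint or period. Post-mortem: narrow, covering one incident or failure.
  • Focus. Retrospective: covers wins and improvements. Post-mortem: focused entirely on what went wrong.
  • Depth. Retrospective: light-touch on any single event. Post-mortem: deep-dive with timeline and root causes.
  • Trigger. Retrospective: runs regularly as a standard cadence. Post-mortem: triggered by a specific event.
  • Output. Retrospective: team-facing. Post-mortem: a written report shared broadly.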

Post-mortems are most common in engineering and operations teams, where incidents are frequent and well-defined. But they work equally well after a project failure, a missed deadline, a client escalation, or any significant outcome that cost the team time, money, or trust. Any failure worth understanding is worth a post-mortem.

The blameless principle

The most important thing to get right is the tone, and that starts before the meeting. A blameless post-mortem operates from a core assumption: people do not come to work intending to cause failures. They make decisions based on the information and tools available to them at the time, under conditions that the organisation itself created. When something goes wrong, the question is never "who did this?" - it is "what conditions made this outcome possible?"

This is not the same as saying nobody is ever responsible for anything. Genuine negligence, repeated disregard for known risks, or deliberate shortcutting are separate issues handled through different conversations. But in the vast majority of incidents, individuals are making reasonable choices within a system that has gaps. Blaming the individual does nothing to fix the system. The next person to encounter the same conditions is likely to make the same mistake.

The same incident, two framings

Blame framing

"Alex deployed without running the full test suite. This is unacceptable."

System framing

"The deployment process allowed a full deploy without a completed test run. Why, and how do we close that gap?"

Blameless post-mortems also produce better information. When people fear being blamed, they are selective about what they share. They edit the timeline, soften their own decisions, and avoid naming the moments where they had doubt. When they feel safe, they share everything, including the small choices and half-formed concerns that are often the most useful data. Your goal as a manager is to create that safety explicitly, before anyone walks into the room.

Who should attend

Keep the group small enough to have a real conversation, but large enough to capture all the relevant perspectives. The right attendees are anyone who was directly involved in the incident - people who made decisions, noticed early warning signs, responded to the problem, or tried to mitigate it. Include people from different functions if the incident crossed team boundaries. Avoid the instinct to invite senior leaders who were not involved; their presence changes the dynamic and usually makes the conversation less honest.

  • Include: People who were directly involved, responders, and anyone with relevant context about what happened. Include cross-functional participants if the incident involved multiple teams.
  • Consider: One or two people who were not involved but can ask fresh questions. An outside perspective often surfaces assumptions the group has stopped noticing.
  • Avoid: Senior leaders who were not involved. Their presence tends to shift the conversation from learning to performance. Invite them to read the report instead.
  • Facilitate separately: The facilitator should not be someone with a strong stake in the outcome. The manager or lead closest to the incident often struggles to hold both roles at once.

Timing matters too. Run the post-mortem while the incident is still fresh: ideally within 48 to 72 hours for an operational incident, or within a week for a project or delivery failure. Wait too long and memory fades and details blur. Run it too soon and people are still in recovery mode, not yet ready to reflect calmly.

Running the session

A post-mortem session typically runs 60 to 90 minutes for a significant incident. Any longer and attention drifts; any shorter and you are unlikely to get past the surface-level narrative. Prepare a shared document in advance that captures the incident timeline in broad strokes. This gives everyone a common starting point and avoids spending the first thirty minutes arguing about what happened when.

Open by restating the purpose: to understand, not to assign blame. Then work through four questions in order. First, what happened - build the timeline together, adding detail and correcting any gaps in the draft. Second, why it happened - use the "five whys" approach, pushing past the immediate trigger to the underlying conditions. Third, what worked well in the response, because there are almost always things worth preserving. Fourth, what actions will prevent this from happening again.

Post-mortem session structure

Open (5 min): State purpose, confirm blameless norms, review agenda
Timeline (20 min): Build the shared timeline, correct gaps, add context
Root causes (25 min): Five whys, contributing factors, systemic conditions
What worked (10 min): Detection, response, communication wins worth keeping
Actions (15 min): Specific, owned actions to prevent recurrence
Close (5 min): Confirm report owner, sharing timeline, next review date

The facilitator's job is to keep the conversation moving through the timeline and away from solutions until the root causes are fully understood. It is tempting to jump to fixes as soon as a contributing factor surfaces. Resist this. The best fixes come from a complete picture of what happened, not from the first obvious patch.

  • Build the timeline first: Spend real time on the sequence of events. Who knew what, and when? What signals were visible but not acted on? The timeline often reveals contributing factors that nobody thought to mention.
  • Use the five whys: Ask "why?" at least five times before accepting an answer as a root cause. The first why surfaces the immediate trigger. The fifth or sixth why usually surfaces the systemic condition.
  • Separate causes from fixes: Record contributing factors as you find them, but park the discussion of fixes until the full picture is clear. Fixing the first cause you find rarely prevents recurrence.
  • Capture everything in writing: Use a shared document that everyone can edit during the session. Live capture means less reconstruction later and gives quiet participants a way to add context without speaking up.

Writing and sharing the report

A post-mortem that ends with a conversation and no written output is a missed opportunity. The report captures what the room agreed on, communicates the learning to people who were not there, and creates an institutional record that can be referenced when a similar pattern emerges months later. It does not need to be long. A well-written post-mortem report is rarely more than one or two pages.

The report should include the incident summary, the timeline, the root causes and contributing factors, what worked in the response, and the specific actions agreed. Each action should have a named owner and a due date. This is where using a tool like Manager Toolkit's Actions feature pays off: create each post-mortem action there, assign it, and it will surface automatically in your dashboard and in future catch-ups with the relevant owner. Nothing from the post-mortem disappears into a document that nobody reads again.

  • Incident summary: One paragraph. What happened, what was impacted, how long it lasted, and when it was resolved. Enough context that someone unfamiliar with the incident can understand the report without a briefing.
  • Timeline: A chronological list of key events, decisions, and signals. Use specific times. Avoid editorialising. The goal is a factual account that the group agreed on.
  • Root causes: The underlying conditions that made the incident possible. Not the immediate trigger, but the systemic factors behind it. Each root cause should be a clear, specific statement.
  • What worked: How the incident was detected, what the response got right, and any communication or coordination that was effective. This section often gets skipped, but it matters.
  • Actions: Specific, assignable actions with owners and due dates, not vague intentions. "Improve monitoring" is not an action. "Add alerting for latency above 500 ms on the checkout endpoint by Friday" is.

Share the report broadly. Post-mortems improve organisational learning when the findings reach beyond the people in the room. Share with your wider team, with relevant stakeholders, and with any other teams who might encounter similar conditions. Frame the sharing carefully: this is a document about how the system failed, not about who failed.

Closing the loop

The post-mortem report is not the end. It is the beginning of a follow-through process. The most common failure mode in post-mortems is not the session itself but what happens afterwards: the actions get agreed, written up, and then quietly forgotten as the team returns to regular work. Within a month, the report is buried in a shared drive and the conditions that caused the incident are unchanged.

Avoid this by treating post-mortem actions the same way you treat any other commitment. Assign them, give them deadlines, and review their progress in your regular one-to-ones and team retrospectives. If an action is blocked or has been deprioritised, that is worth discussing explicitly - not because someone needs to be held accountable, but because the risk that created the incident is still present until the action is done.

Some teams run a brief follow-up check two to four weeks after the post-mortem, specifically to review the status of actions and ask whether the changes made have had the intended effect. This does not need to be another full meeting. A fifteen-minute check-in or a quick async update is enough to confirm that the learnings are actually landing.

  • Assign and track: Every post-mortem action needs an owner, a due date, and a place to track it. Actions that live only in the report document rarely get done. Create them in your team's action tracking system immediately.
  • Review in retros: Include post-mortem actions in your regular retrospective reviews. Treat them as first-class commitments, not as optional extras. If something is consistently slipping, the team needs to decide whether to prioritise it or accept the risk.
  • Check for impact: When actions are done, verify whether they had the intended effect. Did the monitoring improvement catch something it would have missed before? Did the process change prevent the same trigger? Closing the loop means confirming the learning stuck.
  • Build a record: Keep post-mortem reports in a shared, searchable location. When a similar incident occurs later, a team that can find and read its own history learns faster than one starting from scratch every time.
