Incident Response Playbook

Step-by-step playbook for handling IT incidents in small teams.

Purpose

This incident response playbook provides a structured approach to detecting, responding to, and recovering from IT incidents. Designed for small teams where every person wears multiple hats, it focuses on clear roles and repeatable steps rather than complex frameworks.

Incident Severity Levels

  • SEV-1 (Critical): Complete system outage affecting all users, a confirmed data breach, or revenue impact above ₹1 lakh (₹100,000) per hour. Response within 15 minutes.
  • SEV-2 (High): Major system impairment affecting a department, a suspected security incident, or revenue impact above ₹10,000 per hour. Response within 30 minutes.
  • SEV-3 (Medium): Partial system impairment, a critical issue affecting a single user, or a non-sensitive security alert. Response within 2 hours.
  • SEV-4 (Low): Minor issue, cosmetic problem, or non-critical request. Response by the next business day.
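
The levels above can be written down as a small classification function so that everyone on the team assigns severity the same way. The following is a minimal Python sketch, not a definitive implementation: the `IncidentFacts` fields and the `classify` function are hypothetical names chosen for illustration, and the sketch assumes that meeting any single criterion is enough to reach a level.

```python
from dataclasses import dataclass


@dataclass
class IncidentFacts:
    # Hypothetical fields for illustration; adapt them to your own ticket form.
    all_users_affected: bool = False
    breach_confirmed: bool = False
    department_affected: bool = False
    security_suspected: bool = False
    partial_impairment: bool = False
    single_user_critical: bool = False
    non_sensitive_alert: bool = False
    revenue_loss_per_hr: int = 0  # rupees per hour


def classify(f: IncidentFacts) -> str:
    """Return the highest severity whose criteria match.

    Assumption: any one matching criterion is sufficient for a level.
    """
    if f.all_users_affected or f.breach_confirmed or f.revenue_loss_per_hr > 100_000:
        return "SEV-1"
    if f.department_affected or f.security_suspected or f.revenue_loss_per_hr > 10_000:
        return "SEV-2"
    if f.partial_impairment or f.single_user_critical or f.non_sensitive_alert:
        return "SEV-3"
    return "SEV-4"
```

For example, `classify(IncidentFacts(revenue_loss_per_hr=50_000))` falls under SEV-2 because the loss exceeds ₹10,000 per hour but not ₹1 lakh per hour.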

Response Flow

1. Detection & Reporting

Any employee can report an incident via the helpdesk, by phone, or in person. Automated monitoring alerts (uptime checks, error-rate spikes) also create tickets. The first responder acknowledges the incident within the response time set for its severity level.
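
One way to make the acknowledgement target checkable is to compute a deadline from the time the incident was reported. This is a rough Python sketch under stated assumptions: `ACK_WINDOW`, `ack_deadline`, and `is_breached` are hypothetical names, and SEV-4 returns no fixed deadline because "next business day" depends on the team's working calendar.

```python
from datetime import datetime, timedelta

# Response windows taken from the severity levels in this playbook.
ACK_WINDOW = {
    "SEV-1": timedelta(minutes=15),
    "SEV-2": timedelta(minutes=30),
    "SEV-3": timedelta(hours=2),
}


def ack_deadline(severity: str, reported_at: datetime):
    """Return the latest acceptable acknowledgement time.

    Returns None for SEV-4 (or unknown levels), since the next business
    day has to be resolved against a working calendar not modelled here.
    """
    window = ACK_WINDOW.get(severity)
    if window is None:
        return None
    return reported_at + window


def is_breached(severity: str, reported_at: datetime, now: datetime) -> bool:
    """True if the acknowledgement window has already passed."""
    deadline = ack_deadline(severity, reported_at)
    return deadline is not None and now > deadline
```

A SEV-1 incident reported at 09:00 would need acknowledgement by 09:15 on the same day.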

2. Triage & Classification

The responding agent classifies the incident by severity and category. For SEV-1 and SEV-2, the incident commander (the most senior IT person available) is notified immediately, and a communication channel (a Slack or WhatsApp group) is opened for real-time updates.
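
The escalation rule can be sketched in a few lines of Python. The `Staff` record, its numeric `seniority` ranking, and both function names are hypothetical; the only rules taken from the playbook are that SEV-1 and SEV-2 need an incident commander and that the commander is the most senior IT person available.

```python
from dataclasses import dataclass


@dataclass
class Staff:
    name: str
    seniority: int  # hypothetical ranking: higher means more senior
    available: bool


def needs_commander(severity: str) -> bool:
    """SEV-1 and SEV-2 incidents require an incident commander."""
    return severity in {"SEV-1", "SEV-2"}


def pick_incident_commander(roster):
    """Return the most senior available person, or None if nobody is free."""
    candidates = [s for s in roster if s.available]
    return max(candidates, key=lambda s: s.seniority, default=None)
```

Returning `None` when nobody is available makes the gap explicit, so the team can decide in advance who covers in that case.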

3. Containment

Take immediate action to limit damage: isolate affected systems, revoke compromised credentials, block malicious IPs, or switch to backup systems. Start documenting every containment step at this stage.
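
Containment steps are easier to review later if each one is recorded with a timestamp, an actor, and a target as it happens. Below is a minimal Python sketch of such a log; `ContainmentLog`, its fields, and the example values are assumptions for illustration rather than part of any particular helpdesk tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ContainmentLog:
    incident_id: str
    entries: list = field(default_factory=list)

    def record(self, actor: str, action: str, target: str, when=None) -> dict:
        """Append one containment step; defaults to the current UTC time."""
        timestamp = when or datetime.now(timezone.utc)
        entry = {
            "time": timestamp.isoformat(),
            "actor": actor,
            "action": action,
            "target": target,
        }
        self.entries.append(entry)
        return entry


# Usage with made-up values:
log = ContainmentLog("INC-001")
log.record("asha", "revoke_credentials", "svc-backup")
log.record("asha", "block_ip", "203.0.113.7")
```

Storing timestamps in UTC avoids confusion when the notes are later compared with server logs.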

4. Investigation & Root Cause Analysis

Gather logs, system snapshots, and user statements. Identify the root cause: configuration error, software bug, hardware failure, or external attack. For SEV-1 incidents, a formal post-mortem is mandatory within 48 hours.

5. Recovery & Restoration

Apply the fix, restore from backup, or deploy a workaround. Verify system functionality with automated tests and manual checks. Communicate resolution to affected users and stakeholders.

6. Post-Incident Review

Document what went well, what went wrong, and what will change. Update the playbook with new findings. Schedule follow-up actions (patch deployment, monitoring improvements, training) with owners and deadlines.
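
Follow-up actions tend to slip unless each one has a named owner and a due date that can be checked. The sketch below shows one possible shape in Python; `FollowUp` and `overdue` are hypothetical names, and the sample tasks are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FollowUp:
    task: str
    owner: str
    due: date
    done: bool = False


def overdue(actions, today: date):
    """Return open actions whose due date is before today."""
    return [a for a in actions if not a.done and a.due < today]


# Usage with made-up values:
actions = [
    FollowUp("Deploy patch to file server", "ravi", date(2024, 1, 10)),
    FollowUp("Add disk-space alert", "asha", date(2024, 1, 20), done=True),
    FollowUp("Run phishing refresher training", "meera", date(2024, 2, 1)),
]
late = overdue(actions, today=date(2024, 1, 25))
```

Reviewing the overdue list at a regular team meeting keeps post-incident commitments visible.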
