IR-3 is one sentence long. One sentence — and it generates more findings than almost any other control in the Incident Response family. The control says to test your incident response capability. Most agencies and contractors interpret that as “run a tabletop exercise.” Most of them run one that doesn’t actually satisfy the control.
The problem isn’t effort. It’s the gap between what teams think IR-3 asks for and what the assessor has open on their laptop when they review your evidence package. That assessor isn’t just reading 800-53. They’re cross-referencing SP 800-84, checking your exercise against your system’s FIPS 199 categorization, and looking for a thread that connects your test to your actual incident response plan — not a generic scenario somebody downloaded.
If you’re preparing for a FISMA assessment, pursuing FedRAMP authorization, or maintaining CMMC Level 2, here’s what IR-3 actually requires and what separates an exercise that passes from one that becomes a finding.
What IR-3 actually requires
The full control text:
“Test the effectiveness of the incident response capability for the system [Assignment: organization-defined frequency] using the following tests: [Assignment: organization-defined tests].”
Two assignment parameters. That’s where the trouble starts — because “organization-defined” doesn’t mean “whatever you feel like.”
Baseline applicability
IR-3 is not required for Low baseline systems. It kicks in at Moderate and High, which covers the vast majority of federal information systems and all FedRAMP-authorized cloud services. If your system processes PII, the Privacy baseline also pulls in IR-3 regardless of FIPS 199 categorization.
The enhancements that matter
IR-3(2) — Coordination with related plans. Required at Moderate and High baselines. Your incident response test can’t exist in isolation. It needs to coordinate with contingency plan testing (CP-4), continuity of operations, and any other plans that activate during a security event. If your agency runs CP-4 testing in March and IR-3 testing in October with no connection between them, an assessor will notice.
IR-3(3) — Continuous improvement. This enhancement requires you to use test results to determine effectiveness and to continuously improve incident response processes. It’s not enough to run the exercise and file the report. You need to show that findings led to changes — and that those changes showed up in subsequent tests.
The frequency trap
“Organization-defined frequency” means you pick the cadence and document it in your IR plan. But picking “every two years” and expecting an assessor to accept it is optimistic. Federal practice has settled on annual testing as the floor for Moderate and High systems. FedRAMP makes annual testing explicit. If your IR plan says annually and your last exercise was 14 months ago, you’ve defined your own requirement and then failed to meet it.
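The self-defined-then-missed pattern is easy to catch before an assessor does. Here is a minimal sketch of that check, assuming exercise dates are tracked somewhere queryable; the function and field names are illustrative, not from any real assessment tool:

```python
from datetime import date

# Hypothetical cadence check: compare the last exercise date against
# the frequency the organization defined in its own IR plan.
DEFINED_FREQUENCY_DAYS = 365  # "annually", as documented in the IR plan

def cadence_status(last_exercise: date, today: date) -> str:
    """Report whether the organization is meeting its own defined frequency."""
    elapsed = (today - last_exercise).days
    if elapsed <= DEFINED_FREQUENCY_DAYS:
        return "compliant"
    # Past due: the organization defined a requirement and then failed to meet it.
    overdue = elapsed - DEFINED_FREQUENCY_DAYS
    return f"overdue by {overdue} days"

# An exercise 14 months ago against an "annually" commitment:
print(cadence_status(date(2024, 1, 15), date(2025, 3, 15)))
```

The point of the sketch is that the comparison baseline comes from your own IR plan, which is exactly why an undefined frequency ("periodic") is itself a finding: there is nothing to compare against.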
How SP 800-84 fills in the gaps
IR-3 tells you to test. SP 800-84 tells you how. This is the document most organizations skip and most assessors reference.
SP 800-84 defines a progression of exercise types, from least to most complex:
| Type | Format | Complexity | IR-3 Suitability |
|---|---|---|---|
| Seminar | Lecture or presentation | Lowest | Insufficient alone |
| Workshop | Interactive knowledge-building | Low | Insufficient alone |
| Tabletop exercise | Discussion-based scenario walkthrough | Moderate | Meets IR-3 |
| Functional exercise | Operations-based, simulated response | High | Exceeds IR-3 |
| Full-scale exercise | Real-time, multi-team simulation | Highest | Exceeds IR-3 |
A tabletop exercise is the most common way to satisfy IR-3 because it’s the lowest-complexity option that actually tests decision-making. Seminars and workshops build knowledge but don’t test capability — and that’s the word in the control text. Test the effectiveness.
SP 800-84 also specifies what a valid exercise includes: defined objectives tied to the plan being tested, a realistic scenario based on the system’s threat profile, facilitated discussion that forces decisions, and documented results that feed back into planning.
What federal assessors actually evaluate
FISMA assessors and FedRAMP 3PAOs use SP 800-53A as their assessment guide. For IR-3, they’re looking for evidence that the organization tested incident response capability using defined tests at the defined frequency. In practice, that means an artifact package.
Here’s what an assessor expects to see on the table — or more accurately, in your evidence repository:
- A dated exercise report with the specific system identified. “Enterprise IR Tabletop” doesn’t satisfy IR-3 for System X if the report doesn’t reference System X’s IR plan, System X’s threat profile, or System X’s personnel.
- A participant roster showing the right people were in the room. Not just “the security team” — the people with assigned roles in the system’s incident response plan.
- A scenario connected to the system’s risk assessment. A generic ransomware walkthrough might check a box for a commercial audit. A federal assessor wants to see that the scenario reflects the system’s FIPS 199 categorization and the threats identified in its risk assessment.
- A decision log showing what happened during the exercise. What information did participants receive? What decisions did they make? Where did they get stuck? Where did the plan break down?
- Findings and a remediation plan with owners and target dates. An exercise that surfaced zero findings isn’t evidence of a mature program — it’s evidence of a weak exercise.
- Evidence of IR-3(2) coordination — that the exercise connected to contingency plan testing and other related activities.
The assessor will trace the thread: IR plan → exercise scenario → documented decisions → findings → remediation → updated plan. If that thread breaks anywhere, the exercise happened but the control isn’t satisfied.
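That trace can be treated as an ordered checklist: the assessor follows each link until one is missing. A minimal sketch of the same walk, assuming the evidence package is tracked as structured data (every key name here is hypothetical, chosen only to mirror the artifacts listed above):

```python
# Each link in the assessor's thread, in the order it is traced.
# Key names are illustrative; real evidence repositories vary.
THREAD = [
    "ir_plan_referenced",      # exercise report cites the system's IR plan
    "scenario_maps_to_risk",   # scenario tied to the system's risk assessment
    "decisions_logged",        # decision log captured during the exercise
    "findings_documented",     # gaps identified in the after-action report
    "remediation_assigned",    # owners and target dates for each finding
    "plan_updated",            # IR plan revised based on the findings
]

def trace_thread(evidence: dict[str, bool]) -> str:
    """Walk the thread in order; report the first broken link, if any."""
    for link in THREAD:
        if not evidence.get(link, False):
            return f"thread breaks at: {link}"
    return "thread intact"

# A package where the exercise ran but findings went nowhere:
package = {k: True for k in THREAD}
package["remediation_assigned"] = False
print(trace_thread(package))
```

Note that the walk stops at the first break: once the remediation link is missing, whether the plan was later "updated" is moot, which matches how the control fails in practice.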
Common findings that fail IR-3
These come up repeatedly in FISMA and FedRAMP assessments. If any of them sound familiar, your next exercise needs to fix them.
The exercise didn’t test the plan. The most common failure. A team runs a tabletop exercise where they discuss a scenario, make some decisions, and document findings — but at no point does anyone open the actual incident response plan and test whether the procedures in it work. The exercise becomes a discussion about incident response in general, not a test of this system’s specific IR capability. From the assessor’s perspective, that’s a meeting with a theme, not a satisfied control.
No connection to the system’s threat profile. An exercise scenario about a DDoS attack on a system whose primary risk is unauthorized data access doesn’t demonstrate much. The scenario should reflect the threats that drove the system’s risk-based decisions — the ones documented in the risk assessment that justified the security controls in the first place.
IR-3(2) was missed entirely. The exercise ran in isolation. Nobody coordinated with the team responsible for CP-4 testing. Nobody checked whether the incident response procedures align with the contingency plan’s activation triggers. This is a separate finding and an easy one to prevent — but it requires planning the exercise calendar, not just the exercise itself.
Findings went nowhere. The after-action report identified three gaps. Six months later, none of them have been addressed, no POA&M entries exist, and the IR plan hasn’t been updated. IR-3(3) requires continuous improvement. An exercise that finds problems you ignore is worse than no exercise — because now there’s documented evidence that you knew about the gaps.
Frequency was undefined or exceeded. The IR plan says “periodic.” The assessor asks what that means. Nobody has an answer. Or the plan says “annually” and the last exercise was 16 months ago. Define the frequency, document it, and hit it.
If you’re managing multiple compliance frameworks simultaneously, these failure patterns show up across all of them — but federal assessments tend to be more exacting about the thread from plan to exercise to remediation.
What a passing IR-3 exercise looks like
Flip every common finding. A passing exercise:
Maps the scenario to the system’s threat profile and FIPS 199 impact level. A High-impact system processing sensitive federal data should exercise a scenario involving unauthorized access or data exfiltration — not a generic phishing awareness walkthrough.
Explicitly references and tests the system’s IR plan. Participants should have the plan open. The exercise should force decisions that test specific procedures: escalation chains, communication protocols, evidence preservation steps, reporting timelines. If the plan says “notify the ISSO within 30 minutes,” the exercise should test whether the team can actually do that.
Includes the people with assigned IR plan roles. Not just the security team lead and whoever was available that afternoon. The ISSO, the system owner’s delegate, the forensics contact, the legal liaison — whoever the plan says is involved in response.
Produces an after-action report that closes the loop. Findings tied to specific plan deficiencies. Remediation actions with owners and dates. POA&M entries for anything that can’t be fixed immediately. And evidence that previous exercise findings were addressed — that’s the IR-3(3) continuous improvement signal.
Coordinates with related plan testing. Run the IR-3 exercise within the same quarter as CP-4 testing. Reference common activation triggers. Make sure the incident response plan and contingency plan don’t contradict each other on escalation procedures — because the assessor will check.
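The after-action report described above has a natural data shape: findings tied to plan deficiencies, each with an owner and a target date, and anything not fixed immediately spilling into the POA&M. A minimal sketch under those assumptions (the record type and field names are hypothetical, for illustration only):

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical after-action finding record; field names are illustrative.
@dataclass
class Finding:
    plan_deficiency: str          # tied to a specific IR plan procedure
    owner: str                    # named remediation owner
    target_date: date             # committed remediation date
    fixed_immediately: bool = False

def poam_entries(findings: list[Finding]) -> list[str]:
    """Anything not remediated immediately becomes a POA&M entry."""
    return [f.plan_deficiency for f in findings if not f.fixed_immediately]

report = [
    Finding("Escalation chain skipped the ISSO", "J. Rivera", date(2025, 9, 1)),
    Finding("IR plan contact list out of date", "S. Chen", date(2025, 7, 1),
            fixed_immediately=True),
]
print(poam_entries(report))
```

The design point is that a finding without an owner and a date cannot even be constructed here, which is the same discipline the IR-3(3) continuous-improvement signal asks for on paper.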
The documentation that results from this kind of exercise is also the documentation that satisfies SOC 2 CC7.3-7.5 and ISO 27001 A.5.24, if your organization is maintaining those frameworks alongside NIST 800-53. The specifics differ, but the core evidence — dated exercise, relevant scenario, documented decisions, remediation tracking — overlaps significantly.
Frequently asked questions
How often do you need to run tabletop exercises for NIST 800-53 IR-3?
IR-3 says “organization-defined frequency,” but federal assessors typically expect at least annual testing for Moderate and High baselines. FedRAMP requires annual testing explicitly. Define your frequency in the IR plan and stick to it — an undefined frequency is itself a finding.
Is IR-3 required for NIST 800-53 Low baseline systems?
No. IR-3 is not included in the Low baseline. It’s required for Moderate and High baselines, and for systems in the Privacy baseline. If your system is categorized as Low but processes PII, check whether the privacy overlay adds IR-3 to your control set.
What documentation does a federal assessor need for IR-3?
At minimum: a dated exercise report identifying the specific system, participant names and roles mapped to the IR plan, the scenario used and its connection to the system’s threat profile, decisions made during the exercise, gaps identified, and a remediation plan with owners and target dates. The assessor will also verify that the exercise tested your actual IR plan — not a generic scenario unrelated to the system.
Does FedRAMP have additional IR-3 requirements beyond NIST 800-53?
FedRAMP builds on the 800-53 Moderate baseline and adds specific expectations. Annual testing is explicitly required, exercises should reflect the cloud service model, and 3PAOs will verify that exercise findings feed into the POA&M process. FedRAMP also expects coordination with the CSP’s broader incident response program, not just the individual system boundary.
The thread that matters
Federal assessors aren’t checking whether you ran a tabletop exercise. They’re checking whether your incident response capability actually works — and whether you have the evidence to prove it. The exercise is how you generate that evidence. The documentation is how you present it. The remediation follow-through is how you demonstrate that testing leads to improvement.
Breachdeck runs exercises that produce the full evidence chain federal assessors look for — system-specific scenarios mapped to MITRE ATT&CK techniques, timestamped decision logs, competency scoring, and a PDF report that traces the thread from scenario to findings to remediation. No facilitator, no formatting overhead, no chasing people for their notes afterward. Run the demo — pick a scenario that matches your system’s threat profile and see the report it generates.