- Walkthroughs - During a walkthrough, key stakeholders in the plan meet to review its layout and contents. These aren't really "tests": they won't validate your technology or your recovery capabilities. They are, however, good exercises for familiarizing stakeholders with their roles and responsibilities in the plan.
- Tabletop exercises - These rehearse a specific threat scenario. They're similar to plan walkthroughs, but they pose a trigger event, such as a pandemic, flood, or hazardous material accident, so participants can discuss the response and recovery activities the plan assigns to them.
- Simulation - During a simulation, the DR manager invokes the plan in a controlled situation that does not impact business operations. A common approach uses data replicas at the recovery site. IT professionals briefly suspend data replication between the production and recovery sites, create a replica of production data using storage- or server-based snapshot/cloning technology, and then resume replication. The replicas are mounted on redundant servers at the recovery site, and applications and IT systems are recovered and restarted from them. Business and application users then perform functional tests on these alternate systems.
- Full test - During a full test, IT professionals perform an actual failover of IT systems and end-user processing to the recovery site. This truly tests the DR plan but is risky because it will impact production if the cutover fails. Plus, you have to successfully fail back once the test is complete. IT professionals will find that business owners are wary of scheduling and performing these types of tests, despite their inherent value.
- Test regularly - more is better! To achieve this, however, IT requires a solution that is non-disruptive and transparent. You don't want to take your primary applications offline if you can avoid it, especially if those applications are business critical.
- Test using different personnel. Make sure all of your people are familiar with the plan and know their roles if a problem occurs. It's also important to see whether you can implement tools that support all of the platforms and applications you are running. That way, training and knowledge are less of an issue in a crisis, because people will know what to do when it counts.
- Test after significant changes to the infrastructure. Even the most thorough IT organizations are bound to miss something when dealing with the complex architectures found in large enterprise data centers. Leave nothing to chance, and use automation where you can.
- If your test fails, re-test to make sure you can meet your objectives. If you have the right tools in place, re-testing should be less painful and give you peace of mind knowing your organization is prepared if a real incident occurs.
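
The simulation sequence described in the list of test types (suspend replication, take a snapshot, resume replication, recover on the replica, run functional tests) lends itself to automation. The following is a minimal sketch of that ordering only, not a real storage or replication API. Every name in it (`run_simulation` and the callables passed in) is hypothetical and stands in for whatever tooling your environment actually provides. The one design choice worth noting is that replication is resumed in a `finally` block, so production stays protected even if the snapshot step fails.

```python
from typing import Callable, List


def run_simulation(
    suspend_replication: Callable[[], None],
    snapshot_replica: Callable[[], str],
    resume_replication: Callable[[], None],
    mount_and_restart: Callable[[str], None],
    functional_tests: List[Callable[[], bool]],
) -> bool:
    """Run a DR simulation against a point-in-time copy.

    All arguments are hypothetical hooks supplied by the caller.
    Returns True only if every functional test passes.
    """
    suspend_replication()
    try:
        # Capture a point-in-time copy while replication is paused.
        snapshot_id = snapshot_replica()
    finally:
        # Always resume replication, even if the snapshot step raised.
        resume_replication()

    # Recover applications at the recovery site from the copy,
    # leaving production untouched.
    mount_and_restart(snapshot_id)

    # Run every test (no short-circuit) so all failures are exercised.
    results = [test() for test in functional_tests]
    return all(results)


if __name__ == "__main__":
    # In-memory stand-ins that only record the order of operations.
    steps: List[str] = []

    def fake_snapshot() -> str:
        steps.append("snapshot")
        return "snap-001"  # placeholder identifier

    passed = run_simulation(
        suspend_replication=lambda: steps.append("suspend"),
        snapshot_replica=fake_snapshot,
        resume_replication=lambda: steps.append("resume"),
        mount_and_restart=lambda snap: steps.append(f"mount {snap}"),
        functional_tests=[lambda: True],
    )
    print(steps, passed)
```

If the function returns `False`, that corresponds to the failed-test case above: fix the cause and run the simulation again until the objectives are met.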