Yesterday I wrote about writing a Disaster Recovery (DR)/Business Continuity (BC) Plan, whichever you wish to call it, and the things that should be covered in it. So we have learned to “talk the talk”; now let’s “walk the walk”. That can be done in two words: TEST IT. As I stated yesterday, your plan is absolutely worthless unless you know it works, and the way you prove that it works is to test it.
First of all, let me dispel some of the myths about DR testing:
- Testing requires shutdown of production systems
- Testing is a linear process and hangs if expected results are not achieved
- Testing takes resources away from real work
- Testing reveals mistakes in planning, compromises management backing and costs you your reputation and possibly your job!
Testing can be done in parallel with production systems running. If your recovery strategy is redundant cold servers, then fire those bad boys up and test how fast you can load the data and make them available to support the business. If your strategy is replacement, then get the replacement servers in (maybe you can rent them to reduce the cost) and test whether you have all the software, including the operating system, and all the data needed to make them production ready.
I support objective-based testing. Take a subset of objectives from the DR plan and test only those objectives. Choose objectives that do not depend on one another, so that if one test fails, the remaining tests can still be conducted. Tests can be scheduled on a regular basis throughout the year, each covering a different set of objectives, so that all objectives have been tested by year end.
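To make the scheduling idea concrete, here is a minimal sketch that deals a plan's objectives round-robin into a fixed number of test sessions (quarters, say), so every objective lands in exactly one session per year. The function name `schedule_objectives` and the sample objectives are hypothetical, and the sketch assumes, as recommended above, that the objectives are independent of one another.

```python
from typing import Dict, List


def schedule_objectives(objectives: List[str], sessions: int) -> Dict[int, List[str]]:
    """Deal DR plan objectives round-robin into `sessions` test sessions.

    Assumes the objectives are independent of one another, so each session
    can run (and fail) without blocking the others.
    """
    if sessions < 1:
        raise ValueError("need at least one test session")
    plan: Dict[int, List[str]] = {s: [] for s in range(1, sessions + 1)}
    for i, objective in enumerate(objectives):
        # Session numbers start at 1; each objective is assigned exactly once.
        plan[i % sessions + 1].append(objective)
    return plan


if __name__ == "__main__":
    # Hypothetical objectives; substitute the ones from your own plan.
    objectives = [
        "Restore file server from backup",
        "Retrieve off-site backup media",
        "Bring up cold standby database server",
        "Re-route phones to alternate site",
        "Notify staff via call tree",
    ]
    for session, subset in schedule_objectives(objectives, 4).items():
        print(f"Quarter {session}: {', '.join(subset)}")
```

Round-robin is just the simplest way to spread the load; in practice you would probably group objectives by the people or equipment they need.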
Disaster recovery testing is real work. It is part of the IT department’s job to ensure it is ready to respond to any disaster, from a virus to a natural disaster that levels the building. How many times have you practiced fire drills and tornado drills at work? That is a company showing social responsibility, and DR testing is no different.
Testing should not be the end of the process, and a test that did not achieve the expected results should not be viewed as a failure. It is an opportunity to make the DR/BC Plan better. There are no failures, just opportunities for improvement. Remember that the objective of testing is to ensure business continuity, the safety of your people and the restoration of your data in the face of a disaster. A failed test shows you which part of the plan did not work, so you can improve that part to ensure success in an actual crisis.
So how do you execute an objective-based disaster recovery test? It is a process:
- Disseminate the scenario
- Declare the disaster
- Implement mock recovery activities
- Monitor the progress
- Document the work performed, its timing and the problems encountered
- Complete the test in an orderly way and document the results
- Analyze the results immediately
Disseminate the scenario. No surprise testing: tell the participants the scenario of the test. Which objectives are you testing, and how are you going to conduct the test?
Declare the disaster. Inform the whole organization that disaster recovery testing is being performed. When it is time to start the test, declare that the disaster has occurred and kick off the testing procedures.
Implement mock recovery activities. Any activities that cannot actually be performed can be mocked. Even if you are not bringing in actual replacement servers, you can still send someone to retrieve the off-site data backups. This will verify that they know where to go to retrieve them and that they have access to them. There is nothing worse than having only one or two names on the access list for your off-site storage and then finding that those one or two individuals are not available.
Monitor and document. Monitor activities, and verify that everyone knows their duties and performs them in the proper order where order matters. Do not allow auditors to interfere or interact in the process. Do not allow bystanders; hand them a clipboard and have them document activities. Document the work that is performed, its timing and the results of the work, including any issues that arose.
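If some of your recovery activities are scripted checks, the same record-keeping can be automated. The following is a minimal sketch, assuming each activity can be wrapped in a Python callable; `DrTestLog`, `LogEntry` and the sample activities are hypothetical names for illustration. It timestamps each activity, records whether it succeeded and any issue raised, and keeps going after a failure so the remaining independent objectives still get tested.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, List, Optional


@dataclass
class LogEntry:
    objective: str
    started: datetime
    finished: datetime
    succeeded: bool
    issue: Optional[str] = None

    @property
    def minutes(self) -> float:
        # Elapsed time for this activity, for the timing part of the record.
        return (self.finished - self.started).total_seconds() / 60


@dataclass
class DrTestLog:
    entries: List[LogEntry] = field(default_factory=list)

    def run(self, objective: str, activity: Callable[[], None]) -> None:
        """Time one recovery activity and record its outcome.

        A failure is written down as an issue instead of stopping the test,
        so the remaining independent objectives can still be exercised.
        """
        started = datetime.now()
        try:
            activity()
            succeeded, issue = True, None
        except Exception as exc:
            succeeded, issue = False, str(exc)
        self.entries.append(LogEntry(objective, started, datetime.now(), succeeded, issue))

    def issues(self) -> List[LogEntry]:
        # Everything that did not go as planned, for the post-test analysis.
        return [e for e in self.entries if not e.succeeded]


if __name__ == "__main__":
    log = DrTestLog()
    log.run("Verify backup catalog", lambda: None)  # hypothetical check that passes

    def fetch_offsite_media() -> None:
        # Hypothetical activity that fails, as in the access-list example above.
        raise RuntimeError("only person on the access list was unavailable")

    log.run("Retrieve off-site backups", fetch_offsite_media)
    for entry in log.issues():
        print(f"{entry.objective}: {entry.issue} ({entry.minutes:.2f} min)")
```

This does not replace the human observer with the clipboard; it only captures the parts of the test that run as scripts.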
Orderly completion of the test and immediate analysis. Call an end to the test. Gather the participants in a conference room and immediately document and analyze the results. Do not let people second-guess their steps or speculate about what they should have done. Do not give memories time to fade before the results are documented. Analyze what actually happened during the test, and document the results of the test for management.
Now go back to the Disaster Recovery Plan and see whether it needs to be revised based on the results of the test. Then begin preparing your next test. Remember to test only a subset of objectives. Test regularly and on a schedule, so that all objectives are tested annually. Testing should also occur after any significant change in the business or infrastructure, and your schedule should make allowances for re-testing.
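One way to keep track of what is due is a small check run against your test records. This is a minimal sketch under stated assumptions: `objectives_to_test` is a hypothetical helper, each objective's record is its last test date and pass/fail result (or `None` if never tested), and the caller supplies the set of objectives touched by a significant business or infrastructure change. An objective is due if it has never been tested, failed its last test, was last tested more than a year ago, or was affected by a change.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical record: (date of last test, passed?) or None if never tested.
LastResult = Optional[Tuple[date, bool]]


def objectives_to_test(
    last_results: Dict[str, LastResult],
    today: date,
    changed: Iterable[str] = (),
    max_age_days: int = 365,
) -> List[str]:
    """Return the objectives that should go into the next round of testing."""
    changed_set = set(changed)
    due = []
    for objective, result in last_results.items():
        if result is None or objective in changed_set:
            due.append(objective)  # never tested, or affected by a significant change
            continue
        tested_on, passed = result
        if not passed:
            due.append(objective)  # failed last time: schedule a re-test
        elif today - tested_on > timedelta(days=max_age_days):
            due.append(objective)  # not tested within the last year
    return sorted(due)


if __name__ == "__main__":
    records = {
        "Restore file server": (date(2024, 1, 15), True),
        "Retrieve off-site backups": (date(2024, 9, 1), False),
        "Cold standby database": None,
    }
    print(objectives_to_test(records, today=date(2025, 3, 1)))
```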
Keep in mind that an effective Disaster Recovery Plan is not just an IT solution; it is a business solution to an issue that every business faces. Only 50% of organizations even have a Disaster Recovery/Business Continuity Plan. Of those, fewer than 50% do any kind of testing of the plan, which in effect is like having no plan at all. Today, with growing organizational dependence on information technology strategy, systems and IT solutions, the potentially devastating impact of man-made and natural disasters on business processes, finances and employee well-being is greater than ever. It is estimated that up to 80 percent of companies without a well-conceived and tested business continuity plan go out of business within two years of a major disaster. Make sure this doesn’t happen to your business.