By Matthew Hilsenrad, Director of Disaster Recovery at Abacus Group
During 2017, the Disaster Recovery (DR) team at Abacus Group introduced a new approach to disaster recovery testing. Previously, we had been doing DR testing individually for each of our clients. Our new method, referred to as “Grouped Testing,” fails over multiple clients simultaneously from their primary data center to their secondary site. This testing thus resembles a true outage impacting an entire data center. It’s also a better test of load and capacity in our secondary sites.
We started rolling out our new Grouped DR Testing process during 2017 by initially performing five exercises with a sample group. Each was successful and improved upon the previous one. Starting in Q1 2018, we rolled out this initiative across all regions, with clients segmented by their primary data center location. Clients were, and will continue to be, offered multiple options for testing dates.
As part of this project, we’ve engaged third-party vendors to advocate for speed and efficiency on behalf of our clients. With each test, we’re looking to increase scripting and further refine the process. While we always want to control the trigger for a DR failover, we’re looking for, and have identified, many steps ripe for automation. With this in mind, we’re able to speed up our operation and fail over multiple clients in the same timeframe that our competitors would use for a single client.
During a test, we simulate a catastrophic event in which the hosted servers located in a client’s primary data center are unavailable. We then promptly activate server replicas in the client’s secondary data center. Prior to handoff to a client or any third parties, we check for server connectivity and functionality. While Abacus is heavily involved in the process, user verification is typically limited to 30 minutes or less. Once we’ve confirmed all systems are operational, we can discard the secondary copies and quickly revert to our client’s production environment.
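For readers curious what a pre-handoff connectivity check might look like, here is a minimal sketch in Python. It is an illustration only and not Abacus’s actual tooling: the function names, the example host names, and the choice of a plain TCP connection test are all assumptions made for this example.

```python
import socket
from typing import Callable, Iterable, List, Tuple

# Hypothetical representation of a replica endpoint: (host, port).
Endpoint = Tuple[str, int]


def tcp_reachable(endpoint: Endpoint, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def ready_for_handoff(
    replicas: Iterable[Endpoint],
    probe: Callable[[Endpoint], bool] = tcp_reachable,
) -> Tuple[bool, List[Endpoint]]:
    """Probe every activated replica and report which ones failed.

    Returns (True, []) only when every replica responds, which is the
    condition assumed here for handing the environment to the client.
    """
    failed = [endpoint for endpoint in replicas if not probe(endpoint)]
    return (not failed, failed)


if __name__ == "__main__":
    # Example host names are placeholders, not real systems.
    replicas = [("dr-file01.example.com", 445), ("dr-app01.example.com", 3389)]
    ok, failed = ready_for_handoff(replicas)
    print("Ready for handoff" if ok else f"Unreachable replicas: {failed}")
```

A check like this only confirms that a port is accepting connections. Verifying functionality, as described above, would still require application-level checks and user testing.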
Client interaction is a key part of the testing experience – Abacus always provides an opportunity for user testing. While we test on our clients’ behalf, we encourage their participation as part of being prepared. This is an opportunity for our clients to strengthen their Business Continuity Plan (BCP) by planning ahead and informing their users. If users are familiar with what the DR environment looks and feels like, everyone will come out ahead. We send out a series of advance notifications to our clients regarding an upcoming planned DR test so they can forward them to key stakeholders.
Our highly available email environment is another key ingredient for emergency preparedness. All email data is redundantly written to multiple servers in both our primary and secondary data centers. In the event of one or more nodes failing, a user is automatically repointed to another, live copy of their mailbox. Each year we perform at least one manual activation from a client’s primary to secondary data center. This is nearly seamless to end users, and our clients may not have even noticed the last time we performed this type of test.
To serve all of our clients from a compliance standpoint, we generate a DR Test Report. The report will be available to clients in our Client Portal two weeks after each test. Additionally, past and future DR test dates can be found in the Portal Dashboard. Finally, we ensure that our dedicated DR team is available to answer client questions and provide guidance about services covered and recommended testing procedures.
To learn more about other new initiatives our Abacus Engineering team is working on, check out this post with a sneak peek of our R&D pipeline.