Friday, May 27, 2011

What is the ROI on a DR plan?

We are all lucky to be alive. According to the US-based preacher Harold Camping, the world was meant to have ended on May 21st 2011. Mercifully, the divine disaster recovery plan must have kicked in. But could we be so lucky in IT? We all know that IT systems lie at the epicentre of our businesses. They are the conduit through which our companies conduct their operations, and their availability is integral to the effectiveness of the workforce. Yet, despite this dependence, what is the true financial impact if they are unavailable? In other words, how can you cost-justify an investment in DR and BCP functionality to ensure that the effects of any malfunction in the IT environment are minimised?

This is the question that the American-based research company The Aberdeen Group attempted to answer in a study it conducted last year of over 100 organisations that had a formal Disaster Recovery (DR) program. Aberdeen’s conclusion was that the cost of any business interruption event correlated directly with the effectiveness of the DR/BCP environment. Best-in-class organisations had, on average, fewer disruptions, which were cheaper to rectify. Laggards, on the other hand, had more disruptions, which cost their businesses on average nearly $3 million a year.

Aberdeen ranked organisations by the number of disruptions they recorded each year, the time it took to recover from these disruptions and how close the IT department was to meeting its SLAs with the business over data availability. Best-in-class businesses recorded fewer than one business interruption in the last 12 months, from which they required less than an hour to recover. In so doing, they were able to meet 95% of their organisation’s data availability SLAs over the previous 12 months. Laggards, on the other hand, didn’t know or measure how well they met their business data availability measures. Perhaps this was just as well, as they averaged more than two business disruptions over this period, which took five hours to recover from.

Aberdeen attributed the success of the best-in-class businesses to a number of attributes. Firstly, there was an appreciation of the impact of any business interruption. Next, these organisations had a backup and recovery strategy for each critical business element and were able to replicate their infrastructure in remote locations. In addition, DR was the responsibility of a cross-functional team that worked from a documented DR plan. This team reported to an executive champion who was incentivised to reduce downtime. Furthermore, the DR plan was updated regularly to reflect any changes in the corporate environment. Aberdeen also identified some of the tactics and tools that best-in-class DR businesses used. Some that are noteworthy include: the use of virtualisation; the engagement of an external consultant to provide a broader perspective on DR needs; the ability to measure the cost of any downtime; and, finally, regular testing of a number of DR scenarios.

In contrasting the three groups of businesses (i.e. best-in-class, industry average and laggards), Aberdeen identified four of the biggest differences between them. The most noticeable of these was having a senior manager accountable for DR performance. Almost every one of the best-in-class organisations had such an executive assigned, compared with only 27% of the laggards. It was also interesting to note the difference between the best performers and the rest in terms of establishing cross-functional teams with responsibility for DR. The industry average group actually trailed the laggards in this regard, although even among the laggards well under half of respondents had such a team. In contrast, 78% of the best-in-class businesses had set one up. The other areas where the best performers distinguished themselves from their rivals were staff training on DR policies and the regular testing of different DR scenarios.

For those aspiring to join the best performers in the area of DR and BCP, Aberdeen makes a number of recommendations in its report. In particular, it believes it is essential to measure the financial impact of any downtime in your organisation. In many ways this reminds me of the advice an earlier Coalface speaker gave at a session last August, when he stressed the importance of never wasting a crisis. These crises give you the ammunition to outline the potential ROI in a business case for better DR.
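As a rough illustration of what measuring that financial impact might look like, here is a minimal Python sketch. It is only a back-of-the-envelope model, not Aberdeen's methodology: the hourly cost of downtime and the annual DR spend are invented figures you would replace with your own, and the disruption counts and recovery times are simply the boundary values quoted above for laggards (two events, five hours) and best-in-class organisations (one event, one hour).

```python
def annual_downtime_cost(events_per_year, hours_per_event, cost_per_hour):
    """Yearly downtime cost: number of events x hours per event x cost per hour."""
    return events_per_year * hours_per_event * cost_per_hour


def dr_roi(cost_before, cost_after, annual_dr_spend):
    """Simple ROI: (downtime cost avoided - DR spend) / DR spend."""
    avoided = cost_before - cost_after
    return (avoided - annual_dr_spend) / annual_dr_spend


# Hypothetical inputs: substitute figures measured in your own organisation.
COST_PER_HOUR = 100_000      # assumed cost of one hour of downtime
ANNUAL_DR_SPEND = 300_000    # assumed yearly investment in DR

# Disruption profiles taken from the bands quoted in the study.
laggard = annual_downtime_cost(2, 5, COST_PER_HOUR)   # two events, five hours each
best = annual_downtime_cost(1, 1, COST_PER_HOUR)      # one event, one hour

roi = dr_roi(laggard, best, ANNUAL_DR_SPEND)
print(f"Downtime cost avoided: ${laggard - best:,}")  # $900,000
print(f"ROI on DR spend: {roi:.0%}")                  # 200%
```

With these made-up inputs, moving from the laggard profile to the best-in-class profile avoids $900,000 of downtime a year, which on a $300,000 DR budget is a 200% return. The real value of the exercise is less the number itself than being forced to put a defensible figure on an hour of downtime.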

Aberdeen’s other suggestions are to formalise a plan that emanates from the executive down and is regularly updated as the business changes. It also stresses the need to test this plan regularly, to avoid discovering unexpected omissions in a real-life crisis, and to invest in duplicate equipment and image-based backup solutions to speed up recovery.

In conclusion, one of the lessons I take away from this study is how the IT industry has grown in its capability in the area of Disaster Recovery & BCP. For a long time it seemed that these were just terms for backup and recovery. Then in the 1990s came the concepts of mirrored disks, replication and data snapshots. However, while this satisfied the need for backup, the challenge then became how quickly you could recover from these backups to become operational again. This need led to the development of clustered data centres running in active/passive and, more progressively, active/active mode, so that applications ran simultaneously in different locations. It therefore became easy to fall back to one of these environments if there was a problem with the other.

The goal for many businesses, though, remains one of ensuring continuous availability. In such an environment there is no downtime, and no user is inconvenienced or unable to do their job. Recent high-profile examples in Australia highlight that the IT industry is not there yet. However, we should not forget how far we have come in the DR/BCP journey in a comparatively short period of time. With this level of progress, continuous availability is a goal that we are likely to see become attainable within the next decade.
