10 tips for minimising or eliminating IT system downtime. 

While briefing Wolfe Systems in 2010, a client noted that network or IT downtime could cost them as much as $100,000 per hour in lost production. The client also made it clear that they expected Wolfe Systems to minimise, if not eliminate, such downtime. Fortunately, we have been able to do just that. This article outlines ten things all businesses can do to limit, if not eliminate, downtime.

To see just how much a business can lose to IT downtime, consider these numbers:

  • In March 2015, a 12-hour Apple store outage cost the company US$25 million.   
  • In August 2016, a five-hour IT power outage cost Delta Air Lines $150 million.
  • In March 2019, a 14-hour outage cost Facebook an estimated $90 million.   

Some might question the relevance of quoting businesses as large as Apple, Facebook, and Delta. The losses may be proportionately smaller for small to medium enterprises, but they are still significant: recent research found that data systems downtime costs the average small business 545 hours per annum. Research house Gartner found that the average cost of network downtime for businesses is $5,600 per minute. Guardian has developed a formula for calculating the cost of downtime, as follows:

  • Cost of downtime = lost revenue + lost productivity + recovery costs + intangible costs.  

Each component is calculated as follows:

  • Lost revenue = (gross annual revenue / total annual business hours) x percentage of revenue affected by downtime x hours of downtime.
  • Lost productivity = number of employees affected by downtime x percentage by which they are affected x average cost of an employee (per hour) x hours of downtime.
  • Recovery costs include employee overtime, repair services, replacement parts, data recovery costs, and supply chain delays and fees.
  • Intangible costs can include missed deadlines and project delays, loss of customers, loss of reputation, loss of future sales, a decline in share price and public relations recovery costs.
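As a rough illustration, the formula above can be sketched in a few lines of Python. The class name, field names and example figures are hypothetical and chosen only for this sketch. It assumes the average cost of an employee is expressed per hour and that percentages are given as fractions between 0 and 1. Recovery and intangible costs are simply entered as estimates, since the formula does not prescribe how to calculate them.

```python
from dataclasses import dataclass


@dataclass
class DowntimeInputs:
    """Hypothetical container for the figures the formula needs."""
    gross_annual_revenue: float
    annual_business_hours: float
    revenue_share_affected: float       # fraction of revenue affected, 0.0 to 1.0
    employees_affected: int
    productivity_share_affected: float  # fraction of each employee's work affected, 0.0 to 1.0
    hourly_employee_cost: float         # assumption: average cost per employee per hour
    hours_of_downtime: float
    recovery_costs: float = 0.0         # estimate: overtime, repairs, parts, data recovery
    intangible_costs: float = 0.0       # estimate: reputation, lost customers, and so on


def lost_revenue(d: DowntimeInputs) -> float:
    # Revenue per business hour x share affected x hours of downtime.
    revenue_per_hour = d.gross_annual_revenue / d.annual_business_hours
    return revenue_per_hour * d.revenue_share_affected * d.hours_of_downtime


def lost_productivity(d: DowntimeInputs) -> float:
    # Employees affected x share affected x hourly cost x hours of downtime.
    return (d.employees_affected * d.productivity_share_affected
            * d.hourly_employee_cost * d.hours_of_downtime)


def cost_of_downtime(d: DowntimeInputs) -> float:
    # Sum of the four components in the formula.
    return (lost_revenue(d) + lost_productivity(d)
            + d.recovery_costs + d.intangible_costs)


# Made-up example: a four-hour outage at a small business.
example = DowntimeInputs(
    gross_annual_revenue=2_000_000,
    annual_business_hours=2_000,
    revenue_share_affected=0.5,
    employees_affected=20,
    productivity_share_affected=0.75,
    hourly_employee_cost=50,
    hours_of_downtime=4,
    recovery_costs=1_500,
)
print(f"Estimated cost of downtime: ${cost_of_downtime(example):,.2f}")
```

With these made-up figures, lost revenue comes to $2,000, lost productivity to $3,000 and recovery costs to $1,500. Intangible costs are left at zero here, which is exactly why real-world totals tend to be underestimated.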

Two under-reported and difficult-to-calculate impacts of downtime are those associated with:

  • Staff
  • Data  

There is often a cost associated with diminished staff morale and confidence. Downtime does not just impact customer confidence and market reputation. Downtime also impacts the confidence that staff have in the capacity of the business to deliver and, as a result, the confidence that sales staff, in particular, have to make promises or give delivery undertakings to clients. This, in turn, will impact the capacity to secure sales and attract the best staff.  

The costs associated with the loss of data, the recovery of data and the re-sourcing of data after a period of downtime can also be substantial. Lost data can lead to a loss of reputation and a loss of sales. Recovering data can attract significant costs, especially where high-quality storage and recovery systems are not in place. The costs of re-sourcing data, where that is possible at all, can also be very high. Where appropriate systems are not in place, data can be lost forever, and that can be very expensive indeed.

The point here is that downtime is all too common, even in 2021, and the costs associated with downtime for businesses of all sizes in all industries are substantial. What is more, those costs are almost always higher than many businesses recognise, given the range of impacts on the business.

This table highlights common causes of downtime and strategies for diminishing the impact of these causes.  

| Cause of downtime | Strategies for avoiding downtime |
| --- | --- |
| Human error | Consistent, systematic, and thorough documentation and training. A training programme is essential. |
| Device configuration error | Automatic update/change implementation. Test all changes in a lab environment before putting them into effect in your system. |
| Security breaches | 24/7 surveillance and monitoring of your systems, and the implementation of a comprehensive security strategy. |
| Old equipment | Regular inventories and timely upgrades, ensuring all hardware is compatible with the cybersecurity software and operating systems. |
| Software and hardware failures | Regular, ideally monthly, maintenance and upgrading of all systems to keep them operating optimally. |
| Power failures | Ensure that each circuit can supply enough power by itself, and provide back-up power. |
| Natural disasters | Invest in offsite fault-tolerant systems and migrate to the cloud. |
| Server O/S bugs | Update the operating system frequently. |
| Hardware/software incompatibility | Ensure the software and hardware are kept up to date, and that all equipment is well matched and tested. |
| Server hardware instability | Monitor server stability 24/7, focusing on bugs, power supply glitches, faulty RAM, damage to hard disk platters and related issues. |
| Insufficient cooling | Ensure air conditioning is fully functional and maintaining a cool server environment. |
| Internet outages | Ensure that your internet access is secure and operating at the right speed. The speed of most systems can be increased. |

These are just some of the more common causes of, and solutions to, network or IT system downtime. These are among the many issues your ‘Managed IT’ consultant should be addressing. Perhaps the most cost-effective strategy for eliminating downtime is to engage a ‘Managed IT’ consultant with the skills and resources to work with the business to address all of these issues on an ongoing basis and to report monthly on, among other things:

  • Network testing and monitoring.  
  • Network security & risk mitigation.  
  • Connectivity and bandwidth.  
  • Data storage capabilities.  
  • Installations and upgrades.  
  • Implementing software patches.  
  • Providing web hosting.  
  • Network provisioning or virtualisation.
  • Performance monitoring and reporting.  
  • Providing help desk technical support.   

Anecdotal evidence suggests that the most effective strategies for minimising downtime for most businesses involve:

  • Engaging a ‘Managed IT’ consultant with the skills, resources and attitude required to take responsibility for business continuity.   
  • Implementing a ‘Service Level Agreement’ that involves a reliability guarantee and monthly ‘Efficiency Ratings’.

For more information on ‘Service Level Agreements’ and ‘Efficiency Ratings’, please email me at [email protected].

Whether you engage a ‘Managed IT’ consultant or not, here are ten tips for minimising or even eliminating network or IT system-related downtime in your business. To minimise downtime:

  1. Monitor hardware and software, configurations, interactivity, power, and temperature. 
  2. Document your backup plan – incorporating multiple pathways between network locations.  
  3. Always employ the latest version of all hardware and software. 
  4. Document all processes and implement a continuous staff training programme.  
  5. Document and implement a comprehensive password and security policy. 
  6. Monitor the contribution of all hardware and software on an ongoing basis. 
  7. Test all the changes in a development environment before putting them into effect in your system. 
  8. Adopt a programme for updates – and ensure that updates are implemented. 
  9. Invest in offsite fault-tolerant systems by migrating to the cloud. 
  10. Ensure a reliable fibre optic network and an internet setup that can maximise speeds.
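
To make tip 1 a little more concrete, here is a minimal sketch of a threshold check that a monitoring routine might run. Everything in it is an assumption made for illustration: the metric names, the limits and the host names are hypothetical, and a real deployment would normally take its readings from a dedicated monitoring tool rather than a hand-written list.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One measurement taken from a device (hypothetical structure)."""
    host: str
    metric: str
    value: float


# Hypothetical limits; appropriate values depend on your own equipment.
LIMITS = {
    "temperature_c": 27.0,
    "cpu_percent": 90.0,
    "disk_percent": 85.0,
}


def find_alerts(readings, limits=LIMITS):
    """Return a message for every reading that exceeds its limit."""
    alerts = []
    for r in readings:
        limit = limits.get(r.metric)
        # Metrics without a configured limit are ignored.
        if limit is not None and r.value > limit:
            alerts.append(f"{r.host}: {r.metric} = {r.value} exceeds limit {limit}")
    return alerts


# Made-up readings for two servers.
sample = [
    Reading("server-01", "temperature_c", 31.5),
    Reading("server-01", "cpu_percent", 42.0),
    Reading("server-02", "disk_percent", 80.0),
]
for message in find_alerts(sample):
    print(message)
```

In this sample only the server room temperature is over its limit, so a single alert is raised. The same pattern extends to power and configuration checks by adding more metrics and limits.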

In summary, network and IT downtime is all too common and potentially expensive. There are, however, strategies to minimise downtime and maximise productivity. For most businesses, this should involve the engagement of a ‘Managed IT’ consultant and a ‘Service Level Agreement’ that ensures all needs are fully met, on time and to budget. In the absence of such a consultant, the strategies highlighted here will, if fully implemented, minimise downtime.

To learn more about minimising downtime, click here.

Alternatively, give me a call. I am always happy to chat.   

Ben Latter 

0488 445 044 

[email protected] 

NEXT BLOG – ‘Achieving accelerated disaster recovery.’