Posts Tagged ‘sla’

Protect against an outage, make a plan

Monday, March 22nd, 2010

 

Something we always like to discuss with new customers is the necessity of backup or, more importantly, a plan should the worst happen. Important as an SLA (Service Level Agreement) is as a statement of the intention to repair, it doesn’t guarantee against downtime. Our industry has always focused heavily on SLAs to give customers a warm, fuzzy feeling without fully explaining what an SLA is or how you are protected when you suffer an outage.

 

An SLA is just an intention to fix a service within an agreed timeframe and to compensate should the repair fall outside it. So basically it is a promise, and while it is good to have one, it certainly won’t compensate your business sufficiently should you suffer a fault that isn’t repaired within the given timeframe. The problem is that there are so many possible faults in delivering service to a site that sometimes it is just not technically feasible to repair within the agreed time.

 

One of our carriers recently suffered a particularly nasty fault in the Luton area. While installing some new street lighting, a contractor went straight through a BT Openreach duct with a band saw, cutting all the fibre. This meant that the backhaul for five exchanges, including Luton itself, was offline, affecting thousands of customers. The fault was so severe that two carriers were affected by the outage.

 

Unfortunately the repair meant that some freshly laid tarmac needed to be dug up, and the police had to give special permission to close the road. This being Britain, I am sure some health & safety officials were involved as well. The result was an extended delay in fixing the fibre, and customers found themselves without service for over 72 hours.

 

However, all the customers who had taken our automatic failover (Advance) or bonded (PureFluid) products didn’t suffer any outage and continued to operate as normal. This is because we ensure that delivery is always over more than one carrier. This not only minimises downtime but also enables us to offer an even higher SLA.

 

For some businesses this sort of automatic failover is not always cost-justified, as staff can quickly switch to 3G dongles or work from home. The key message is to think about it: work out how your business could be affected by downtime and what could be done to minimise any potential impact before it happens.

Moving into the cloud

Tuesday, January 26th, 2010

 

I was speaking to a client yesterday about the merits of virtualisation and what needed to be considered before moving his company ‘into the cloud’. A lot has been said in the press and by manufacturers about the benefits of hosting services, but little of it covers the specific problems that you are inevitably left with. In this client’s case the business was at a junction in terms of IT investment as it evaluated its creaking server infrastructure and planned the next steps.

 

With eight separate servers, each carrying out a specific function, it was easy to recommend consolidating onto one larger server, using something like VMware to convert each server into a virtual machine. My word of caution was against trying to share resources between servers; it is better to over-specify the base machine to ensure each server gets exactly what it needs. Modern virtualisation software is clever at sharing resources, but in my experience if you have a Microsoft Exchange server gobbling up 4 GB of RAM then it isn’t going to share very well with a Microsoft SQL server requiring the same. Better to ensure there is at least 8 GB on the base machine and allocate 4 GB to each. The same goes for processors; only hard drives can really be shared and, with RAID, reliably too. So now, instead of buying eight cheap servers, the client can buy one or two (for extra redundancy) high-specification servers to carry out the same role.
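To make the sizing rule concrete, here is a rough sketch in Python of adding up dedicated allocations rather than relying on the hypervisor to share. The 4 GB figures for Exchange and SQL come from the example above; the core counts, the `VirtualMachine` class and the `required_host_spec` function are made up for illustration and are not part of any virtualisation product.

```python
from dataclasses import dataclass


@dataclass
class VirtualMachine:
    name: str
    ram_gb: int      # RAM dedicated to this guest
    cpu_cores: int   # cores dedicated to this guest (hypothetical figures)


def required_host_spec(vms, hypervisor_ram_gb=0):
    """Return (ram_gb, cpu_cores) the host needs with no overcommitment.

    Every guest gets its full allocation, so the host must carry the sum
    of all guests plus whatever the hypervisor itself reserves.
    """
    total_ram = sum(vm.ram_gb for vm in vms) + hypervisor_ram_gb
    total_cores = sum(vm.cpu_cores for vm in vms)
    return total_ram, total_cores


vms = [
    VirtualMachine("exchange", ram_gb=4, cpu_cores=2),
    VirtualMachine("sql", ram_gb=4, cpu_cores=2),
]

ram, cores = required_host_spec(vms)
print(f"Host needs at least {ram} GB RAM and {cores} cores")
```

In practice you would add headroom for the hypervisor itself through `hypervisor_ram_gb`, and extend the list to cover all eight servers before pricing the base machine.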

 

The next problem is where to host the platform. While the customer knows it should be in the cloud so that it is highly secure and can be accessed from everywhere, it does mean handing over parts of the business to a third party. This can be done by fully outsourcing the virtual machines and not even owning the hardware; however, without full due diligence, what appears to be a good service today could fall over when another hundred customers have the same idea. Also, no SLA will ever compensate your business should the worst happen, so I believe it is better to plan as though everything is about to go very wrong. Therefore a good solution would be to colocate some hardware in a datacentre but also maintain a local copy and replicate between the two. That way there is maximum resilience should the internet fail, the supplier go bust or the office burn down.

 

While the industry will remain very positive about the concept of the cloud, it is important not to lose sight of the technical challenges your company will face and how they would affect your ability to do business.

How do you back up 99.99% uptime?

Wednesday, February 18th, 2009

 

One concern I have had with the telecoms industry while running Fluidata has been around Service Level Agreements (SLAs). Business customers demand that services stay up and provide them with connectivity so that they can get on with the job of growing their company. Losing internet or voice for an hour, let alone a day, can have a serious impact on their business, as pretty much all communication and transactions now rely on IP transit.

 

What seems to happen in our industry is a slapdash approach to SLAs: they are oversold, and only when the worst happens does the client understand what they actually are and what protection they offer. Let’s be clear that an SLA is a statement of a promise and nothing more. Even if an SLA states a 21-hour fix on a fault (BT), there are still instances I am aware of where this can significantly overrun. One fault I remember from a few years ago involved a FTSE company losing service because rats (BT’s explanation, not mine!) had chewed through the cable. The only issue was that a major A-road had to be closed so that a 6-foot hole could be dug to reach the stricken cable. This took three days. So the client could claim under the terms of the SLA, as the downtime was longer than 21 hours, but the compensation is inevitably less than the actual cost to the business.

 

Therefore I cannot stress enough the necessity of disaster planning and preparing for the worst. A good quote is that nobody cares as much about your business as you do. So while an SLA shows you the confidence the provider has in its service, you need to see how they arrive at that figure, because a blanket 100% uptime guarantee is obviously complete nonsense. That brings me neatly on to our range of PureFluid and Advance products, which aggregate not only multiple lines but also different technologies and, uniquely, multiple last-mile carriers. This means that one line could be provided by BT and the others by Cable & Wireless or Telefonica. This dramatically decreases the chance of failure and helps to increase uptime.
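For anyone who wants to see how an uptime figure can be arrived at, here is a back-of-envelope sketch in Python. It converts an availability percentage into a yearly downtime allowance, then combines several lines on the assumption that they fail independently of one another. That assumption is a big one: lines that share a duct or an exchange can fail together, which is exactly why carrier diversity matters. The function names and the 99% per-line figure are illustrative only and are not taken from any real SLA.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability):
    """Yearly downtime allowed by an availability fraction (0.9999 = 99.99%)."""
    return (1 - availability) * MINUTES_PER_YEAR


def combined_availability(availabilities):
    """Availability of several lines in parallel, ASSUMING independent failures.

    The service is only down when every line is down at the same time,
    so we multiply the individual outage probabilities together.
    """
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= (1 - a)
    return 1 - p_all_down


# 99.99% uptime still allows roughly 52.6 minutes of downtime a year.
print(f"{downtime_minutes_per_year(0.9999):.1f} minutes per year")

# Two hypothetical lines at 99% each, on separate carriers.
print(f"{combined_availability([0.99, 0.99]):.4%}")
```

The arithmetic only holds as far as the independence assumption does, so it is worth asking a provider where each line physically runs before trusting a combined figure.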

 

Prepare for the worst. If you are someone saying “well, mine hasn’t gone down in years”, then you are probably the sort of person who does without fire extinguishers, alarms and insurance. You are dicing not only with your job but with the future of your company.