Server Outage: USA Servers

  • Monday, 10th January, 2011
  • 19:28

Post Issue Report - From our Data Centre

The following is a statement from our data centre explaining the issue. If you have any questions, let us know and we will do our best to answer them or find an answer:

 

Today our data center experienced a power service interruption that may have affected several of you. Sincere efforts and major precautions were taken to avoid disruption; however, power service could not be sustained after an unforeseen short circuit occurred at one of our main bypass panels. We apologize for all of the difficulties this service interruption has caused you.

This morning we proceeded with the integration of our additional new UPS into our infrastructure. To accomplish this integration, both of our UPS units were taken off normal operation and put into maintenance bypass. No service interruption was anticipated, as the necessary precautions had already been taken and a step-by-step process had been planned out.

Unfortunately, during the process we experienced an unexpected short circuit in one of the conductors in one of our maintenance bypass panels. This caused one of the upstream main service disconnects to trip, resulting in a loss of power to half of our data center.

Immediately after the incident occurred, the affected portion of the electrical infrastructure was thoroughly inspected by our electrical contractors to ensure the integrity of this distribution was not compromised. All inspections were satisfactory, and within minutes our new UPS was successfully brought online under normal operation, followed by our existing UPS.

Power to affected servers began to be restored within an hour of the outage. Our emergency standby team quickly responded and attended to troubled servers. Most servers automatically rebooted and came online once power was restored. As of 3 PM EST we confirmed all servers had power restored to them. Unfortunately, many did require a file system check, which could have taken up to a few more hours to complete.

Nonetheless, our team has been thoroughly screening all affected areas, and continues to do so, to ensure all servers are up and running properly. All work involving our main electrical infrastructure is officially completed, and we expect 100% uptime for countless years to come.

We do not take downtime lightly and want to assure you it is our daily objective to maintain the level of service we know you expect. We understand it is a privilege to be of service to you. We have worked hard to achieve your trust and confidence and will do everything to maintain it. It is our pride to be your choice. This is something we want to show you.

============================================

Update at 8am 11 January 2011

All servers are back on line. Approximately 2 hours ago there was another outage that lasted for a very short time, and twenty minutes ago there was a further brief outage lasting 4 minutes. We still have one VPS client with IP routing issues and we are waiting on someone at the data centre to resolve that issue for us. This entire situation has been unfortunate. We have been with our data centre partner in the USA for many years and we now know many of the staff members personally. We do trust them to resolve this and to give a full and honest explanation. We will be seeking concrete assurances that steps are in place to prevent this happening again.

One point. Many of you are opening support tickets to ask to move to a UK server. We will of course honour all such requests, but at this time it will take a few days as we have a growing number of requests to fulfil. We do have servers in both the USA and the UK, and moving is relatively straightforward, albeit slightly more expensive. No UK servers were affected by this outage.

=============================================

Update at 11.18pm UK Time

One server is being stubborn and is not on line. We are aware of this. There is nothing we can do and we are totally at the mercy of the NOC staff at this time. We have escalated this but have not received a reply as yet. I suspect they are under pressure. As soon as we know anything we will post it here.

==============================================

Update at 8.38pm UK Time

I believe all servers except one are now on line and functioning. We are still checking the VPS containers, as some did not restart and were giving 'locked' errors. I think most VPS servers are back on line at this point too. A full incident report will be posted here as soon as we know more.

==============================================

Update at 7.15pm UK Time

Server 7 and Server 12 have just come back on line. Further updates and a full incident report will appear here when we have more news.

==============================================

Update at 6.44pm UK Time

Our data centre has just sent this message:

'Within the next 30 to 45 minutes, the service on remaining servers should be restored. Our data center team is working as quickly as possible. We thank you immensely for your patience and support.'

==============================================

Update at 6.28pm UK Time

Dear Clients

Major Issue Affecting our USA Servers

At the moment we have sketchy information from our Data Centre, but I have had it confirmed that there was a major power outage at the Data Centre in Orlando, Florida, causing all servers to go down. There are in excess of 3000 servers in the Data Centre we use in Florida and, from what I can gather, all have lost power.

Technicians are bringing them on line gradually. At this time 65% of our sites are back on line, but a significant number of sites are still down.

A full incident report will be posted here as soon as we know more.
