This is a working document to explain what is happening as it happens. We will paste a full and professionally written incident report once we have all accounts back on line. Please refresh this page for the latest information and excuse any typos in this document.
Please visit: http://bwf.co/pir for a post incident report
Update 8.27pm
The server is back on line but we are getting reports that the data on it is from an old backup. We do know the cloud provider had started 2 data restores: the platform backup that they managed to retrieve, and the R1Soft backup. I assume it is the platform backup that has been restored. We are now looking at syncing the data between this server and the backup server. Please bear with us.
Update 5.39pm
Our cloud provider just called. 75GB of data has now been copied across from the backup. Not long to go.
Update 4.50pm
Our cloud based provider just called to tell me server 19 is progressing well and will be on line soon.
Update 3.18pm
Our cloud based provider has just called me and this is the latest update as we know it. I was told that our machine was the first to be destroyed this morning and that the destruction also took the backups with it. The provider has therefore rebuilt server 19 and is currently syncing all home directory data from the R1Soft backup server to the newly built server 19. We have no access to see progress but we have been told it is about a 4 hour process. This is where we stand right now. The provider will call me in an hour to give me an update and I will post more information then.
Update 2.52pm
Please remember server 19 had the R1Soft backup solution running on it, so all data is safely hosted on the backup server in a different data centre. We are waiting for an update before looking at options with these backups.
Update: 2.51pm
We are now locked out of OnApp. We are not sure why, and we have a ticket in with the provider. This means we cannot use the console to gain access to the two servers that are not on line. We are still waiting for an update regarding server 19. We are aware we are not the only people affected, as two other clients of the cloud provider have been in contact on Twitter to tell us they were affected as well and their servers 'disappeared'.
Update: 2.10pm
A client has informed us two of his cloud servers are off line. We are looking into this now as a matter of urgency as we were only aware of server 19 being off line.
Update: 2.00pm
The staff at our cloud provider are restoring data at the moment.
Update: 1.35pm
OnApp was falsely showing that the servers were off line. They are actually all on line apart from server 19.
Update: 1.25pm
The OnApp.com virtual data centre is still showing many servers as being off line. This may be reported incorrectly, as all servers appear to be up apart from server 19.
=========================================================================
To our clients hosted on the Cloud
Please note 95% of clients are not affected by these issues today. All clients on our traditional dedicated servers hosted in DINEnoc in Orlando and Bluesquare in Maidenhead are not affected. It is only Cloud based clients that have 100% outage at this time. (**CORRECTION: The OnApp GUI was reporting all servers as being off line. Only server 19 was in fact off line.**)
I wanted to write this and be totally up front and honest with you regarding the problems we have had of late with the cloud based hosting service. All the recent issues we have had have been with our Cloud based hosting and we have had zero issues with our traditional dedicated servers.
Approximately 6 weeks ago we entered into discussions with OnApp (onapp.com) to explore the possibility of us providing cloud based hosting. We had already bought a Virtual Data Centre with a cloud based provider to give the OnApp.com system a road test. We wanted to explore the system fully before we actually moved forward with our own cloud based server cluster. The third party provider is a reputable company. Taking this Virtual Data Centre is not unlike us renting a dedicated server, and the service has been good when it works.
There have been a number of issues with this cloud virtual data centre. I am not trying to pass the buck on any of these issues, as we do take responsibility for our buying decisions. Perhaps, with the benefit of hindsight, it was premature to put live clients onto this London based cloud.
The following issues have happened:
Moving Forward
Two weeks ago we provisioned the data centre we have used for 8 years, and with whom we have a proven record, to deploy our own cloud solution, and I am pleased to say this is going live on 1 July 2011. This same data centre is where we have some UK VPS servers that have 150+ days uptime with zero downtime.