Final Update:  We swapped the failed drive (keeping the server online) and the array has rebuilt.  The array is now optimal:

DG  Arr  Row  EID:Slot  DID  Type    State  BT  Size      PDC   PI  SED  DS3   FSpace
0   -    -    -         -    RAID10  Optl   Y   930.5 GB  dflt  N   N    dflt  N
0   0    -    -         -    RAID1   Optl   Y   930.5 GB  dflt  N   N    dflt  N

XEN126 is currently down and we are working on it.  A log of events is below:

1 February - 12.49am
Our monitoring alerted us that the node was down.  A reboot was issued.

1 February - 1.53am
The node did not come back online.  We have asked the data centre remote hands to check the server.

1 February - 2.08am
The data centre initiated a hard reboot.

1 February - 2.40am
A remote KVM stand attached to the server is showing that 2 drives in the array are in a 'foreign' state.  Our senior technician has been called onto shift.

1 February - 3.00am
Frustratingly, the remote KVM has crashed.  We have opened an urgent support ticket with the data centre to bring it back online so we can continue to troubleshoot.

1 February - 3.05am
Drive 3 is now online.  Drive 4 is trying to rebuild into the array.  At this time we assume Drive 4 is faulty, so we are going to turn it off in the disk array.

1 February - 3.11am
The node is coming back online without Drive 4 (it is turned off in the disk array).  We are confident that Drive 4 had failed, causing the server to crash.  Usually a single drive failure in a disk array would not result in an outage.  The virtual machines are also coming back.  We will monitor this and will arrange a drive swap first thing in the morning so that there is redundancy in the array again.

Friday, February 1, 2019
