XEN10 Node Degraded RAID Array

  • Wednesday, 14th June, 2017
  • 19:07
Update: 9.00am
The array is now optimal and the reconstruction has completed.

Unit 1 (Raid10, 1142.8 GB): Optimal 
Physical Disk 1:0 (ST3600057SS, 6SL5QHBL0000N3273QN7): Ok 
Physical Disk 1:1 (ST3600057SS, 6SL5MPJ10000B32501MA): Ok 
Physical Disk 1:2 (ST3600057SS, 6SL5QF1Q0000B31804VP): Ok 
Physical Disk 1:3 (ST3600057SS, 6SL5MN070000N3261AN0): Ok

Update: 2.12am
The final Virtual Machine is now online. The array continues to reconstruct in the background.

Update: 1.28am

The node has been rebooted. All Virtual Machines are back online bar one; we're working on the one that did not come back up. The RAID array is now reconstructing with high priority. We're monitoring server performance and so far there is no impact to clients:

Unit 1 (Raid10, 1142.8 GB): Degraded - Reconstruct 
Physical Disk 1:0 (ST3600057SS, 6SL5QHBL0000N3273QN7): Warning 
Physical Disk 1:1 (ST3600057SS, 6SL5MPJ10000B32501MA): Ok 
Physical Disk 1:2 (ST3600057SS, 6SL5QF1Q0000B31804VP): Ok 
Physical Disk 1:3 (ST3600057SS, 6SL5MN070000N3261AN0): Ok

Issue
Our monitoring system alerted us that one drive in the RAID array of the XEN10 node has failed and needs to be replaced. Although this server has a hot-swap chassis, it was ordered in the same batch as another server that we discovered six months ago had an incorrectly wired backplane (human error at build time). On that occasion only luck prevented data loss, as the good drive we pulled belonged to a different RAID pair.
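For context, a health check of this kind can be sketched as a small parser over controller status output of the shape shown in the updates above. This is an illustrative sketch only, not our actual monitoring code; the function name, regex, and assumed output format are ours, and a real probe would run it against the text emitted by the controller's CLI tool.

```python
import re

# Matches lines of the form seen in the status excerpts above, e.g.
#   Unit 1 (Raid10, 1142.8 GB): Degraded - Reconstruct
#   Physical Disk 1:0 (ST3600057SS, 6SL5QHBL0000N3273QN7): Warning
# (format assumed from this notice, not tied to a specific vendor CLI)
STATUS_RE = re.compile(r"^(Unit|Physical Disk)\s+(\S+)\s+\(.*\):\s*(.+?)\s*$")

def degraded_components(report):
    """Return (component, state) pairs that are not in a healthy state."""
    healthy = {"Optimal", "Ok"}
    bad = []
    for line in report.splitlines():
        m = STATUS_RE.match(line.strip())
        if m and m.group(3) not in healthy:
            bad.append(("%s %s" % (m.group(1), m.group(2)), m.group(3)))
    return bad

sample = """\
Unit 1 (Raid10, 1142.8 GB): Degraded - Reconstruct
Physical Disk 1:0 (ST3600057SS, 6SL5QHBL0000N3273QN7): Warning
Physical Disk 1:1 (ST3600057SS, 6SL5MPJ10000B32501MA): Ok
"""
print(degraded_components(sample))
```

A probe like this raising an alert on any non-Optimal/non-Ok state is what paged us about the failed drive.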

Although it's extremely unlikely that this server also has an incorrectly wired backplane, because it was deployed at the same time as the affected node we're going to take XEN10 offline and verify the drive mapping in the RAID BIOS before replacing the bad drive, as a precaution. This will result in a short period of downtime for all Virtual Machines on this node.

We're going to complete this maintenance on Friday 16 June 2017 at 1am UK time, when it will cause the least disruption for clients. All VMs will be safely shut down before we shut down the node itself to replace the drive. Once the node is booted up again the array will start rebuilding onto the new drive and the Virtual Machines will start at the same time. As this is a low-use node we are not expecting much impact or slowdown for the active Virtual Machines during the RAID rebuild.

Please submit a support ticket for the attention of our System Administrators should you have any additional questions or concerns.