Posted by Rameen V. on 26 June 2016 11:52 AM
Server is Online
Network Configuration [Completed]
Full Server Restore [Completed]
Questions regarding this issue can be posted on our forum thread here: http://forum.innohosting.com/threads/uk66-data-recovery.89/
We are aware of accessibility issues with UK66. We're currently working to bring the server up ASAP.
Further updates will follow.
Update 1: We apologise for the delay in this. We're still waiting on the DC staff for a response.
Update 2: There were issues with the RAID card, and 2x hard drives have failed. We do not believe any data loss has occurred, as the server has 8x drives in RAID10. We'll be initiating a file system check, after which the server should come back online.
Update 3: The FSCK is still running. There was a small delay in getting it going as the usual way of initiating it wasn't working.
Update 4: We're having to rerun the FSCK as there are a few outstanding errors on the file system that need fixing. It's currently at 6%
Update 5: FSCK is currently at 25%
Update 6: FSCK is currently at 45%
Update 7: We're still working on repairing the file system. It has gone through multiple checks, but each round finds further errors that require fixing. We apologise for the inconvenience, and we're working diligently to make sure everything is online as soon as possible.
Update 8: It's looking like we may have to restore from our backups. Unfortunately, it does not seem the file system check will fully complete any time soon. It has restarted itself countless times, and while a few restarts can and do happen, this is far beyond anything we've seen previously. We'll be replacing the hard drives on the server and reinstalling the OS & control panel. We'll then begin restoring files from our remote backups.
Update 9: We have replaced all the hard drives on the server (8x in total). Backups are being transferred over as we speak. We've upgraded the server port to ensure a speedy transfer, and all files are being moved over very quickly.
Update 10: The restore process is now in progress and moving quickly. As each site is restored it will appear online almost immediately.
Update 11: During the last moments of the server restore, the server experienced a catastrophic failure similar to the one that occurred in the first instance. We are no longer attempting to troubleshoot the issue but are instead replacing the related components. This includes the RAID card, any drives showing any signs of issues and all cabling. We do not expect this issue to take much longer, and we are working to get it fully resolved today. We have already replaced all 8x hard drives on the server and will continue to replace any component that shows the slightest sign of a problem.
While UK66 has enjoyed above average uptime, often achieving 100% uptime month after month, this situation is regrettable, and we appreciate the patience customers have shown us so far as we work very hard to get everything resolved without any delay. We would like to assure all customers that issues like these are extremely rare for us, as we have a policy of continuously monitoring servers, including hardware components, so that we can pre-empt such issues before they happen. Unfortunately, these issues sometimes appear suddenly and without any warning, which can and does happen to any company, regardless of infrastructure. UK66 was built with a high level of resilience against such issues, capable of losing up to 4x hard drives without consequence.
We will do our best to keep you updated as progress is made. Rest assured, we are extremely determined to have this server fully operational as a matter of urgency.
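How many failed drives a RAID10 array tolerates depends on which drives fail. The sketch below illustrates this, assuming the 8 drives are arranged as four two-way mirror pairs. The actual drive layout of UK66 is not stated in these updates, so the pairing and the function names here are hypothetical.

```python
from itertools import combinations


def mirror_pairs(num_drives):
    """Assumed layout: drives (0,1), (2,3), ... form two-way mirrors."""
    if num_drives % 2:
        raise ValueError("this sketch expects an even number of drives")
    return [(i, i + 1) for i in range(0, num_drives, 2)]


def array_survives(failed, num_drives=8):
    """The array survives as long as no mirror pair has lost both drives."""
    failed = set(failed)
    return all(
        not (a in failed and b in failed)
        for a, b in mirror_pairs(num_drives)
    )


def survivable_count(num_failed, num_drives=8):
    """Return (survivable, total) over every way num_failed drives can fail."""
    combos = list(combinations(range(num_drives), num_failed))
    survivable = sum(array_survives(c, num_drives) for c in combos)
    return survivable, len(combos)


if __name__ == "__main__":
    for n in range(1, 6):
        ok, total = survivable_count(n)
        print(f"{n} failed drive(s): {ok} of {total} combinations survivable")
```

Under these assumptions, losing four drives is survivable only when each failed drive sits in a different mirror pair, and any failure that takes out both drives of one pair is not survivable.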
Update 12: The DC has swapped out the cables, the RAID card and the backplane, and removed all the drives again. To rule out that particular batch of drives as the cause, drives from a different batch and a different vendor, with a different capacity, have been put in. The OS is being installed as we speak and we'll begin the restoration process again. The restores do not take much time and sites do come online as they're restored. We restore 4 accounts at a time to speed things up. Further updates will be provided once more progress has been made.
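Restoring 4 accounts at a time, as mentioned in Update 12, can be sketched with a bounded worker pool. This is only an illustration and not the tooling we actually use: `restore_all` and the `restore_one` callable are hypothetical names, and the real restore step is left to whatever function is passed in.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def restore_all(accounts, restore_one, max_parallel=4):
    """Run restore_one(account) for every account, at most max_parallel at once.

    restore_one is a caller-supplied function (hypothetical here) that
    restores a single account and raises an exception on failure.
    Returns (restored, failed), where failed maps account -> error text.
    """
    restored = []
    failed = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(restore_one, acct): acct for acct in accounts}
        # Handle results in completion order, so each account can be
        # reported as soon as its own restore finishes.
        for future in as_completed(futures):
            acct = futures[future]
            try:
                future.result()
                restored.append(acct)
            except Exception as exc:
                failed[acct] = str(exc)
    return restored, failed
```

Collecting failures separately rather than stopping at the first error means one problem account does not hold up the rest of the queue.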
Update 13: The server restore has now been completed and sites should be online. If you are experiencing issues, please contact technical support and let us know the username and domain of your account.
Update 14: We're aware of a very small number of accounts that we do not have backups for in the initial set. InnoHosting maintains two different sets of backups using two different backup technologies in two different locations. We have checked on this and have found that the backups missing from the initial set are available in the second set. The second set of backups takes considerably longer to restore, which is why it was not our first option. For the second set of backups, a bare metal restore is required. To prevent disruption to the accounts that are online, we will perform the bare metal restore onto a different server and then move over to UK66 the accounts that did not have backups in the initial set.
If your site has been restored but is experiencing issues with showing content, please let technical support know the username and domain of the affected accounts.
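The choice between the two backup sets described in Update 14 comes down to a simple fallback: use the faster initial set where an account is present, and fall back to the second set otherwise. A minimal sketch, assuming each backup set can be represented as a set of account usernames (the function and key names are hypothetical):

```python
def plan_restores(accounts, initial_set, second_set):
    """Assign each account to the backup set it should be restored from.

    initial_set: usernames present in the faster, per-account backups.
    second_set:  usernames present in the slower, bare metal backups.
    """
    plan = {"initial": [], "second": [], "unavailable": []}
    for account in accounts:
        if account in initial_set:
            plan["initial"].append(account)
        elif account in second_set:
            plan["second"].append(account)
        else:
            # Present in neither set; needs manual investigation.
            plan["unavailable"].append(account)
    return plan


if __name__ == "__main__":
    plan = plan_restores(
        accounts=["alice", "bob", "carol"],
        initial_set={"alice"},
        second_set={"alice", "bob"},
    )
    print(plan)
```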
Update 15: While a lot of accounts are up and running, we do realise a few customers still have missing accounts. We wanted to recover those files as quickly as possible. We have been working closely with the data centre to see if we can salvage anything from the old drives, while in the meantime another team of staff has been diligently restoring accounts manually from our daily backups. Every angle has now been exhausted to get this done quickly, and unfortunately we have simply had a lot of bad luck and met dead ends with every attempt.
We are waiting on the DC to set up the temporary server mentioned in Update 14 to get the bare metal restore (BMR) going. Before tying up the backup server with the BMR, we wanted to pursue other options, as a BMR does take a long time. But as mentioned above, none of those attempts have been successful.
I would like to stress again that we understand this downtime is having a significant effect on you, and we know many of you are unhappy with the situation. Many have asked for an ETA, but the nature of this problem is such that we could not give one while each recovery attempt had an unknown outcome. With the BMR we can start giving an ETA, as that is a simple process. That aside, there has not been a single minute since this began in which we have not been actively working on it. It has been non-stop, with some staff working through the nights and early mornings to get things fixed; many have not yet had a full night's rest. We are taking this seriously. We know our customers have become used to an almost flawless service, and if there were anything we could do to speed things up, we would have done it.
We continue to seek your patience and understanding while we continue to work on getting the final accounts up and running.
Update 16: We have some good news. We have completed the full server restore to a temporary server. Everything seems to be intact and all missing accounts are there. We'll start moving the missing accounts over within the next 10 minutes. As each account is moved over it will appear online straight away.
Update 17: All missing sites were restored last night. Since then we have been assisting customers with any issues they have had, and looking at our ticket volumes, it seems nearly everything has stabilised. If you are still experiencing issues, please get in touch with technical support ASAP. This outage is now considered resolved.