InnoHosting
US: 1-888-522-INNO
UK: 0800-612-8075

News
Oct
26
Connectivity Issues on Server UK69
Posted by Genadi Petkov on 26 October 2017 03:27 PM

We are currently experiencing connectivity issues with our UK69 server.

 

The incident has been reported to our data center technicians and they are actively working to resolve it.

 

As soon as we have more details on the matter, we will post them here.

 

Thank you in advance for your patience!

 

Update - the server is now online and all services are back. We will continue to closely monitor it to ensure its stable performance.

 

 





Jun
26
[17 updates] UK66 RAID Issue [RESOLVED]
Posted by Rameen V. on 26 June 2016 11:52 AM

Server is Online

Network Configuration [Completed]

Full Server Restore [Completed]

 

 

Questions regarding this issue can be posted on our forum thread: http://forum.innohosting.com/threads/uk66-data-recovery.89/

We are aware of accessibility issues with UK66. We're currently working to bring the server up ASAP.

Further updates will follow.

 Update 1: We apologise for the delay in this. We're still waiting on the DC staff for a response.

Update 2: The RAID card had issues and 2x hard drives have failed. We do not believe any data loss has occurred, as the server has 8x drives in RAID10. We'll be initiating a file system check, after which the server should come back online.

 Update 3: The FSCK is still running. There was a small delay in getting it going as the usual way of initiating it wasn't working.

Update 4: We're having to rerun the FSCK as there are a few outstanding errors on the file system that need fixing. It's currently at 6%.

Update 5: FSCK is currently at 25%

Update 6: FSCK is currently at 45%

Update 7: We're still working on repairing the file system. It has gone through multiple checks, but each round finds further errors that require fixing. We apologise for the inconvenience and are working diligently to make sure everything is online as soon as possible.

Update 8: It's looking like we may have to restore from our backups. Unfortunately, it does not seem that the file system check will fully complete any time soon. It has restarted itself countless times, and while a few restarts can and do happen, this is far beyond anything we've seen previously. We'll be replacing the hard drives on the server and reinstalling the OS & control panel. We'll then begin restoring files from our remote backups.

Update 9: We have replaced all the hard drives on the server (8x in total). Backups are being transferred over as we speak. We've upgraded the server port to ensure a speedy transfer, and all files are being moved over very quickly.

Update 10: The restore process is now in progress and moving quickly. As each site is restored it will appear online almost immediately. 

Update 11: During the last moments of the server restore, the server experienced a catastrophic failure similar to the original one. We are no longer attempting to troubleshoot the issue and are instead replacing all related components. This includes the RAID card, any drives showing any signs of issues, and all cabling. We do not expect this to take much longer and we are working hard to get the issue fully resolved today. We have already replaced all 8x hard drives on the server and will continue to replace any component that shows the slightest sign of a problem.

While UK66 has enjoyed above-average uptime, often going with 100% uptime month after month, this situation is regrettable, and we appreciate the patience customers have shown us so far as we work very hard to get everything resolved without any delay. We would like to assure all customers that issues like these are extremely rare for us, as we have a policy of continuously monitoring servers, including hardware components, so that we can pre-empt such issues before they happen. Unfortunately, these issues sometimes appear suddenly and without any warning - which can and does happen to any company, regardless of infrastructure. UK66 was built with a high level of resilience against such issues, capable of losing up to 4x hard drives (one from each mirrored pair) without consequence.

We will do our best to keep you updated as progress is made. Rest assured, we are extremely determined to have this server fully operational as a matter of urgency.
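For readers wondering how an 8-drive RAID10 array can tolerate up to 4 failed drives, the following is a minimal sketch, not a description of the actual UK66 configuration. It assumes the usual RAID10 layout of four 2-way mirrored pairs striped together; the drive numbering and the names `PAIRS`, `raid10_survives` and `survivable_fraction` are hypothetical. The array survives as long as no mirrored pair loses both of its drives, so 4 failures are survivable only when they fall in 4 different pairs.

```python
from itertools import combinations

# Hypothetical layout: 8 drives numbered 0-7, arranged as four mirrored pairs.
PAIRS = [(0, 1), (2, 3), (4, 5), (6, 7)]


def raid10_survives(failed, pairs=PAIRS):
    """Return True if no mirrored pair has lost both of its drives."""
    failed = set(failed)
    return all(not (a in failed and b in failed) for a, b in pairs)


def survivable_fraction(k, pairs=PAIRS):
    """Count how many of the possible k-drive failures the array survives.

    Returns (survivable_combinations, total_combinations).
    """
    drives = sorted(d for pair in pairs for d in pair)
    combos = list(combinations(drives, k))
    ok = sum(raid10_survives(c, pairs) for c in combos)
    return ok, len(combos)


if __name__ == "__main__":
    for k in range(1, 6):
        ok, total = survivable_fraction(k)
        print(f"{k} failed drive(s): {ok} of {total} combinations survivable")
```

Under these assumptions any single failure is survivable, most double failures are, and no combination of 5 or more failures is, which is why "up to 4" is a best case rather than a guarantee.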

Update 12: The DC has swapped out the cables, the RAID card and the backplane, and has removed all the drives again. To rule out that particular batch of drives as the cause, drives from a different batch, of a different capacity and from a different vendor have been installed. The OS is being installed as we speak and we'll begin the restoration process again. The restores do not take much time and sites come online as they're restored. We restore 4 accounts at a time to speed things up. Further updates will be provided once more progress has been made.
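As an illustration of what restoring 4 accounts at a time can look like, here is a minimal sketch using a worker pool. It is not the tooling used for this restore: `restore_account` is a hypothetical placeholder for whatever the backup system does per account, and the usernames are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def restore_account(username):
    """Hypothetical placeholder: a real version would call the backup system."""
    return f"{username}: restored"


def restore_all(usernames, workers=4):
    """Restore accounts with at most `workers` restores running at once.

    Results are returned in the same order as `usernames`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(restore_account, usernames))


if __name__ == "__main__":
    for line in restore_all(["alice", "bob", "carol", "dave", "erin"]):
        print(line)
```

Capping the pool at 4 workers keeps several restores in flight without overloading the disks or network of the server being rebuilt.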

 Update 13: The server restore has now been completed and sites should be online. If you are experiencing issues, please contact technical support and let us know the username and domain of your account.

Update 14: We're aware of a very small number of accounts that we do not have backups for in our primary backup set. InnoHosting maintains two different sets of backups using two different backup technologies in two different locations. We have checked and found that the backups missing from the initial set are available in the second set. The second set of backups takes considerably longer to restore, which is why it was not our initial option. For the second set of backups, a bare metal restore is required. To prevent disruption to the accounts that are online, we will perform the bare metal restore onto a different server and then move the accounts that did not have backups in the initial set over to UK66.

If your site has been restored but is having issues displaying content, please let technical support know the username and domain of the affected accounts.

Update 15: While a lot of accounts are up and running, we realise a few customers still have missing accounts. We wanted to recover those files as quickly as possible. We have been working closely with the data centre to see if we can salvage anything from the old drives, while another team of staff has been diligently restoring accounts manually from our daily backups. Every option for getting this done quickly has now been exhausted; unfortunately, we have simply had a lot of bad luck and met dead ends with every attempt.

We are waiting on the DC to set up the temporary server mentioned in Update 14 so we can start the bare metal restore (BMR). Before tying up the backup server with the BMR, we wanted to pursue other options, as a BMR takes a long time. As mentioned above, none of those attempts have been successful.

I would like to stress again that we understand this downtime is having a significant effect on you, and we know many of you are unhappy with the situation. Many have asked for an ETA, but the nature of this work is such that we cannot give one for the recovery attempts so far. With the BMR we can start giving ETAs, as that is a simple process. That aside, there has not been a single minute since this began in which we have not been actively working on it. Work has been non-stop, with some staff working through the nights and early mornings to get things fixed; many have not yet had a full night's rest. We are taking this seriously. We know our customers have become used to an almost flawless service, and if there were anything we could do to speed things up, we would have done it.

We continue to seek your patience and understanding while we continue to work on getting the final accounts up and running. 

Update 16: We have some good news. We have completed the full server restore to a temporary server. Everything seems to be intact and all missing accounts are there. We'll start moving the missing accounts over within the next 10 minutes. As each account is moved over it will appear online straight away. 

Update 17: All missing sites were restored last night. Since then we have been assisting customers with any issues they have had, and judging by our ticket volumes nearly everything has stabilised. Customers still experiencing issues should get in touch with technical support ASAP. This outage is now considered resolved.

 





Nov
20
Server72 Outage [AT RISK]
Posted by Rameen V. on 20 November 2015 03:52 PM

We're currently investigating an outage on this server.

Update 1: The server is now back online (total downtime was 4 minutes). We'll be performing non-service-impacting maintenance on this server. During this period, the service is considered at risk.

 

 





Nov
16
UK69 RAID Rebuild
Posted by Rameen V. on 16 November 2015 12:39 PM

Dear All,

We have replaced a failed drive on UK69 and the RAID is currently rebuilding. During the rebuild websites may load slower than usual.

 

 





Nov
11
Server 70 DDoS [MITIGATED]
Posted by Rameen V. on 11 November 2015 01:22 AM

 

Server 70 is currently experiencing a DDoS attack. We're working on mitigating this.

Update 1: Our DDoS mitigation appliance has now been deployed in front of this server and websites are now functioning normally.

We'll continue to keep a close eye on this.





Nov
3
[23 UPDATES] Server 72 [IN PROGRESS]
Posted by Rameen V. on 03 November 2015 11:49 PM

Current Status @ 17:43 GMT

Server is Online

Network Configuration [Completed]
 
Server Restore [Completed]

 

 

Server 72 has suffered a catastrophic failure.

We are working on bringing this online as a priority and will be providing updates as progress is made.

 

Update 01.44AM GMT: We are still working on this. We do not currently have an ETA.

Update 02.30AM GMT: We have not been able to diagnose the hardware fault with the current server. The server experiences a kernel panic on every kernel we have tried, including a live CD. Rather than spend more time diagnosing the issue, we're building and deploying a new server and will then begin to restore from our latest backups.

Update 14:06 GMT: We're still working on the restore. The newly built server exhibited similar issues. We have had to build a different one and are in the middle of performing the restore.

Update 15:44 GMT: The restore is still under way. According to the backup system, it will take around 7-10 more hours to complete. If the restore is successful, all websites should be back online shortly after that.

Update 16:47 GMT: Restore progress: 55GB out of 736GB Restored (7.47%)

Update 17:44 GMT: Restore progress: 130GB out of 736GB Restored (17.66%)
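For anyone who wants to sanity-check the figures in the two progress updates above, here is a rough sketch of the arithmetic. It assumes the restore continues at the average rate observed between the two updates and that both timestamps fall on the same day; the function names are illustrative only and are not part of the backup system.

```python
from datetime import datetime

TOTAL_GB = 736  # total restore size quoted in the updates


def percent_done(restored_gb, total_gb=TOTAL_GB):
    """Percentage of the restore completed, rounded to two decimals."""
    return round(restored_gb / total_gb * 100, 2)


def hours_remaining(t1, gb1, t2, gb2, total_gb=TOTAL_GB):
    """Estimate hours left, assuming the rate between t1 and t2 holds.

    t1 and t2 are "HH:MM" strings on the same day.
    """
    fmt = "%H:%M"
    elapsed = datetime.strptime(t2, fmt) - datetime.strptime(t1, fmt)
    elapsed_hours = elapsed.total_seconds() / 3600
    rate_gb_per_hour = (gb2 - gb1) / elapsed_hours
    return (total_gb - gb2) / rate_gb_per_hour


if __name__ == "__main__":
    print(percent_done(55))   # 7.47
    print(percent_done(130))  # 17.66
    print(round(hours_remaining("16:47", 55, "17:44", 130), 1))
```

At the rate seen between 16:47 and 17:44 (75GB in 57 minutes), the remaining 606GB would take a little under 8 hours, which is in line with the 7-10 hour estimate reported by the backup system.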

Update 21:17 GMT: Due to an error with the restore process, the restore has been temporarily paused. We are continuing to work on this as a top priority. More updates to follow.

Update 22:32 GMT: Even though the previous drives were new, they started to exhibit issues, which is why the restore was stopped (prematurely, by the server). We've replaced the drives and have started the restore.

Update 17:10 GMT: The server restore has completed. We're currently reconfiguring the network. This is not expected to take very long.

Update 17:44 GMT: We should almost be ready to go. We are just waiting on the on-site techs to switch the network cables around.

Update 17:48 GMT: The server is now online. Further maintenance on this server will still be conducted, but this should not be service impacting. A full RFO will be emailed to all customers.

