bit-tech.net

123-Reg VPS customers hit by major outage

123-Reg's virtual private server platform has been hit by a major outage, with customers being warned to restore from their own backups if they want to get back up and running.

Web-host 123-Reg is currently battling to restore service to users of its virtual private server (VPS) platform, advising users to restore from their own backups if possible.

Customers of 123-Reg's VPS platform, which lets several customers share a single physical server while each behaves as though it has a dedicated box, were woken on Saturday morning by outage alerts warning that something had gone wrong. The company's status page initially blamed connectivity issues, then several hours later reported a performance problem - a common issue with VPS systems, where a single customer can bring the box down for all users by running resource-intensive tasks that work around the limits on how much of the shared pool a process can draw.

A day on, though, 123-Reg was still battling the problem - and now, two days after the initial outage, 123-Reg customers are still complaining of a lack of service while the company is dropping heavy hints that the problem may in fact be due to catastrophic data loss.

'If you are currently offline and would like to restore from your own backup to save time we can set you up a new VPS image,' the company told customers in its fourth service update over the two-day outage, before adding this morning that 'customers that have a local backup of their VPS are advised to rebuild their servers.' While the company has thus far been silent on the root cause of the problem, the length of the outage combined with strong advice to restore from backups suggests that one or more VPS servers have been hit by complete data loss - an issue which, if true, could leave anyone without their own local backup out of luck and down a website.

The timing of the outage is doubly unfortunate for 123-Reg, coming as it does hot on the heels of a hoax in which a user claimed to have accidentally deleted the data of 1,535 web host customers with a badly-written script.

3 Comments

Guinevere 18th April 2016, 12:32
Wow. I'm totally not shocked.
A bargain bucket cheapo hosting company runs into a technical issue and massive outages and uncertainties on data integrity ensue.
flibblesan 18th April 2016, 12:41
Quote:
While the company has thus far been silent on the root cause of the problem
They have been quite open about this with customers:
Quote:
Dear Customer.

I am writing to you to explain what happened to some VPS services on 16.04.16. This email is to detail what our steps have been. I am committed to open communication with all customers and would like to take this opportunity to explain in detail.

So what happened to some services? As part of a clean-up process on the 123-reg VPS platform, a script was run at 7am on 16.04.16. This script is run to show us the number of machines active against the master database. An error in the script returned a 'zero-records' response from the database for some live VPS. For those customers, this created a 'failure' scenario - showing no VMs and effectively deleting what was on the host. As a result of our team's investigations, we can conclude that the issues faced have resulted in some data loss for some customers. Our teams have been, and continue to be, working to restore.

What have we done? We have been working with an extended team of experts and have left no stone unturned. Our teams have been working long into the night to restore as much as we possibly can. We have also invested in external consultants to recover data in the best way possible.

We have recovery running on the VPS servers and some are restoring to new disks. We have also begun copying recovered VPS images to new hosts and we expect some VPS to be back up and running throughout the night and into tomorrow.

Our teams have worked for more than 24 hours and will continue to do so. No stone is being left unturned.

As the technical teams come back with updates for individual VPS we will communicate updates to customers.

For those customers with their own backup of their settings and data, if you wish to restore services yourself, you can do this by issuing a reimage command through your 123 Reg control panel. This will give you a freshly installed VPS on a new cluster, where you can restore your service.

I understand that some customers may have lost some confidence in the service that we offer, so I want to explain what we have done to prevent this happening again. We have started an audit of all cron jobs and scripts controlling the platform and associated architecture, so that no script will have the ability to delete images, only to suspend them. Before any image that has been suspended for over 28 days is deleted, a human will double-check it. By the end of the year, a new platform will be available that provides customers with self-managed and automated snapshot backups, in addition to architecture technology to back up the whole platform, something that is not available on the current platform. I hope this goes some way to winning back your confidence.

Richard Winslow,
123 Reg Brand Director
proxess 18th April 2016, 17:12
That, my friends, is why you do sanity checks in your scripts. If a host reports a wildly different number of clients than, say, the average, then you simply don't delete them. It would also be good practice for the cleanup scripts to back up any machines they delete and keep them for a week.
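The kind of guard described in these comments can be sketched in a few lines of Python. This is not 123-Reg's actual tooling, which has not been published. The data layout (a mapping from host name to a set of VM ids), the function names `plan_cleanup` and `due_for_review`, and the 10% threshold are all hypothetical. The sketch treats an empty database answer for a busy host as a query error rather than as proof that the host's VMs are orphans, refuses to act when an implausibly large share of the fleet would be affected, and only ever plans suspensions, with the 28-day review window taken from the email quoted above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical grace period, mirroring the 28 days mentioned in 123-Reg's email.
GRACE = timedelta(days=28)


@dataclass
class CleanupPlan:
    """Result of a dry run: VMs to suspend, or the reason the run was refused."""
    to_suspend: list[str] = field(default_factory=list)
    aborted: bool = False
    reason: str = ""


def plan_cleanup(live: dict[str, set[str]],
                 db: dict[str, set[str]],
                 max_orphan_ratio: float = 0.1) -> CleanupPlan:
    """Compare the VMs running on each host (live) with the master database (db).

    VMs running on a host but missing from the database are 'orphans'.
    Orphans are only ever planned for suspension, never deleted.
    """
    total_live = sum(len(vms) for vms in live.values())
    orphans: list[str] = []
    for host, running in live.items():
        recorded = db.get(host, set())
        # Sanity check 1: zero records for a host that is clearly running VMs
        # is far more likely to be a broken query than a host full of orphans.
        if running and not recorded:
            return CleanupPlan(aborted=True,
                               reason=f"zero records for live host {host}")
        orphans.extend(sorted(running - recorded))
    # Sanity check 2: refuse if one run would touch too much of the fleet.
    if total_live and len(orphans) / total_live > max_orphan_ratio:
        return CleanupPlan(aborted=True,
                           reason=f"{len(orphans)} of {total_live} VMs flagged")
    return CleanupPlan(to_suspend=orphans)


def due_for_review(suspended_at: datetime, now: datetime,
                   grace: timedelta = GRACE) -> bool:
    """A suspended image only becomes a candidate for deletion after the
    grace period; a human still has to approve the actual deletion."""
    return now - suspended_at >= grace


if __name__ == "__main__":
    live = {"host-a": {"vm1", "vm2", "vm3"}, "host-b": {"vm4", "vm5"}}
    # Simulate a faulty query that returns nothing for host-b.
    db_bad = {"host-a": {"vm1", "vm2", "vm3"}}
    print(plan_cleanup(live, db_bad).reason)  # zero records for live host host-b
```

Returning a plan instead of acting on it directly keeps the destructive step separate, so a dry run can be logged and inspected before anything is suspended.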