Alice.

Troubles with WebFaction

Does two weeks of downtime seem even the slightest bit unreasonable? It does to me.

Disclaimer: These are my personal thoughts on the recent service disruption felt by myself and the clients of the organization I work for. This article is not an official response from the company, merely my own observations.

I'm not normally one to gripe about private issues online (well, other than my review of Ultraviolet, but that was a rare exception); however, this particular issue is one that struck rather close to the core of what I do professionally.

The home–based business I work for, run by my father, ran into a bit of server trouble a few weeks ago. Whatever the root cause, the way the problem was handled has raised grave concerns over our hosting arrangements. As such, most of our clients are now hosted on our own, fully self–managed, cloud hosting platform running on Amazon EC2. (Personal sites take a back seat to my/our clients', thus the absence of my site from the internet for an extended period; apologies.)

So what happened? Initially a watchdog application of mine noticed our managed dedicated server going offline, so I opened a ticket with our host. A few minutes later I had a response indicating that WebFaction had likewise noticed a watchdog failure and had asked the data center to reboot the host.
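For anyone wondering what a watchdog like that boils down to, here is a minimal sketch in Python. It is not the application I actually run; the function names and the host/port pairs are purely illustrative assumptions. It simply tries to open a TCP connection to each service and reports which ones fail to answer within a timeout.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable(targets, timeout: float = 5.0):
    """Return the (host, port) pairs from targets that did not answer."""
    return [t for t in targets if not is_reachable(t[0], t[1], timeout)]


if __name__ == "__main__":
    # Hypothetical targets; replace with the services you care about.
    down = unreachable([("example.com", 80), ("example.com", 443)])
    for host, port in down:
        print(f"DOWN: {host}:{port}")
```

Run from cron every few minutes on a machine outside the data center, something like this is enough to know about an outage before your clients do. A TCP check only proves the port answers; it says nothing about whether the site behind it is healthy.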

Cool. An hour of downtime, tops, I can deal with.

But then they couldn't physically locate the server, and the week went downhill from there. The average turn-around time on my ticket responses, generally to questions vital both to keeping our clients informed and to my appeasing my boss, was 8 to 12 hours. Some questions remained unanswered for up to 36 hours.

In desperation I posted a comment on WebFaction's forums wondering at the apparent lack of support during a critical failure, and received a fairly prompt response on the active ticket effectively telling me to stop pestering them and that public discussion wasn't going to help. (Regarding my forum post, a reasonable statement. Public–facing communication has to be peer–reviewed before posting, thus slowing responses even further.)

I politely removed my post on the forums and waited patiently for any of my half–dozen questions to be answered.

To cut a long story short, it took two weeks before all of the clients that were running on the server were back online. A small number of them are still not 100%, generally those involving massive amounts of legacy data in ancient applications.

What was learned?

There were a number of important areas where support let us down:

Multiple methods of communication.
An online/email bridged ticketing system is wonderful, but only for non–emergencies. In the event of a true emergency situation, alternate methods of communication are vital in keeping up to date with the progress, hurdles, and problems involved in getting back up and running.
This problem is compounded by a failing ticketing system that neither registers every e-mail entering it nor correctly notifies those involved in a ticket when there is an update.
Additionally, with shared hosting accounts, an apparently minimal level of support like this may be acceptable. With managed dedicated hosting being charged at a premium… it's unbelievable.
Access to backups.
There was no way for us to get access to server backups directly, despite the urgency of getting sites back online as soon as possible. While we were able to get simple (static) sites back online in around 48 hours, more complicated sites took far longer as we had to go back and forth repeatedly, there were silly permissions problems, etc.
Database access.
While we were able to get a complete rsync snapshot of our clients' home folders, databases were for the most part forgotten in the rush. When we did get around to restoring the MySQL databases, InnoDB was forgotten. It took two weeks for me to get a dump of the /var/lib/mysql raw database files, but even then they were unusable as the entire mysql database was not included, so MySQL didn't know the structure for any of the tables in any of the databases. I'm still having difficulty restoring the InnoDB data on my in-house development virtual machine due to strange configuration setting mismatches.
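
One way to avoid depending on a complete copy of the raw files is to keep logical dumps as well: mysqldump writes each table's CREATE TABLE statement alongside its rows, and its --single-transaction option gives a consistent snapshot of InnoDB tables without locking them. The following is a minimal sketch, assuming mysqldump is on the PATH and that credentials live in an option file such as ~/.my.cnf. The function names, the "backup" user, and the file naming scheme are hypothetical, not a description of our actual setup.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def build_dump_command(out_file, user: str = "backup"):
    """Build a mysqldump command line covering every database."""
    return [
        "mysqldump",
        f"--user={user}",          # hypothetical account; password comes from the option file
        "--single-transaction",    # consistent InnoDB snapshot without table locks
        "--routines",              # include stored procedures and functions
        "--all-databases",         # include every database, the mysql system one too
        f"--result-file={out_file}",
    ]


def dump_all_databases(backup_dir, user: str = "backup") -> Path:
    """Run mysqldump into a timestamped file and return its path."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_file = Path(backup_dir) / f"all-databases-{stamp}.sql"
    # check=True raises CalledProcessError if mysqldump exits non-zero.
    subprocess.run(build_dump_command(out_file, user), check=True)
    return out_file
```

The resulting .sql file can be shipped off the server with the same rsync run that copies the home folders, and restored by feeding it to the mysql client on any compatible server.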

Now that I can bring a new server online in under two minutes on Amazon EC2, and have full control of the underlying operating system, backups, et al., we (the company I work for) are in the process of formalizing emergency response procedures to ensure nothing like this ever happens again.


A sincere thank–you has to be given to Remi Delon for his assistance beginning on the second day of this misadventure. While I may be bitching fairly loudly about how things have gone, things would have gone much worse without his diligence and support. Up until this point I have considered WebFaction the best hosting available, recommending their services to everyone I possibly could, and this is in no small part due to my past support experiences with Remi. He deserves a cake.


I will update this post as I think of other areas worthy of discussion.

Thank you all for your patience and understanding,

— Alice.

Written on April 22, 2009 at 00:32:42, modified on April 26, 2009 at 06:12:47.

profile

Name: Alice Bevan–McGregor →アリス

Location: British Columbia, Canada

Growing up in a small town is tough when you're this strange.

E-Mail: alice@gothcandy.com

MSN: alice@gothcandy.com

AIM: GothAliceMalk

We having the same spirit of faith, according as it is written, I believed, and therefore have I spoken; we also believe, and therefore speak;

For which cause we faint not; but though our outward person perish, yet the inward person is renewed day by day.

For our light affliction, which is but for a moment, worketh for us a far more exceeding and eternal weight of glory;

While we look not at the things which are seen, but at the things which are not seen:
for the things which are seen are temporal; but the things which are not seen are eternal.

— 2 Corinthians 4