ROOT III server down and 1and1 is hanging up on me


eWebtricity
04-09-2005, 10:22 PM
At 21:14 tonight, the ROOT III server of one of our largest clients went down. After resetting the server in recovery mode and running

fdisk -l

at the command prompt, we get nothing.

df

gives us this

rescue:/var/log# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/ram0 136956 127185 9771 93% /
tmpfs 512796 0 512796 0% /dev/shm
rescue:/var/log#
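For anyone hitting the same thing: with no device arguments, fdisk -l lists the devices named in /proc/partitions, so empty output means the kernel never registered a disk. The df output above likewise shows only the rescue system's RAM disk and tmpfs. A minimal Python sketch of the same check, assuming the usual /proc/partitions layout (a header line, then rows of major, minor, #blocks, name). The function name list_block_devices is hypothetical, not an existing tool:

```python
def list_block_devices(partitions_text):
    """Return device names from /proc/partitions-style text.

    Assumes each data row starts with a numeric major number and has the
    device name in the fourth column (extra stats columns are ignored).
    RAM disks and loop devices are skipped, since a rescue system always
    has those and they say nothing about the real hard drive.
    """
    names = []
    for line in partitions_text.splitlines():
        fields = line.split()
        # Skip the header ("major minor #blocks name") and blank lines.
        if len(fields) >= 4 and fields[0].isdigit():
            name = fields[3]
            if not name.startswith(("ram", "loop")):
                names.append(name)
    return names
```

On the rescue system, list_block_devices(open("/proc/partitions").read()) returning an empty list would match the empty fdisk -l output.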


So I called 1and1 support and just got an auto-attendant saying their call volume was unusually high and I should call back later. Then it disconnected me. I thought it was a coincidence, but I've called back 5 times now and got disconnected every time. I'd rather hold for someone than get disconnected during a critical outage.

This has happened on one of our servers before, and someone at the 1and1 helpdesk knew what was wrong right away. They fixed it in about 10 minutes and the server was back online.

I asked what the problem was but got the runaround. Nobody ever really told me what the fix was, and we eventually gave up asking.

Wish I knew right now.

eWebtricity
04-09-2005, 10:23 PM
Just got through; now I'm on hold. I guess the sixth time is the charm.

eWebtricity
04-09-2005, 10:25 PM
While on hold, I found this in the recovery-mode logs:


loop: loaded (max 8 devices)
Compaq SMART2 Driver (v 2.6.0)
Intel(R) PRO/1000 Network Driver - version 5.3.19-k2
Copyright (c) 1999-2004 Intel Corporation.
e100: Intel(R) PRO/100 Network Driver, 3.0.27-k2-NAPI
e100: Copyright(c) 1999-2004 Intel Corporation
via-rhine.c:v1.10-LK1.2.0-2.6 June-10-2004 Written by Donald Becker
ACPI: PCI interrupt 0000:00:12.0[A] -> GSI 23 (level, low) -> IRQ 23
eth0: VIA Rhine II at 0xe000, 00:40:ca:82:3b:84, IRQ 23.
eth0: MII PHY found at address 1, status 0x786d advertising 05e1 Link 41e1.
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller at PCI slot 0000:00:11.1
ACPI: PCI interrupt 0000:00:11.1[A] -> GSI 20 (level, low) -> IRQ 20
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt8235 (rev 00) IDE UDMA133 controller on pci0000:00:11.1
ide0: BM-DMA at 0xdc00-0xdc07, BIOS settings: hda:pio, hdb:pio
Probing IDE interface ide0...
#
# What does this mean?
#
ide0: Wait for ready failed before probe !
#
#
Loading Adaptec I2O RAID: Version 2.4 Build 5go
Detecting Adaptec I2O RAID controllers...
Red Hat/Adaptec aacraid driver (1.1.2-lk2 Dec 19 2004)
3ware Storage Controller device driver for Linux v1.26.00.039.
3w-xxxx: No cards found.
3ware 9000 Storage Controller device driver for Linux v2.26.02.001.
libata version 1.02 loaded.
mice: PS/2 mouse device common for all mice


Not sure what that means yet; it obviously means the drive is having a problem. But why?
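As far as I can tell from the 2.6 IDE driver, that message is printed when the drives on the channel don't clear their busy status before the probe times out. It can also show up on an empty channel, so the telling part here is that no hda: line follows it: the kernel never identified a disk on ide0. A minimal Python sketch for pulling that out of a long log, assuming the message formats shown above. The pattern for an identified drive ("hda: <model>, ...") is an assumption, and summarize_ide_probe is a hypothetical helper:

```python
import re

# Message formats as they appear in the recovery-mode log above.
PROBE_RE = re.compile(r"^Probing IDE interface (ide\d+)\.\.\.")
READY_FAIL_RE = re.compile(r"^(ide\d+): Wait for ready failed before probe")
# Assumption: an identified drive announces itself as "hda: <model>, ...".
DRIVE_RE = re.compile(r"^(hd[a-z]): ")


def summarize_ide_probe(log_text):
    """Summarize IDE probing in a kernel log.

    Returns the interfaces probed, the interfaces whose wait-for-ready
    step failed, and the hdX drives that were actually identified.
    """
    result = {"probed": [], "not_ready": [], "drives": []}
    for raw in log_text.splitlines():
        line = raw.strip()
        m = PROBE_RE.match(line)
        if m:
            result["probed"].append(m.group(1))
            continue
        m = READY_FAIL_RE.match(line)
        if m:
            result["not_ready"].append(m.group(1))
            continue
        m = DRIVE_RE.match(line)
        if m:
            result["drives"].append(m.group(1))
    return result
```

Run against the log above, it would report ide0 as probed and not ready, with no drives identified.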

eWebtricity
04-09-2005, 10:35 PM
Got somebody on the line; that wasn't too bad. Not sure why they thought their call volume was so high.

eWebtricity
04-09-2005, 10:49 PM
Support is looking at it and apparently rebooted it; I got knocked off the console.

The 1and1 tech came back and said he couldn't see the hard drive either and would have to escalate a ticket to their hardware support group, and that it would take anywhere from 20 minutes to 4 hours to get a response. I asked for a ticket number and he gave it to me.

His name was Bryan, he was pretty quick and courteous.

eWebtricity
04-10-2005, 01:23 AM
I called in to check on the status, since the server is still down and I'm coming up on a 3-hour wait.

I was put on hold, and I don't think I've ever heard worse hold music. It was a looped MIDI file. Pure torture.

...figures they are located in the Philippines. Of course, he looked up my ticket and there was no update other than that it had been sent to tier 2 support in the USA. He said to call back later to check the status.

Great. In the meantime, one of my largest customers is down. Probably going to be down until well into tomorrow AM.

eWebtricity
04-10-2005, 09:18 AM
I noticed this morning that my console was still logged into recovery mode, so they hadn't done much overnight. I called in and they said they were still working on it and hadn't figured out the root problem. To me it sounds more like they haven't gotten to it yet; I usually find they are pretty short-staffed on weekends.

VerityNS
04-10-2005, 10:54 AM
I hate to sound like a pessimist, but this is so typical for 1and1. A server is down, your largest client is without service, and I'm sure he/she is about 2 minutes away from finding another hosting provider (after being down for HOURS with no sign of a fix soon and no explanation of why this happened), and 1and1 is lollygagging around!

Time to switch to Beachcomber, bro. One of my servers with them had a problem where Apache was binding and restarting almost every hour. After about the 5th notification that Apache had restarted and the server was down, I e-mailed support at Beachcomber... Within 5 minutes I got a reply saying that a support engineer was looking at the problem. 25 minutes after that, I got another e-mail saying the engineer had recompiled Apache and the server was running fine again. Haven't had a problem with that server since!

I think my point is, should ANYONE have to experience HOURS of downtime with no explanation and NO fix to the problem???? That isn't customer service! I really don't know why they pay those people to answer the phones. They know nothing, they can fix nothing, and most of the time they are rude on top of it all!

I've just had my fill with 1and1....

eWebtricity
04-10-2005, 11:14 AM
Yeah, my tolerance is draining quickly. I already contacted Rack911.com this morning about possibly moving this customer and others. We're coming up on 12 hours of downtime with little or no progress. I was logged into the recovery console last night when I went to sleep around 2AM. When I woke up this morning the console was still up, which tells me they didn't do anything all night. If they had looked into it, they would have had to reboot the box to check the hardware, reseat the drive cables, etc., and that would have knocked me off. I suspect they didn't even pick up the ticket until this morning, when the phones rolled back over to the USA. So it's now 11AM, we're still down, and there's no hope in sight.

I think I'll check out Beachcomber too.

eWebtricity
04-10-2005, 12:04 PM
Called back at 12PM and, as I suspected, the phone rep said the ticket hadn't even been picked up by a tech yet. He said that on weekends they run a skeleton crew in the data center, as in one person, and that one guy hadn't gotten to the ticket in his queue. Actually, he said the ticket wasn't even assigned to anyone, but that he would go ahead and assign it to the one guy on duty. I should now hear from them within two hours.

gee, thanks.

VerityNS
04-10-2005, 12:21 PM
That is COMPLETELY unacceptable by my business standards! I'm not sure what time zone you are in, but basically you have been completely down for well over 12 hours now... How can anyone be expected to accept a server being down that long? How can anyone even compete when everyone else is offering well over 99% uptime?

AHHH... It's making me mad and it's not even my server that is down! LOL

eWebtricity
04-10-2005, 03:56 PM
So it's 3:45PM EST and no change. I called support for a status update and they said the ticket STILL hasn't been picked up. The phone rep said there's nobody in the data center right now. I asked when somebody would be back, and he said probably not until tomorrow morning. I could barely contain my anger and outrage on the phone. 1and1 just lost this account. The phone rep said all I can do is call back later.

I don't smoke, but I think I'm going outside to burn one.

eWebtricity
04-11-2005, 12:12 AM
So around 4:15PM, shortly after getting even worse news from 1and1 support, our monitoring system paged me with a recovery notice for the server. I checked it out and poof! The server was up. I immediately kicked off a backup to grab the most recent data. After that finished, I checked the customer sites and data. Had to run some MySQL maintenance to fix some broken tables, but after that the customer was up and running again.

Thanks to whoever fixed the problem at 1and1; I sure wish I knew what you guys did to fix it. But that still doesn't change the fact that the service and the customer experience were horrible.

Note to 1and1:

Unfortunately, this is my experience with 1and1 every time I need support on a real issue. You seem to pull it out in the end, but getting to a resolution is tremendously painful. Better communication from your support staff is a must. Four times I was told that a tech would email me with status, and I didn't receive a single email, most likely because nobody picked up the ticket. Setting expectations would also have helped quite a bit, so we could understand what's going on in your operation. We're human just like you, and we understand that your customers have issues just like ours and that you're busy just like us. But you have to communicate to your customers what's going on, not just leave them in the dark.

VerityNS
04-11-2005, 12:18 AM
I agree with everything written there, but I'll go one step further and say it shows whether or not you (1and1) really care about your clients!

A down server should get the highest priority there is! There is NO WAY a server should stay down for close to 24 hours! I don't suppose they are going to issue a credit for a full day's worth of non-use of the product????
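For a sense of scale, here is the arithmetic behind the 99% uptime point, as a small Python sketch. It assumes the thread's own times are in the same time zone (down at 21:14 on 04-09, back at about 4:15PM on 04-10) and a 30-day month. On those assumptions the outage ran about 19 hours, while 99% uptime allows only 7.2 hours of downtime in that month:

```python
from datetime import datetime, timedelta


def outage_hours(start, end):
    """Length of an outage in hours."""
    return (end - start) / timedelta(hours=1)


def allowed_downtime_hours(uptime_pct, period_hours):
    """Downtime permitted by an uptime percentage over a period."""
    return period_hours * (1 - uptime_pct / 100)


# Times taken from the thread; assumes both are in the same time zone
# and treats "around 4:15PM" as exact for a rough estimate.
down = datetime(2005, 4, 9, 21, 14)
up = datetime(2005, 4, 10, 16, 15)

month_hours = 30 * 24  # assumption: a 30-day month
hours_down = outage_hours(down, up)
budget = allowed_downtime_hours(99.0, month_hours)
achieved = 100 * (1 - hours_down / month_hours)

print(f"Outage: {hours_down:.1f} h")
print(f"99% over 30 days allows: {budget:.1f} h")
print(f"Uptime for the month from this outage alone: {achieved:.2f}%")
```

A single outage like this one pulls the month down to roughly 97.4% uptime, even if nothing else goes wrong.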

1and1's prices are beyond good... But that doesn't allow for the worst customer service known to man in this industry! I'm now paying more money for my servers each month but I'm getting the best service I can ask for! Money WELL spent! I guess my father was right... You DO get what you pay for!

eWebtricity
04-11-2005, 12:23 AM
I'm glad we have our servers spread out across hosting providers.

I think LinuxGuy said it best with ....


Don't keep all your eggs(servers) in one basket(provider)!


Something like this could have really hurt us; luckily only a single customer was down. But it was a big customer.

NeverPanic
04-11-2005, 03:03 PM
1 and 1's cutthroat prices had to pop up as a festering sore on their performance sooner or later.

I've been seriously considering relocating, but I have 5 sites with unique IP addresses and SSL certificates, which would be a MAJOR pain, plus commercial software (such as tbackup) and added Perl modules that would have to be dealt with.

Sigh.