July 19, 2009
The site was down for about five hours this morning. We’re back up now.
The problems don’t appear to have been serious and were simple to fix. The downtime lasted so long only because I was asleep and the text alerts to my cellphone didn’t wake me up.
Although this hasn’t been a problem before now, it’s certainly not the kind of failure that should be allowed to repeat itself. Here are the plans to prevent it from happening again:
- Remove the false positives: For about a week, the monitoring system has been paging me in the middle of the night about other, non-urgent issues. I’ll clean up those unneeded alerts so that I don’t get used to ignoring unimportant pages.
- Auto-recovery: This morning’s problem was a pretty simple fix, and there are a few types of problems that could be fixed automatically. If I have some time today, I’ll start on a system that gathers status information across the servers, decides whether a problem is one it can easily fix on its own, and tries the fix if I haven’t responded to the first two pages (i.e., after 15 minutes of downtime).
- Ringtone change: Ambulances vary their sirens because people notice when a sound changes. The false positives got me used to sleeping through the old alert tone, so I’ve switched to a different (and, conveniently, far more annoying) one.
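The paging and auto-recovery plans above could fit together roughly like the minimal Python sketch below. It is only an illustration under stated assumptions: the `Alert` fields, the alert names, and the `known_fixes` registry are hypothetical. The only details taken from this post are that non-urgent alerts shouldn’t wake anyone up and that an automatic fix is attempted once the first two pages (about 15 minutes of downtime) go unanswered.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Assumed thresholds, based on the post: try an automatic fix once the first
# two pages have gone unanswered, i.e. after roughly 15 minutes of downtime.
PAGES_BEFORE_AUTOFIX = 2
MINUTES_BEFORE_AUTOFIX = 15


@dataclass
class Alert:
    name: str           # hypothetical alert identifier
    urgent: bool        # non-urgent alerts are logged, never paged
    minutes_down: int   # how long the problem has been ongoing
    pages_sent: int     # pages already sent to the on-call phone
    acknowledged: bool  # True once a human has responded


# A fix returns True if it believes the problem is resolved.
Fix = Callable[[], bool]


def decide_action(alert: Alert, known_fixes: Dict[str, Fix]) -> str:
    """Choose what to do with one alert (hypothetical policy sketch)."""
    if alert.acknowledged:
        return "human-handling"      # someone is already on it
    if not alert.urgent:
        return "log-only"            # removes the middle-of-the-night false positives
    waited_long_enough = (alert.pages_sent >= PAGES_BEFORE_AUTOFIX
                          and alert.minutes_down >= MINUTES_BEFORE_AUTOFIX)
    if not waited_long_enough:
        return "page"                # give the human a chance to respond first
    fix = known_fixes.get(alert.name)
    if fix is None:
        return "page"                # not a problem we know how to fix automatically
    return "auto-fixed" if fix() else "page"
```

Keeping the automatic fixes in an explicit registry means the system only acts on problems that are already known to be "easily automatically fixable," and anything else keeps paging a human.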
Hopefully, we’ll be back to uptime nirvana.
Posted by Sean Colombo
April 25, 2009
We postponed the previously announced datacenter move and rescheduled it for a better time of day: 7am EST on a Sunday.
If everything goes smoothly, the outage should last between 30 minutes and an hour; if it doesn’t, it could last much longer!
Some additional news that hasn’t been on the blog yet: a local company that was upgrading its hardware generously gave us two more servers (thanks!), and we’ve ordered gobs of RAM for them. Once the RAM is in, we will be upgrading LyricWiki’s setup once again. We currently have 6 servers, so this will bring us up to 8! That’s a significant increase and will hopefully make LyricWiki even faster.
March 20, 2009
Our webhost is moving to a new datacenter just down the street from the current one. We’ve scheduled the move of the LyricWiki servers for 3pm tomorrow, March 21st. Physically moving the servers will take slightly under half an hour (the site will be completely down during that time); after that, we will try to get everything back up and running as soon as possible.
If there are any major problems, we will post about them here. For more up-to-the-minute info, you can follow the @lyricwiki Twitter account or join us in the #LyricWiki IRC channel on QuakeNet.
February 26, 2009
The problems earlier today were apparently caused by an attempted distributed denial-of-service (DDoS) attack.
The datacenter has now blocked all of the bad traffic, and the site appears to be back to normal (it has been stable for several hours now). The whole problem lasted about 1 to 2 hours, but that’s hard to measure precisely since the site was never completely down during that time.
February 26, 2009
The site is currently behaving very erratically: there are some problems with the connection between our datacenter (where all of our servers hang out) and the internet.
We’re not sure whether this is related to the “emergency maintenance” Sprint was doing last night.
The team at the datacenter is working on it right now, and hopefully we’ll be back soon.