September 25, 2008
So far, LyricWiki has been running pretty well with the new server (our 5th server, named “Cochise”) set up with the rest of them. I’ve moved the API completely to that server, which takes a good deal of the stress off of the site itself.
The site has been pretty fast since the new server has been up. However, there have been occasional slow patches, and judging by the CPU usage on the main webserver for the site, it’s definitely still not “calm”.
So here’s a fun surprise: today I ordered another server! 😀
For the first time in a looong time, we can hopefully stay ahead of demand instead of suffering for a couple of weeks until it’s unbearable and we’re forced to upgrade.
W00t! This is an uplifting blogpost… lots of surprises! Anyway: the second surprise is that starting in October, I’m going to be decreasing the time I spend at my day-job by 40% so I’ll only be in the office 3 days per week. That’ll give me two whole days more each week where I can work on Motive Force products like LyricWiki. This should help things become much more stable very quickly.
Well, that was an abnormally enjoyable post for this blog – which usually just announces outages! The site still isn’t totally upgraded (because the weekend ended before I could make all of the extensions work), so I have to run back to that. I took snapshots of a bunch of performance stats before and after, so I’ll post those sometime soon.
Thanks for your patience during these past few weeks. Hopefully we’re in a whole new era for the site!
September 20, 2008
The new server exists! I’m setting it up and am hoping I can have it all done before I go to sleep.
There is a decent amount of stuff that needs to be done before it’s all working. If you’re technical, or just like watching lists as they are being completed… I’ll be tracking the upgrade here.
I’ll also be making occasional updates to the @lyricwiki twitter account.
The new server will be another Apache server and is named “Cochise”. That is the name of the last great Apache chief, and also the inspiration for the song Cochise by Audioslave.
There will be downtime for an unknown length of time tonight. I’ll try to keep it to a minimum, but the site is so slow it’s practically down anyway.
PS: Special thanks to our awesome webhost for getting the new box here & set up quickly!
September 12, 2008
This week, the site has been extremely slow and even gone down and up a couple of times. I searched for a problem for a while but it appears that we’ve just really hit the wall on how much traffic we can support with our current servers. That’s fairly good timing since we’d been planning to move to more servers for a little while, so I’d already begun to look into it.
Today I ordered another server with the same specs as the current Apache server. This will bring us up to 5 total servers running LyricWiki. For the curious (and tech-savvy): that’s one Squid caching server in front of two Apache web servers which talk to one MySQL master server and one read-only replica MySQL server.
To get the server to be as beefy as we need, I had to ask the hosting company to order extra RAM for it. So we’re just waiting for that to be delivered (hopefully around this weekend or very soon after) and we’ll be ready to start working to get the new server pulled into our setup.
In addition to just having more machine-power to handle our traffic, this will give two additional benefits immediately. The first is that we can use the new server to test out the upgrade to the newest version of MediaWiki (the software that runs our site as well as Wikipedia). The second benefit is that now we’ll have two Apache servers – currently the most overworked part of the system – with one running the API and one running the site itself (lyricwiki.org). This will let us more quickly identify when something is wrong with one of those two systems and it will make sure that problems with either of them are unlikely to affect the other.
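For the tech-savvy, here’s a rough sketch of what “one server for the API, one for the site” could look like as a routing rule. It’s only an illustration: the backend hostnames and the `/api.php` path are made up for the example, and the real split may well be done differently (for example at the caching layer or by hostname).

```python
# Hypothetical backend names; these are placeholders, not real hosts.
BACKENDS = {
    "api": "apache-api.example.internal",
    "site": "apache-site.example.internal",
}

# Assumed path prefix for API traffic (an assumption for this sketch).
API_PREFIX = "/api.php"


def pick_backend(path: str) -> str:
    """Return the backend that should handle a request for `path`."""
    kind = "api" if path.startswith(API_PREFIX) else "site"
    return BACKENDS[kind]


if __name__ == "__main__":
    for p in ("/api.php?artist=Audioslave&song=Cochise", "/Audioslave:Cochise"):
        print(p, "->", pick_backend(p))
```

With a split like this, a slowdown that only shows up on one backend points straight at either the API or the site, which is the whole idea.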
Exciting times… stay tuned!
August 10, 2008
For the first time all day, the site is moving at what appears to be full-speed.
Also, to answer the earlier question about whether the Squid has to log a ton of extra data when it restarts: it turns out that it does. Apparently something (either Apache restarting or somehow detecting that the Squid just came back) triggers the MediaWiki install to send a ton of “PURGE” requests to the Squid server, each of which basically tells it to forget about a page it may be caching because it’s probably out of date now. Each request is another line in the log-file, so that’s about 800,000 extra lines in the span of a few minutes.
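If you’re wondering what one of those “PURGE” requests looks like, it’s just an ordinary HTTP request with a different method name. Here’s a minimal sketch using Python’s standard library. The function names, page path and cache address are placeholders for illustration, and this is not MediaWiki’s own code; a cache will also only honor PURGE if it has been configured to allow it.

```python
import http.client


def build_purge_request(path: str, site_host: str) -> bytes:
    """Build the raw bytes of a PURGE request for one cached page."""
    return f"PURGE {path} HTTP/1.1\r\nHost: {site_host}\r\n\r\n".encode("ascii")


def send_purge(cache_host: str, cache_port: int, path: str,
               site_host: str, timeout: float = 5.0) -> int:
    """Send one PURGE to the cache and return the HTTP status code."""
    conn = http.client.HTTPConnection(cache_host, cache_port, timeout=timeout)
    try:
        conn.request("PURGE", path, headers={"Host": site_host})
        return conn.getresponse().status
    finally:
        conn.close()


if __name__ == "__main__":
    # Only prints the request; nothing is sent over the network here.
    print(build_purge_request("/Audioslave:Cochise", "lyricwiki.org").decode("ascii"))
```

Each call to something like `send_purge` would show up as one line in the cache’s access log, which is how a burst of them can balloon the log-file so quickly.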
August 10, 2008
Apparently the site was down while I slept, but I had emails & comments in my inbox about the outage as soon as I got up so I was able to jump on it right away.
The Squid had run out of memory again. The access log files for one day were 17 gigs. That seems awfully high – maybe we’re getting spidered too hard or the logs go through serious stress after a restart?
I’ll be finding a more permanent solution to the issue, but in the meantime the site is “up” but it’s going to be fairly slow while the cache refills… again.
May 13, 2008
You may have noticed an error message along the lines of “Host ‘pedlfaster.pedlr.com’ is blocked because of many connection errors” recently.
While we kept fixing that in the short-term, it kept popping back up. Now there is a more permanent fix in there, and I’m looking into what caused it (I’m assuming it was a spike in API traffic).
Regardless: you shouldn’t be seeing that anymore. If you do, harass us immediately!
October 24, 2007
LyricWiki just got better because we’re much faster now at delivering lyrics to the world and much stronger against slashdot-effect or digg-effect problems.
Our server upgrades are continuing along quite smoothly. Last night we got Squid caching up and running. That means that if the wiki delivers a page to a logged-out user once, the rendered page is saved by Squid until it is changed, saving all of the database lookups and processing to turn the WikiText into a full-blown HTML page. This measure is awesome since it not only makes a large majority of the browsing faster, it also makes the site extremely resistant to Slashdotting / Digging / etc. (since those are logged-out users all accessing the same pages – which would be in the cache).
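For the curious, the idea can be sketched in a few lines of Python. This is a toy model of the behavior described above, not how Squid is actually implemented; the class and method names are made up for the example.

```python
from typing import Callable, Dict


class PageCache:
    """Toy stand-in for a caching proxy sitting in front of a wiki."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self.render = render              # the expensive part: DB lookups + WikiText -> HTML
        self.store: Dict[str, str] = {}   # page title -> rendered HTML
        self.hits = 0
        self.misses = 0

    def get(self, title: str, logged_in: bool = False) -> str:
        """Serve a page, using the cache only for logged-out users."""
        if logged_in:
            return self.render(title)     # logged-in users bypass the cache
        if title in self.store:
            self.hits += 1
            return self.store[title]
        self.misses += 1
        html = self.render(title)
        self.store[title] = html
        return html

    def purge(self, title: str) -> None:
        """Forget a page, e.g. because it was just edited."""
        self.store.pop(title, None)


if __name__ == "__main__":
    cache = PageCache(lambda title: f"<html>{title}</html>")
    cache.get("Audioslave:Cochise")   # rendered and stored
    cache.get("Audioslave:Cochise")   # served straight from the cache
    print(cache.hits, cache.misses)
```

A thousand logged-out visitors all hitting the same page would cost one render and 999 cache hits, which is why this helps so much against a Slashdotting.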
Currently, only about 30% of our page requests are getting served by the Squid, but that’s partly because the API has people sending all kinds of weird requests at it (varied spellings, capitalizations, etc.). Wikipedia serves around 60% of its pages through their Squids, so we have potential for even more savings as the web-traffic catches up to the API traffic.
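To see why varied capitalization hurts the hit rate, here’s a small illustrative sketch. The sample requests are invented, and the `normalize` step is a hypothetical idea for comparison only; it is not something the site is described as doing.

```python
from typing import Callable, Iterable


def hit_rate(requests: Iterable[str], key: Callable[[str], str] = lambda r: r) -> float:
    """Fraction of requests whose cache key has been seen before."""
    seen = set()
    hits = 0
    total = 0
    for request in requests:
        total += 1
        k = key(request)
        if k in seen:
            hits += 1
        else:
            seen.add(k)
    return hits / total if total else 0.0


def normalize(request: str) -> str:
    """Hypothetical normalization: lowercase and collapse whitespace."""
    return " ".join(request.lower().split())


if __name__ == "__main__":
    # Made-up requests that all mean the same song.
    sample = [
        "Audioslave:Cochise",
        "audioslave:cochise",
        "AUDIOSLAVE:Cochise",
        "Audioslave:Cochise",
    ]
    print(hit_rate(sample))             # each distinct spelling is its own cache entry
    print(hit_rate(sample, normalize))  # variants collapse onto a single entry
```

A cache that keys on the exact request treats every variant as a brand-new page, so API traffic full of odd spellings drags the overall percentage down.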
Tonight, I’ll be moving on to try to use load-balancing to get our other web-server into the party (this is a bit trickier than it sounds, so it might take a while). Then I’ll try to upgrade the new web-server with APC like I did for the first web-server yesterday. Once that setup is done, I’m going to be begging for a slashdot just to see how well the servers can fare against that kind of onslaught (we can take it!).