So far, LyricWiki has been running pretty well with the new server (our 5th server, named “Cochise”) set up with the rest of them. I’ve moved the API completely to that server, which takes a good deal of the stress off of the site itself.
The site has been pretty fast since the new server has been up. However, there have been occasional slow patches, and judging by the CPU usage on the main webserver for the site, it’s definitely still not “calm”.
So here’s a fun surprise: today I ordered another server! 😀
For the first time in a looong time, we can hopefully stay ahead of demand instead of suffering for a couple of weeks until it’s unbearable and we’re forced to upgrade.
W00t! This is an uplifting blogpost… lots of surprises! Anyway: the second surprise is that starting in October, I’m going to be decreasing the time I spend at my day-job by 40% so I’ll only be in the office 3 days per week. That’ll give me two whole days more each week where I can work on Motive Force products like LyricWiki. This should help things become much more stable very quickly.
Well, that was an abnormally enjoyable post for this blog – which usually just announces outages! The site still isn’t totally upgraded (because the weekend ended before I could make all of the extensions work), so I have to run back to that. I took snapshots of a bunch of performance stats before and after, so I’ll post those sometime soon.
Thanks for your patience during these past few weeks. Hopefully we’re in a whole new era for the site!
The new server exists! I’m setting it up and hoping I can have it all done before I go to sleep.
There is a decent amount of stuff that needs to be done before it’s all working. If you’re technical, or just like watching lists as they are being completed… I’ll be tracking the upgrade here.
I’ll also be making occasional updates to the @lyricwiki twitter account.
There will be downtime for an unknown length of time tonight. I’ll try to keep it to a minimum, but the site is so slow it’s practically down anyway.
PS: Special thanks to our awesome webhost for getting the new box here & set up quickly!
This week, the site has been extremely slow and even gone down and up a couple of times. I searched for a problem for a while but it appears that we’ve just really hit the wall on how much traffic we can support with our current servers. That’s fairly good timing since we’d been planning to move to more servers for a little while, so I’d already begun to look into it.
Today I ordered another server with the same specs as the current Apache server. This will bring us up to 5 total servers running LyricWiki. For the curious (and tech-savvy): that’s one Squid caching server in front of two Apache web servers, which talk to one MySQL master server and one read-only MySQL replica server.
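For the tech-savvy, here is a rough Python sketch of that five-server layout. The server names are made up for illustration (the post doesn't give hostnames), and the read/write split shown is just the usual way a master plus read-only replica gets used; it is an assumption, not a description of how LyricWiki's queries are actually divided.

```python
# Hypothetical names for the five servers described above.
TOPOLOGY = {
    "cache": ["squid-1"],                 # one Squid caching server in front
    "web": ["apache-1", "apache-2"],      # two Apache web servers
    "db_master": ["mysql-master"],        # one MySQL master
    "db_replica": ["mysql-replica"],      # one read-only MySQL replica
}


def server_count() -> int:
    """Total number of servers across all roles."""
    return sum(len(hosts) for hosts in TOPOLOGY.values())


def pick_database(sql: str) -> str:
    """Assumed policy: SELECTs go to the replica, everything else to the master."""
    words = sql.split()
    if words and words[0].upper() == "SELECT":
        return TOPOLOGY["db_replica"][0]
    return TOPOLOGY["db_master"][0]
```

With this policy, a page view that only reads would hit the replica, while an edit would always go to the master.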
To get the server to be as beefy as we need, I had to ask the hosting company to order extra RAM for it. So we’re just waiting for that to be delivered (hopefully around this weekend or very soon after) and we’ll be ready to start working to get the new server pulled into our setup.
In addition to just having more machine-power to handle our traffic, this will give two additional benefits immediately. The first is that we can use the new server to test out the upgrade to the newest version of MediaWiki (the software that runs our site as well as Wikipedia). The second benefit is that now we’ll have two Apache servers – currently the most overworked part of the system – with one running the API and one running the site itself (lyricwiki.org). This will let us more quickly identify when something is wrong with one of those two systems and it will make sure that problems with either of them are unlikely to affect the other.
Exciting times… stay tuned!
For the first time all day, the site is moving at what appears to be full-speed.
Also, to answer the earlier-posed question about the Squid logging a ton of extra data when it restarts, it turns out that is true – apparently something (either Apache restarting or somehow detecting that the Squid just came back) triggers the MediaWiki install to send a ton of “PURGE” requests to the Squid server, each of which basically tells it to forget about a page it may be caching because it’s probably out of date now. Each request is another line in the log file, so that’s about 800,000 extra lines in the span of a few minutes.
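If you want to check for this kind of flood yourself, a small script can tally the request methods in an access log. This is a minimal sketch that assumes Squid's default "native" log format, where the request method is the sixth whitespace-separated field; the sample lines are invented for illustration and are not from the real LyricWiki logs.

```python
from collections import Counter


def count_methods(lines):
    """Count request methods, assuming Squid's native access.log format."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Native format: time elapsed client code/status bytes method URL ...
        if len(fields) > 5:
            counts[fields[5]] += 1
    return counts


# Made-up sample lines in the native format.
sample = [
    "1222300000.101 3 10.0.0.2 TCP_MISS/200 310 PURGE http://example.org/A - NONE/- -",
    "1222300000.105 2 10.0.0.2 TCP_MISS/404 310 PURGE http://example.org/B - NONE/- -",
    "1222300000.230 41 192.0.2.7 TCP_HIT/200 8123 GET http://example.org/A - NONE/- text/html",
]

counts = count_methods(sample)
print(counts.most_common())
```

On a real log you would pass the open file object instead of the sample list; a PURGE count that dwarfs the GET count right after a restart would match what is described above.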
Apparently the site was down while I slept, but I had emails & comments in my inbox about the outage as soon as I got up so I was able to jump on it right away.
The Squid had run out of memory again. The access log files for one day were 17 gigs. That seems awfully high – maybe we’re getting spidered too hard or the logs go through serious stress after a restart?
I’ll be finding a more permanent solution to the issue, but in the meantime the site is “up” but it’s going to be fairly slow while the cache refills… again.