Better, Faster, Stronger!

October 24, 2007

LyricWiki just got better because we’re much faster now at delivering lyrics to the world and much stronger against slashdot-effect or digg-effect problems.

Our server upgrades are continuing along quite smoothly.  Last night we got Squid caching up and running.  That means that if the wiki delivers a page to a logged-out user once, the rendered page is saved by Squid until it is changed, saving all of the database lookups and processing to turn the WikiText into a full-blown HTML page.   This measure is awesome since it not only makes a large majority of the browsing faster, it also makes the site extremely resistant to Slashdotting / Digging / etc. (since those are logged-out users all accessing the same pages – which would be in the cache).
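For the techies, the caching behavior described above can be sketched in a few lines of Python. This is only a toy model, not how Squid is actually implemented: the `AnonymousPageCache` class, its `render` callback, and the `purge` method are all hypothetical names made up for illustration.

```python
from typing import Callable, Dict


class AnonymousPageCache:
    """Toy stand-in for a caching proxy sitting in front of a wiki."""

    def __init__(self, render: Callable[[str], str]) -> None:
        # render is the expensive step: database lookups plus WikiText -> HTML.
        self._render = render
        self._pages: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, title: str, logged_in: bool) -> str:
        # Logged-in users bypass the cache and always get a fresh render.
        if logged_in:
            return self._render(title)
        if title in self._pages:
            self.hits += 1
            return self._pages[title]
        # First logged-out request for this page: render once and remember it.
        self.misses += 1
        html = self._render(title)
        self._pages[title] = html
        return html

    def purge(self, title: str) -> None:
        # Called when a page is edited so the next reader sees the new version.
        self._pages.pop(title, None)
```

If a thousand logged-out visitors ask for the same page, `render` runs once and the other 999 requests come straight out of the dictionary, which is the whole reason a traffic spike aimed at a single page becomes cheap.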

Currently, only about 30% of our page requests are getting served by the Squid, but that’s partly because the API has people sending all kinds of weird requests at it (varied spellings, capitalizations, etc.), and each variation looks like a different page to the cache.  Wikipedia serves around 60% of its pages through their Squids, so we have potential for even more savings as the web traffic catches up to the API traffic.
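To see why varied spellings and capitalizations hurt the hit rate, here is a small Python sketch. It is an illustration only: `cache_key` and `hit_ratio` are hypothetical helpers, and collapsing case and whitespace is an assumed normalization, not something the site is known to do.

```python
import re
from typing import Iterable, Tuple


def cache_key(artist: str, song: str) -> str:
    """Collapse case and whitespace so near-identical requests share one key."""
    def norm(text: str) -> str:
        return re.sub(r"\s+", " ", text).strip().lower()
    return f"{norm(artist)}:{norm(song)}"


def hit_ratio(requests: Iterable[Tuple[str, str]], normalize: bool = True) -> float:
    """Fraction of requests that would be served from an (unbounded) cache."""
    seen = set()
    hits = 0
    total = 0
    for artist, song in requests:
        total += 1
        key = cache_key(artist, song) if normalize else f"{artist}:{song}"
        if key in seen:
            hits += 1
        else:
            seen.add(key)
    return hits / total if total else 0.0
```

With raw keys, "Daft Punk" and "daft punk" are two separate cache entries and both requests miss; with normalized keys the second one is a hit.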

Tonight, I’ll be moving on to try to use load-balancing to get our other web-server into the party (this is a bit trickier than it sounds, so it might take a while).  Then I’ll try to upgrade the new web-server with APC like I did for the first web-server yesterday.  Once that setup is done, I’m going to be begging for a slashdot just to see how well the servers can fare against that kind of onslaught (we can take it!).


Op-code caching is working. Setting up Squid (beware of possible outages).

October 24, 2007

LyricWiki is expanding…

Last night I got APC working on the active web-server (but not on the ‘spare’). APC is pretty stellar – brought the CPU usage on the web-server from around 90% down to around 15%. Tonight I’m working on getting Squid caching set up. This involves a few things:

  1. Getting the Squid running as a cache server so that most pages get cached and served very quickly.
  2. Figuring out how to use the Squid as a load-balancer and balance between the other two machines (they need an interesting setup so that the album art images only get stored to one server).
  3. Switching over to use this new configuration (so that the Squid is connected to the world, and the Apache web-server boxes are behind it).
  4. Getting the ‘spare’ web-server up to speed (getting APC working on it even though it’s running a different version of CentOS).
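The routing idea in step 2 can be sketched roughly as follows. This is a Python illustration of the concept rather than Squid configuration, and the `Balancer` class, the backend names, and the `/images/` path prefix are all assumptions invented for the example.

```python
from itertools import cycle
from typing import List


class Balancer:
    """Round-robin between web servers, but pin image traffic to one of them."""

    def __init__(self, backends: List[str], image_backend: str,
                 image_prefix: str = "/images/") -> None:
        if image_backend not in backends:
            raise ValueError("image_backend must be one of the backends")
        self._rotation = cycle(backends)
        self._image_backend = image_backend
        self._image_prefix = image_prefix  # hypothetical path for album art

    def pick(self, path: str) -> str:
        # Album art lives on a single machine, so every image request
        # has to go there regardless of whose turn it is.
        if path.startswith(self._image_prefix):
            return self._image_backend
        # Everything else alternates between the web servers.
        return next(self._rotation)
```

For example, with `Balancer(["apache1", "apache2"], "apache1")`, ordinary page requests alternate between the two machines while anything under `/images/` always lands on `apache1`.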

I’m in foreign territory here, and I’m learning as I go – so there may be some outages tonight. I’ll try to avoid them and quickly revert them if they do happen, but don’t be surprised if you start getting weird errors for a few seconds at a time.

This entry probably sounds completely Greek to most people (as it probably would have to me about a week ago). After things get stable and the servers have a chance to breathe, I’ll try to write a post explaining two things: first, in layman’s terms, how the server-farm is all set up, and second, how others running MediaWiki can get their systems set up the same way (as easily as possible).

“Too many connections” and other lameness.

October 22, 2007

There have been some outages during peak times over the last several days (coincidentally since I upgraded the servers… all I did was beef up the RAM and give some of that to memcached though, so I’m thinking it’s just coincidence).

I think a large part of the cause is that we have a bunch of new users of the API, which is rather database- and server-intensive (since the title-matching is fuzzy).
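To give a feel for why fuzzy matching costs more than an exact lookup, here is a minimal Python sketch. It uses the standard library's `difflib` purely as a stand-in; the actual matching logic behind the API is not described here, and `find_title` and its `cutoff` parameter are hypothetical.

```python
import difflib
from typing import List, Optional


def find_title(query: str, titles: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the stored title that best matches query, or None."""
    by_lower = {title.lower(): title for title in titles}
    wanted = query.lower()
    # Cheap path: a case-insensitive exact match is a single dictionary lookup.
    if wanted in by_lower:
        return by_lower[wanted]
    # Expensive path: compare the query against every known title.
    matches = difflib.get_close_matches(wanted, list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None
```

The exact lookup stays fast no matter how many titles exist, while the fuzzy fallback has to score the query against the whole list, so its cost grows with the size of the catalog and with every misspelled request.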

Just last week we added CantoPod to our list of users, and now we have the even more popular GimmeSomeTune iTunes plugin. Each popular plugin generally means tens of thousands of extra requests for lyrics per day.

As soon as I can figure out how to configure APC (op-code caching for PHP) and then Squid, services should be back up to speed.

Diesel power! (or actually just RAM)

October 18, 2007

Ok, so it’s not diesel power, but after a few hours of running, the server is noticeably faster.

Since we became the lyrics provider for CantoPod, our servers have been getting an extra dose of pain.  CantoPod is an iTunes plugin with hundreds of thousands of active users, and it decided to switch over to LyricWiki just this week.  It was good timing, right as we were about to upgrade.  We had a few rough days, but we’re back!

I mentioned earlier that we’re going to be adding a Squid caching server to the configuration, but I have a surprise: we’re also going to add another web-server!  (For the techies: that’s one Squid w/4 GB of RAM, 2 Apaches – one with 2 GB of RAM, one with 4 GB – and a MySQL server with 4 GB of RAM.  Most if not all are dual-core Opterons or Xeons.)

So this week we’re essentially triple-and-a-halfing our RAM, and doubling the number of servers – all in a more intelligent layout (since we’ll now have a Squid)!  With this we should be easily able to triple our traffic or more before seeing the site slow down again.

Our outage was only a couple of minutes to swap over the new RAM, ’cause our server guy is hardcore.  If you’re in the market for dedicated hosting, you should check them out: G3 Tech.  They do not suck.

I’ll give another warning before any outages for switching to the new configuration with the two additional servers.  They should be really brief or nonexistent though.

Upgrading RAM at 4:45pm EST

October 18, 2007

The server(s) will be down tonight around 4:45pm EST… we’ll try to get it back up as quickly as possible.

And then we will rock out!

Upcoming Outages (and upgrades!)

October 18, 2007

I’ve ordered $1,000 worth of RAM and most of it is going to be put into our existing servers tonight. The rest of the RAM is for a new server that I’ll be installing soon which will be a “Squid”. At Wikimania 2006, I was fortunate enough to be joined for lunch by one of the guys running Wikipedia’s Tampa datacenter, and he gushed over Squid. It really sounded fantastic, and we’re finally at a point with scaling where we can afford to get an extra box for this Squid. On Wikipedia, the reports are that this server can cache pages and serve around 70% of requests without having to go to the webserver/database server for anything. I expect this number will be considerably lower for us since we have a much smaller set of logged-out users, but it should still rock… hard (and keep us highly available if we are put on digg or LifeHacker again).
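As a back-of-the-envelope check on what a cache hit rate buys, here is a tiny Python sketch. The function name `backend_requests` and the figure of 100,000 requests are hypothetical, chosen only to make the arithmetic concrete.

```python
def backend_requests(total_requests: int, hit_rate: float) -> int:
    """Requests that still reach the web/database servers behind the cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Whatever the cache does not serve falls through to the backend.
    return round(total_requests * (1.0 - hit_rate))
```

At the 70% rate reported for Wikipedia, 100,000 incoming requests would leave about 30,000 for the backend; a lower hit rate leaves proportionally more, but every cached request is still one the web and database servers never see.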

When I get word from my host about exactly when the RAM is going to be switched out, I’ll post more details. The Squid update should be within the next several days (not sure when yet… stay tuned for that also).