Help is on the way… ordered another server

September 12, 2008

This week, the site has been extremely slow and even gone down and up a couple of times.  I searched for a problem for a while but it appears that we’ve just really hit the wall on how much traffic we can support with our current servers.  That’s fairly good timing since we’d been planning to move to more servers for a little while, so I’d already begun to look into it.

Today I ordered another server with the same specs as the current Apache server.  This will bring us up to 5 total servers running LyricWiki.  For the curious (and tech-savvy): that’s one Squid caching server in front of two Apache web servers, which talk to one MySQL master server and one read-only MySQL replica.

To get the server to be as beefy as we need, I had to ask the hosting company to order extra RAM for it.  So we’re just waiting for that to be delivered (hopefully around this weekend or very soon after) and we’ll be ready to start working to get the new server pulled into our setup.

In addition to just having more machine-power to handle our traffic, this will give us two additional benefits immediately.  The first is that we can use the new server to test out the upgrade to the newest version of MediaWiki (the software that runs our site as well as Wikipedia).  The second benefit is that now we’ll have two Apache servers – currently the most overworked part of the system – with one running the API and one running the site itself (lyricwiki.org).  This will let us more quickly identify when something is wrong with one of those two systems and it will make sure that problems with either of them are unlikely to affect the other.

Exciting times… stay tuned!


Finally fast again.

August 10, 2008

For the first time all day, the site is moving at what appears to be full-speed.

Also, to answer the earlier-posed question about the Squid using a ton of extra log space when it restarts, it turns out that is true – apparently something (either Apache restarting or somehow detecting that the Squid just came back) triggers the MediaWiki install to send a ton of “PURGE” requests to the Squid server.  Each one basically tells it to forget about a page it may be caching because that page is probably out of date now.  Each request is another line in the log-file, so that’s about 800,000 extra lines in the span of a few minutes.
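For the curious, here is roughly what one of those PURGE requests looks like on the wire.  This is only a sketch in Python, not MediaWiki’s actual code: the helper name `build_purge_request` is made up for illustration, and a Squid has to be configured to accept the PURGE method before a request like this does anything.

```python
def build_purge_request(host: str, path: str) -> bytes:
    """Build the raw bytes of an HTTP PURGE request for one cached page.

    Hypothetical helper for illustration; not taken from MediaWiki.
    """
    lines = [
        f"PURGE {path} HTTP/1.0",  # same shape as a GET, different method
        f"Host: {host}",
        "",  # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


# One request per page to forget, so one log line per page on the cache.
request = build_purge_request("lyricwiki.org", "/O.A.R.:Crazy_Game_Of_Poker")
```

Since every purged page is its own request, a burst that touches most of the wiki adds up to a lot of log lines very quickly.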


Your watchlist as an RSS feed!

July 27, 2008

Finally you can have your watchlist stream you content when the pages you are interested in have been updated!

Thanks to a new extension by Teknomunk, you can now enable the RSS feed at http://lyricwiki.org/Special:WatchlistFeed.

Since this is a really great feature to have in any wiki, Teknomunk plans to make this extension public so that all MediaWikis can use this feature (it is very commonly requested). Thanks Teknomunk! 🙂


Find more songs with “implied” redirects.

March 30, 2008

I’m proud to announce a long-overdue feature: implied redirects.

Implied redirects make it so that the site can often understand what you’re looking for even if it is misspelled or we don’t have a redirect page for the specific song.  For example, we have a redirect from the band name “Of A Revolution” to their preferred form “O.A.R.”.  However, we do not have a redirect from “Of A Revolution:Crazy Game Of Poker” to the correct page, “O.A.R.:Crazy Game Of Poker”, so someone who came to the site asking for the first title would not have found the song.  With the new implied-redirects extension, the site will automatically figure out what you meant and display the correct page.  To see it in action, go to “Of A Revolution:Crazy Game Of Poker”.
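To make the idea concrete, here is a small sketch in Python of the artist-redirect case described above.  It is not the extension’s real code (which runs inside MediaWiki), it does not attempt the misspelling handling, and the names `ARTIST_REDIRECTS`, `EXISTING_PAGES` and `resolve_title` are hypothetical stand-ins for lookups against the wiki.

```python
# Hypothetical data: explicit artist-level redirects and the pages that exist.
ARTIST_REDIRECTS = {"Of A Revolution": "O.A.R."}
EXISTING_PAGES = {"O.A.R.:Crazy Game Of Poker"}


def resolve_title(title):
    """Return the page to display for `title`, or None if nothing matches."""
    if title in EXISTING_PAGES:
        return title  # exact hit, nothing to imply
    # Song titles look like "Artist:Song"; split on the first colon only.
    artist, sep, song = title.partition(":")
    if not sep:
        return None
    target_artist = ARTIST_REDIRECTS.get(artist)
    if target_artist is None:
        return None
    # Reuse the artist's redirect for the song page.
    candidate = f"{target_artist}:{song}"
    return candidate if candidate in EXISTING_PAGES else None


page = resolve_title("Of A Revolution:Crazy Game Of Poker")
```

The point is that one artist-level redirect ends up covering every song by that artist, without anyone creating a redirect page per song.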

Implied redirects have been active in the API for quite some time, but didn’t work on the site until tonight.


3/12/08 9:30pm EST: Upgrading LyricWiki – expect outages.

March 13, 2008

We’re going to be upgrading LyricWiki to the newest version of MediaWiki.  This is good for a number of reasons, not the least of which is that the newest versions always have the lowest number of known security vulnerabilities.

There are also some new features that our extensions can’t work without.

Teknomunk was kind enough to jump right in and test/re-write a bunch of the extensions (I think all of them actually) to make sure they work in the new version.  He also made some upgrades which only work in the new version (which was what ultimately caused the decision to upgrade… so give him mad props :)).

Anyway… this is going to cause a lot of outages tonight.  I’m going to shut down the API (the SOAP), back up the database, then do the upgrades, fix what’s broken, then turn the API back on.   I’ll keep this blog updated as things progress.


Why our “stack” rules.

February 7, 2008

A “stack” is what tech people call the set of technologies that work together to run their application. It occurred to me recently how much LyricWiki’s stack rules. We have only four servers and the web-server alone (which has an entire caching server in front of it, so this is just things that get past that) has been very comfortably handling more than 1.5 million pages per day. On top of that, since I figured out this setup there haven’t been any slow times, even during the heaviest traffic. Did I mention this was only four commodity (cheap) servers!? How does it pull this off? Our stack is hardcore like Atreyu. Check it out:

The datacenter

The datacenter is a powerful beast at a great price. LyricWiki is hosted by G3 Technologies which has redundant everything and is on this cool Internet Exchange which puts our servers one-hop from an insane number of local Pittsburgh area people and campuses. We’re even one-hop away from Penn State (which is relatively far on a map). The fact that the place is so affordable is what has made it possible for LyricWiki to be supported only by non-invasive advertising.

The servers

LyricWiki is running on purely legal software for a total cost of $0. Win! Our operating system is CentOS which is basically the freely-compiled version of the same source-code as RedHat (G3 were the guys to tip me off to this; I naively hadn’t heard of CentOS 2 years ago). On top of the OS we have 3 different types of servers: a Squid caching server, an Apache web-server, and two MySQL servers (one is a replica/slave).

The application

The pages are written in PHP which is really efficient and is made much faster by APC op-code caching (this stores compiled versions of the code automatically since PHP is an interpreted language). The wiki itself is running the MediaWiki code (the same code that powers Wikipedia). I initially thought that MediaWiki was slow and bloated, but as time went on, I found out that in fact it is just optimized for very large wikis with a ton of traffic – this is because it was developed in conjunction with Wikipedia’s growth. One of my favorite features that was added into MediaWiki for scaling is the ability to instantly plug in the powerful in-memory object-caching system: memcached.
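For anyone wondering what an object cache actually buys you, here is a rough sketch of the general pattern in Python. It is not MediaWiki’s code and it doesn’t talk to a real memcached: `TinyCache` is a made-up in-process stand-in, and `render_page` and `get_page` are hypothetical names for “the expensive work” and “check the cache first”.

```python
import time


class TinyCache:
    """In-process stand-in for memcached: a dict with per-key expiry."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)


def render_page(title):
    """Pretend to do the expensive work (database queries, parsing)."""
    render_page.calls += 1
    return f"<h1>{title}</h1>"


render_page.calls = 0  # counts how often the expensive path actually runs


def get_page(cache, title, ttl_seconds=300):
    """Try the cache first, fall back to rendering, then store the result."""
    key = f"page:{title}"
    html = cache.get(key)
    if html is None:
        html = render_page(title)
        cache.set(key, html, ttl_seconds)
    return html


cache = TinyCache()
first = get_page(cache, "O.A.R.")
second = get_page(cache, "O.A.R.")  # served from memory, no re-render
```

The second request never touches the expensive path, which is the whole trick: the more often the same thing is asked for, the less work the web-server and database have to do.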

Other free stuff

On top of all of that, we’ve made heavy use of other technologies. ÜberBot, who added the first 200,000 or so songs to LyricWiki (to give it critical mass), was written in Perl which is fantastic at this sort of thing. We’ve also made search plugins for Firefox, Netscape, Safari, IE, etc., a Facebook Application, and even leveraged the SOAP standard to make an API that has been used in dozens if not hundreds of other places.

Third-party related tools

In addition to all of that stuff, we use Google Analytics and AWStats to track our stats.  The logo was made in the somewhat cumbersome but-hey-it’s-$400-cheaper-than-photoshop GIMP, and the code that I wrote for the site was written in Notepad++.  I’m sure I could go on and on listing other awesome (and almost always free) stuff that we use (IRC and even WordPress that I’m writing this on), but this post is getting long!

We use tons of amazing software and don’t have to pay for any of it! Brilliant! That’s kind of an interesting thing to notice being a programmer myself, but in the end that’s just something that the industry has to adapt to (much like the music industry with downloadable music).

I think I’ll leave you at that now. There was a TON of technology in there and I think I could write pages and pages about any one of the things I mentioned or linked to (except Atreyu, ironically), but that is for another day. If any of you are curious about any of the topics above, please comment on this post and it’s quite likely that I will expand on the topic later!

See how much our stack rocks? 🙂


Weirdness fixed!

November 16, 2007

Long story short: everything was backwards but it is fixed now.

For the curious, here are the technical details:

Although it’s undocumented, MediaWiki doesn’t just use your normal one-line database settings for your master server… if you have multiple servers set up, it uses the server at index 0 as the master.  Our “slave” was at index 0 with 99% of the read load until yesterday.  Then it was still at index 0 but with only 70% of the read load.

The effect this had was that the powerful server just kind of sat there with stale data serving 1% of reads until yesterday, when it started serving 30% of the reads, showing its ugly out-of-date data to the world.
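Here is a small sketch in Python of the behaviour described above: writes go to whatever sits at index 0, and reads are spread by weight.  This is an illustration, not MediaWiki’s load-balancer code; the host names `db-weak` and `db-strong` and the function names are hypothetical, and the 70/30 weights are just the numbers from this post.

```python
import random

# Hypothetical server list shaped after the mix-up in this post: position 0
# is treated as the master no matter what the machine was meant to be.
SERVERS = [
    {"host": "db-weak", "load": 70},    # index 0: receives every write
    {"host": "db-strong", "load": 30},  # intended master, only got reads
]


def pick_master(servers):
    """Writes always go to the first entry in the list."""
    return servers[0]


def pick_reader(servers, rng=random):
    """Choose a server for a read, in proportion to its 'load' weight."""
    weights = [s["load"] for s in servers]
    return rng.choices(servers, weights=weights, k=1)[0]


master = pick_master(SERVERS)
```

With the list in the wrong order, every edit lands on the machine you thought was the replica, and the machine you thought was the master quietly falls behind while still answering its share of the reads.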

I copied the data from the “slave” to the master and changed the configuration so that it really is the master again.  Most of the data (except for a few minutes of changes right before the switch that were mostly by myself and another admin) should still be intact.

Whatamess that was.  Thanks to all of the admins and contributors on LyricWiki for pointing these problems out to me, and thanks to TimStarling of Wikimedia and domas of MySQL for their help figuring out what was wrong and helping me fix it.  You guys are ninjas.

Good night.