Most listened-to songs of 2008

December 14, 2008

A post on LifeHacker mentioned that Last.fm has released their lists of the most listened-to songs of 2008.  Since their service “scrobbles” (logs) everything that their users listen to, they have a pretty massive data-set.  They are UK-based, so there is quite a bit of a UK bias in the results, but they are still interesting nonetheless.

There were two somewhat-annoying things about the lists: 1) each entry is on a different page, so you have to go to 30 pages to see all of the results 2) no links to lyrics!

So we’ve taken the liberty of compiling them for you into a nice concise list of links to the lyrics-pages:

http://lyricwiki.org/LyricWiki:Lists/2008/Last.fm

Enjoy!


New server looks good (plus a surprise!)

September 25, 2008

So far, LyricWiki has been running pretty well with the new server (our 5th server, named “Cochise”) set up with the rest of them.  I’ve moved the API completely to that server, which takes a good deal of the stress off of the site itself.

The site has been pretty fast since the new server has been up.  However, there have been occasional slow patches, and judging by the CPU usage on the site’s main webserver, it’s definitely still not “calm”.

Surprise 1

So here’s a fun surprise: today I ordered another server! :D

For the first time in a looong time, we can hopefully stay ahead of demand instead of suffering for a couple of weeks until it’s unbearable and we’re forced to upgrade.

Surprise 2

W00t! This is an uplifting blogpost… lots of surprises! Anyway: the second surprise is that starting in October, I’m going to be decreasing the time I spend at my day-job by 40% so I’ll only be in the office 3 days per week. That’ll give me two whole days more each week where I can work on Motive Force products like LyricWiki. This should help things become much more stable very quickly.

Well, that was an abnormally enjoyable post for this blog – which usually just announces outages! The site still isn’t totally upgraded (because the weekend ended before I could make all of the extensions work), so I have to run back to that. I took snapshots of a bunch of performance stats before and after, so I’ll post those sometime soon.

Thanks for your patience during these past few weeks. Hopefully we’re in a whole new era for the site!


Setting up new server right now

September 20, 2008

The new server exists! I’m setting it up now and am hoping I can have it all done before I go to sleep.

There is a decent amount of stuff that needs to be done before it’s all working. If you’re technical, or just like watching lists as they are being completed… I’ll be tracking the upgrade here.

I’ll also be making occasional updates to the @lyricwiki twitter account.

The new server will be another Apache server and is named “Cochise”. That is the name of the last great Apache chief, and also the inspiration for the song Cochise by Audioslave.

There will be downtime for an unknown length of time tonight. I’ll try to keep it to a minimum, but the site is so slow it’s practically down anyway.

PS: Special thanks to our awesome webhost for getting the new box here & set up quickly!


New server coming tomorrow!

September 19, 2008

And not a moment too soon.  Traffic is down a significant percentage in the last few days.  The site is so slow and bothersome that I even find myself hesitating to load more pages when I go to the site.

Anyway, while I was waiting, I made a list of all of the changes I think are necessary for the upgrade (although there is the potential that I could get surprised).  I plan to update the list during the upgrade.  There will be an outage during which the list will be unavailable, but that’s not a huge deal, hopefully.  Some of the tasks don’t require the new server to be here for them to be started, so I’ve already begun.

Here is the list of upgrade tasks: http://lyricwiki.org/upgradeList.html

ps: If anyone wants to make a style-sheet to make the page less ugly, I’d be more than happy to put it up, but designing one is lower-priority for me than doing the things on the list.


Help is on the way… ordered another server

September 12, 2008

This week, the site has been extremely slow and even gone down and up a couple of times.  I searched for a problem for a while but it appears that we’ve just really hit the wall on how much traffic we can support with our current servers.  That’s fairly good timing since we’d been planning to move to more servers for a little while, so I’d already begun to look into it.

Today I ordered another server with the same specs as the current Apache server.   This will bring us up to 5 total servers running LyricWiki.  For the curious (and tech-savvy): that’s one squid caching server in front of two Apache web servers which talk to one mysql master server and one read-only replica mysql server.
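For the tech-savvy, here is a rough sketch of what having a master plus a read-only replica means for the code that talks to the databases. This is not the actual LyricWiki or MediaWiki code; the host names and the simple "look at the first SQL keyword" rule are assumptions made purely for illustration.

```python
import itertools

# Hypothetical host names. The post only says there is one master and one
# read-only replica; the real names are not given.
MASTER = "db-master"
REPLICAS = ["db-replica"]

# Cycle through the replicas so that adding more later spreads the reads.
_replica_cycle = itertools.cycle(REPLICAS)

# Statements that change data must go to the master.
WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}


def pick_database(sql: str) -> str:
    """Return the host that should run this statement."""
    parts = sql.split()
    verb = parts[0].upper() if parts else ""
    if verb in WRITE_VERBS:
        return MASTER
    # Everything else is treated as a read and sent to a replica.
    return next(_replica_cycle)


print(pick_database("SELECT * FROM page WHERE page_id = 1"))
print(pick_database("UPDATE page SET page_touched = NOW()"))
```

The idea is simply that the (far more common) reads can be spread across replicas while the master only has to deal with the writes.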

To get the server to be as beefy as we need, I had to ask the hosting company to order extra RAM for it.  So we’re just waiting for that to be delivered (hopefully around this weekend or very soon after) and we’ll be ready to start working to get the new server pulled into our setup.

In addition to just having more machine-power to handle our traffic, this will give two additional benefits immediately.  The first is that we can use the new server to test out the upgrade to the newest version of MediaWiki (the software that runs our site as well as Wikipedia).  The second benefit is that now we’ll have two Apache servers – currently the most overworked part of the system – with one running the API and one running the site itself (lyricwiki.org).  This will let us more quickly identify when something is wrong with one of those two systems and it will make sure that problems with either of them are unlikely to affect the other.

Exciting times… stay tuned!


An important message…

April 1, 2008

Don’t worry… be happy.

UPDATE: This was an April Fool’s joke. On April 1st, every page on LyricWiki resulted in a Rick-Rolling. To view the page as it would have been on April 1st, please try this permanent link.


Find more songs with “implied” redirects.

March 30, 2008

I’m proud to announce a long-overdue feature: implied redirects.

Implied redirects make it so that the site can often understand what you’re looking for even if it is misspelled or we don’t have a redirect page for the specific song.  For example, we have a redirect from the band name “Of A Revolution” to their preferred form “O.A.R.”.  However, we do not have a redirect from the song page “Of A Revolution:Crazy Game Of Poker” to the correct page: “O.A.R.:Crazy Game Of Poker”.  With the new implied-redirects extension, the site will automatically figure out what you meant and display the correct page.  To see it in action, go to “Of A Revolution:Crazy Game Of Poker”.

Implied redirects have been active in the API for quite some time, but didn’t work on the site until tonight.
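If you’re curious how something like this can work, here is a minimal sketch of the idea. It is not the actual extension code: the function name and the two lookup tables are made up for the example, and it only covers the artist-redirect case (not misspellings).

```python
from typing import Optional

# Hypothetical data: the only redirect that exists is at the artist level.
ARTIST_REDIRECTS = {"Of A Revolution": "O.A.R."}

# Hypothetical set of song pages that actually exist on the wiki.
EXISTING_PAGES = {"O.A.R.:Crazy Game Of Poker"}


def resolve_title(title: str) -> Optional[str]:
    """Return the page to show for `title`, or None if nothing matches."""
    if title in EXISTING_PAGES:
        return title  # exact hit, nothing to imply

    # Song pages are named "Artist:Song", so split on the first colon.
    artist, sep, song = title.partition(":")
    if not sep:
        return None

    # Apply the artist-level redirect, then retry with the same song name.
    canonical_artist = ARTIST_REDIRECTS.get(artist)
    if canonical_artist is None:
        return None
    candidate = f"{canonical_artist}:{song}"
    return candidate if candidate in EXISTING_PAGES else None


print(resolve_title("Of A Revolution:Crazy Game Of Poker"))
```

The nice part is that one artist-level redirect covers every song by that artist, so nobody has to create a redirect page per song.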


Why our “stack” rules.

February 7, 2008

A “stack” is what tech people call the set of technologies that work together to run their application. It occurred to me recently how much LyricWiki’s stack rules. We have only four servers, and the web-server alone (which has an entire caching server in front of it, so this counts only the requests that get past that) has been very comfortably handling more than 1.5 million pages per day. On top of that, since I figured out this setup there haven’t been any slow-times, even during the heaviest traffic. Did I mention this was only four commodity (cheap) servers!? How does it pull this off? Our stack is hardcore like Atreyu. Check it out:

The datacenter

The datacenter is a powerful beast at a great price. LyricWiki is hosted by G3 Technologies which has redundant everything and is on this cool Internet Exchange which puts our servers one-hop from an insane number of local Pittsburgh area people and campuses. We’re even one-hop away from Penn State (which is relatively far on a map). The fact that the place is so affordable is what has made it possible for LyricWiki to be supported only by non-invasive advertising.

The servers

LyricWiki is running on purely legal software for a total cost of $0. Win! Our operating system is CentOS which is basically the freely-compiled version of the same source-code as RedHat (G3 were the guys to tip me off to this; I naively hadn’t heard of CentOS 2 years ago). On top of the OS we have 3 different types of servers: a Squid caching server, an Apache web-server, and two mySQL servers (one is a replica/slave).

The application

The pages are written in PHP which is really efficient and is made much faster by APC op-code caching (this stores compiled versions of the code automatically since PHP is an interpreted language). The wiki itself is running the MediaWiki code (the same code that powers Wikipedia). I initially thought that MediaWiki was slow and bloated, but as time went on, I found out that in fact it is just optimized for very large wikis with a ton of traffic – this is because it was developed in conjunction with Wikipedia’s growth. One of my favorite features that was added into MediaWiki for scaling is the ability to instantly plug in the powerful in-memory object-caching system: memcached.
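To give a feel for why object-caching helps so much, here is a small sketch of the usual “check the cache first, only do the expensive work on a miss” pattern. It is not MediaWiki’s code: `TinyCache` is an in-process stand-in I made up so the example runs without a memcached server, and `render_page` is a hypothetical placeholder for the expensive part.

```python
import time


class TinyCache:
    """In-process stand-in for a memcached client (get/set with expiry)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at < time.monotonic():
            del self._store[key]  # expired, so treat it as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)


def render_page(title):
    """Hypothetical expensive step (database queries plus wikitext parsing)."""
    render_page.calls += 1
    return f"<h1>{title}</h1>"


render_page.calls = 0  # counts how often the expensive step actually ran


def get_page(cache, title, ttl_seconds=300):
    """Serve from the cache when possible; render and store on a miss."""
    key = f"page:{title}"
    html = cache.get(key)
    if html is None:
        html = render_page(title)
        cache.set(key, html, ttl_seconds)
    return html


cache = TinyCache()
get_page(cache, "Audioslave:Cochise")
get_page(cache, "Audioslave:Cochise")
print(render_page.calls)
```

With a real memcached behind it, the same shape of code lets several web servers share one cache instead of each redoing the work.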

Other free stuff

On top of all of that, we’ve made heavy use of other technologies. ÜberBot, who added the first 200,000 or so songs to LyricWiki (to give it critical mass), was written in Perl, which is fantastic at this sort of thing. We’ve also made search plugins for FireFox, Netscape, Safari, IE, etc., a Facebook Application, and even leveraged the SOAP standard to make an API that has been used in dozens if not hundreds of other places.

Third-party related tools

In addition to all of that stuff, we use Google Analytics and AWStats to track our stats.  The logo was made in the somewhat cumbersome but-hey-it’s-$400-cheaper-than-photoshop GIMP, and the code that I wrote for the site was written in Notepad++.  I’m sure I could go on and on listing other awesome (and almost always free) stuff that we use (IRC and even WordPress that I’m writing this on), but this post is getting long!

We use tons of amazing software and don’t have to pay for any of it! Brilliant! That’s kind of an interesting thing to notice being a programmer myself, but in the end that’s just something that the industry has to adapt to (much like the music industry with downloadable music).

I think I’ll leave you at that now. There was a TON of technology in there and I think I could write pages and pages about any one of the things I mentioned or linked to (except Atreyu, ironically), but that is for another day. If any of you are curious about any of the topics above, please comment on this post and it’s quite likely that I will expand upon the topic later!

See how much our stack rocks? :)


Correction: downtime rescheduled for 1/17/08

January 17, 2008

The server isn’t going down tonight… instead I’ll wait until tomorrow because our web host is putting my servers into new racks (for free… shwing!).  There’s going to be some downtime for that (gotta move 4 servers), and probably a brief outage tomorrow morning/early-afternoon to do that little fix I was going to do tonight.

Ripping off the band-aid… getting all of the downtime out of the way tomorrow, hopefully :)   (I don’t think we’ve had any since Thanksgiving until this.)

After I’m in, I’ve also started working on a way to make downtime even shorter on the rare occasions that we will have to restart the server in the future (previous downtime was primarily due to the surges in traffic which we finally figured out how to handle).


API back up to speed

November 15, 2007

If you’ve been using the API over the past week, you probably noticed the painfully large percentage of results that were being returned as “Not found”.

The large increase in traffic recently was causing us to get “Too many connections” errors when the API was left alone, so I had to turn on a throttling system which would randomly drop a certain percentage of the requests.  Looking into our server logs, I found out that our actual web server (behind our Squid caching server which serves up 30% of our pages) has been getting over 1 million page requests per day!  Wow… that explains the scaling problems.
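For anyone wondering what that kind of throttling looks like, here is a minimal sketch. It is not the real API code: the function names and the `drop_fraction` setting are made up for the example, and I’m assuming a dropped request simply gets the “Not found” answer.

```python
import random


def should_drop(drop_fraction: float, rng: random.Random) -> bool:
    """Decide at random whether to shed this request.

    drop_fraction is the share of requests to drop (0.0 = none, 1.0 = all).
    """
    return rng.random() < drop_fraction


def handle_request(query: str, drop_fraction: float, rng: random.Random) -> str:
    if should_drop(drop_fraction, rng):
        # Shed the request before it ever touches the database.
        return "Not found"
    # Hypothetical placeholder for the real lookup.
    return f"lyrics for {query}"


rng = random.Random()
results = [handle_request("Audioslave:Cochise", 0.25, rng) for _ in range(1000)]
print(results.count("Not found"), "of 1000 requests dropped")
```

It is a blunt tool since it punishes users at random, but it keeps the database under its connection limit so the requests that do get through are answered.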

I was overly busy for most of the week (a drawback of having LyricWiki not be my “day-job”), so I first got to really attack the problem tonight.  It appears that everything is back up to speed, and the throttling is turned off.  I’ll be keeping an eye on how the site is doing tomorrow during peak traffic time, but I think we should be okay.

I have some more fixes planned for the near-future which should make it so the API can continue to handle increasing traffic.  I probably won’t post about them as they happen, but hopefully you’ll notice an increase in the speed that results are served up.