Spotty outages… very confused.

Yesterday I got the replicated slave database up and running and even made the API use MediaWiki’s built-in database connections, which are persistent, so that should have knocked down the number of connects/disconnects (which are time-expensive).

Today, we’ve still been getting “Too Many Connections” errors… possibly because the MediaWiki persistent connections don’t close very quickly?  We’ll be looking into this some more… maybe I need new stats on how much traffic the API is getting.

Anyway, the solution I’ve taken is that during peak times, I keep setting the API to drop a certain percent of requests.  This isn’t a cool solution, so I’ll be trying to figure out a better way… anyone have any ideas?
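For concreteness, the random-drop throttle boils down to something like this (a Python sketch for illustration only; the real API is a PHP script, and the rate and function names here are made up):

```python
import random

DROP_RATE = 0.30  # hypothetical: fraction of peak-time requests to shed

def search_lyrics(query):
    # Stand-in for the real database lookup.
    return "lyrics for %s" % query

def handle_request(query):
    # Decide whether to drop BEFORE doing any expensive work,
    # so a dropped request never opens a database connection.
    if random.random() < DROP_RATE:
        return "Too much traffic! -"
    return search_lyrics(query)
```

The upside is that dropped requests cost almost nothing server-side; the downside, as the comments below point out, is what that error text does to clients.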


7 Responses to Spotty outages… very confused.

  1. Sean says:

    Maybe instead of returning “Too much traffic! –” you could outright refuse the connection. The way it’s currently running, a large majority of programs utilizing the API are rendered fairly useless because they assume the quoted text above is the song’s lyrics.

    As far as a solution… couldn’t you clone the database on a nightly basis and export it to a separate server or set of servers to handle the demand? The website would have real-time access to the database while the API would use the 24-hour-delayed secondary server/database.

  2. lyricwiki says:

    Thank you for the advice.

    There actually is a copy of the database (it uses replication, so it’s delayed by only a second or so instead of a day)… but even having that on its own server isn’t a good enough solution yet.

    I figured out a bunch of software fixes, but I’m not sure when I can get to them. I’m hoping I can this weekend.

    I’m not sure how to refuse a connection from inside of a PHP script… I’ll look into that if I get a chance.

    Thanks for your feedback.

  3. Sean says:

    Could you set it up temporarily so that when a request is being refused, instead of returning:

    “Too much traffic! –”

    you respond the same way as if the song was not found in the database? This will ensure that software using your API won’t tag the lyrics of 50% of the songs with the error text quoted above.

    “Not found” is much better than incorrect error data.

  4. lyricwiki says:

    Good call… that’s a decent temporary solution since I have to fix this anyway.

    In the long-term I should figure out a better way of providing “non-results” like “server is down” or something because “Not found” is a bit misleading… when users get this they won’t know whether the song is really missing or if they should just refresh. Coincidentally, pages that aren’t found take way more processing power than pages that are (since we look for the page several different ways before giving up).

    … also, I got a log-analyzer up yesterday and it appears we’re doing 1,000,000 pages per day, not 500,000 (I’m pretty sure this includes web spiders, but still… dang).

    Thanks for the tip… the results now say “Not found” if the result is being throttled. Once I get the server issues fixed, I’ll post again.
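To illustrate why misses are so expensive: a lookup that tries several title variants before giving up might look roughly like this (a Python sketch; these particular variants are guesses, not the actual MediaWiki logic):

```python
def find_page(title, pages):
    """Try progressively looser title matches before declaring a miss."""
    candidates = [
        title,                            # exact title
        title.replace("_", " "),          # underscores vs. spaces
        title.replace("_", " ").title(),  # normalized capitalization
    ]
    for candidate in candidates:
        if candidate in pages:            # a hit often costs one lookup
            return pages[candidate]
    return None                           # a miss always costs all of them
```

A hit can short-circuit on the first candidate, but every genuine miss pays for the full list of lookups, which is why “not found” pages are the most expensive ones to serve.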

  5. Sean says:

    I’m a little confused with the API throttling….

    You are throttling the connection AFTER you’ve already taken the query and searched the database. The majority of your CPU usage and database connections are happening regardless of whether you block the actual lyrics being sent back to the client.

    Let me give you an example:

    Query Made -> Database Searched -> Response Given Whether Found -> Lyric Sent to client if found

    Query Made -> Database Searched -> Response Given Whether Found -> “No Result” sent to client (even if found)

    So in short, you aren’t saving any system resources with the current way you’re throttling, and to be honest you’re making things worse: people retry more often, resulting in more usage.

    The easiest way to solve your bandwidth/CPU issues temporarily is to send the response before you touch the database.

    Query Made -> Send not found result

    The imperative thing is that you cannot just send some clear text to the client through the API like you are right now. For the time being, have the result be IDENTICAL to what a proper database search would return when the song is not found. This will ensure the end clients hammer you less when they are being throttled, and that auto-taggers aren’t changing the lyrics of every song to “Not found”.

    If you have to send back a clear-text response, just make the response NULL and the taggers will assume the song wasn’t found.

    I just spent 2 hours writing a program to re-tag my library with the current server issues. Hopefully my suggestions don’t stab me in the back =(

    I entered in my email when posting this comment. Contact me if you need some help.
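The flow Sean describes — answering a throttled request with the exact bytes a genuine miss would produce, before the database is ever touched — could be sketched like this (Python for illustration; the empty-string miss response and the rate are assumptions):

```python
import random

THROTTLE_RATE = 0.30  # hypothetical peak-time drop fraction
NOT_FOUND = ""        # assumed: whatever the API sends for a genuine miss

def handle_request(query, database):
    if random.random() < THROTTLE_RATE:
        # Throttled: reply exactly as a miss would, with no database work,
        # so clients (and auto-taggers) can't tell throttling from "not found".
        return NOT_FOUND
    return database.get(query, NOT_FOUND)
```

Because the throttled branch returns before any lookup, a dropped request never consumes a database connection, and clients see nothing they would mistake for lyrics.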

  6. lyricwiki says:

    “You are throttling the connection AFTER you’ve already taken the query and searched the database. The majority of your CPU usage and database connections are happening regardless if you block the actual lyrics being sent back to the client.”

    That’s not what’s happening… I think my wording must have confused you. Before the database is searched, I randomly choose whether to drop the request. If it’s dropped, the database never gets searched. What I meant when I said that “Not Found” requests are the most expensive is that searches for lyrics that actually cannot be found (because they aren’t in our database) are expensive.

    The results currently being returned by the throttled attempts ARE identical to if the lyrics aren’t there as of my last post (I took your advice).

    Which plugin do you use btw?

  7. Sean says:

    I’m using iTunes Lyrics Importer and I could swear it is properly searching the database on every query that is made. So far it has been 100% accurate as to whether the song’s lyrics exist in the database regardless if the lyrics are actually downloaded.

    Perhaps it’s doing the search via the website and only utilizing the API when it attempts to download the lyrics?

    iGrabber gives me very similar if not identical results.

    Either way I’m fine with the throttling, just wanted to make sure the database wasn’t actually being searched on every query regardless of throttling.

    You’re the pro, just trying to give you an idea of what’s being experienced from this end =)
