March 16, 2008
Everything is finally all patched up. The site and the API are both up and running, and seem to be handling traffic as well as ever. Thanks to everyone who helped as we worked through the problems.
On top of that, thank you to everyone for being patient while the API was down (it ended up being more than 3 straight days). This problem gave me a lot of time to think about how to avoid a similar problem in the future, so hopefully we won’t be going down this path again any time soon!
November 16, 2007
Long story short: everything was backwards but it is fixed now.
For the curious, here are the technical details:
This doesn’t seem to be documented, but MediaWiki doesn’t just use your normal one-line database settings to find the master server. If you have multiple servers set up, it treats the server at index 0 as the master. Our “slave” was at index 0 with 99% of the read load until yesterday. After that it was still at index 0, but with only 70% of the read load.
The effect was that the powerful server just kind of sat there with stale data, serving 1% of reads until yesterday, when it started serving 30% of the reads and showing its ugly out-of-date data to the world.
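For anyone who wants to see the mix-up spelled out, here is a rough Python sketch of the behavior described above. It is only a toy model, not MediaWiki's actual code or configuration format: the host names `small-box` and `big-box`, the `load` field, and the helper functions are all made up for illustration. The only things taken from the story are the rule that index 0 is the master and the 99/1 and 70/30 read splits.

```python
# Toy model of a multi-server database list (hypothetical names throughout).
# Rule being illustrated: whichever entry sits at index 0 is treated as the master.

def master_of(servers):
    """Return the host that gets treated as master: always the first entry."""
    return servers[0]["host"]

def read_shares(servers):
    """Return each host's fraction of read traffic, proportional to its load weight."""
    total = sum(s["load"] for s in servers)
    return {s["host"]: s["load"] / total for s in servers}

def is_backwards(servers, intended_master):
    """True when the server at index 0 is not the one meant to be the master."""
    return master_of(servers) != intended_master

# The intended replica ("small-box") sits at index 0, so it receives the writes,
# while the powerful intended master ("big-box") serves stale reads.
before = [{"host": "small-box", "load": 99}, {"host": "big-box", "load": 1}]
rebalanced = [{"host": "small-box", "load": 70}, {"host": "big-box", "load": 30}]

# One possible corrected ordering: the intended master goes first.
# The load weights here are carried over only for illustration.
fixed = [{"host": "big-box", "load": 30}, {"host": "small-box", "load": 70}]

if __name__ == "__main__":
    for name, cfg in [("before", before), ("rebalanced", rebalanced), ("fixed", fixed)]:
        print(name, master_of(cfg), read_shares(cfg), is_backwards(cfg, "big-box"))
```

Changing the load weights only changes how many readers see the stale server; it does nothing about which server is treated as the master. Only the order of the list does that, which is why the problem got much more visible once the split went from 99/1 to 70/30.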
I copied the data from the “slave” to the master and changed the configuration so that it really is the master again. Most of the data (except for a few minutes of changes right before the switch that were mostly by myself and another admin) should still be intact.
What a mess that was. Thanks to all of the admins and contributors on LyricWiki for pointing these problems out to me, and thanks to TimStarling of Wikimedia and domas of MySQL for helping me figure out what was wrong and fix it. You guys are ninjas.