Hackdiary: Wikihacking in Brighton
This weekend, I went down to Brighton for the MediaWiki Hackathon. I'm not a MediaWiki developer, and I only really do PHP when I need to. I do have access to the Toolserver, which is a live, read-only mirror of the databases that power the Wikimedia projects. This is very useful: if I need to, I can log in via SSH, type in sql -r enwiki_p and run SQL queries against Wikipedia.
So, what did I build at the Hackathon?
Not as much as I'd like. The four regular MediaWiki developers there smashed lots of bugs[1]. Me? I worked on a few different things.
I helped Jez from Open Plaques with the MediaWiki API, specifically so that Open Plaques can start using Wikimedia Commons to host images of plaques. A while back, I started pushing Open Plaques to accept Commons as an alternative file host to Flickr. Currently, Open Plaques recommends that you take a photo, CC license it, and post it on Flickr with a machine tag; Open Plaques then pulls it in, and Flickr links to Open Plaques.
But Open Plaques could also support Commons in three ways:
1. By making it easy to add images from Commons.
2. By exporting the license-compatible files from Flickr to Commons, if we can manage it, with added metadata from Open Plaques (OP data is public domain, although the photos are under various CC licenses, not all of which are Commons-compatible).
3. By using Commons as a file hosting back-end: when people come to Open Plaques, they could upload photos directly to Open Plaques, and we'd then push them straight on to Commons and use that as a file host. They'd obviously have to agree to the relevant license and so on.
The first thing is a fairly straightforward one: being able to simply provide a Commons URL and then extract image metadata and the path to the image. That's relatively easy, and it looks like that might be something we can add.
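To give a feel for what "provide a Commons URL and extract the metadata" might involve, here is a minimal Python sketch. It is not Open Plaques code: the function names are made up for illustration, and it assumes the imageinfo query as the MediaWiki API exposes it today (prop=imageinfo with iiprop=url|extmetadata). It only builds the query URL and reads an already decoded JSON response, so nothing here touches the network.

```python
from urllib.parse import urlparse, unquote, urlencode

# Public API endpoint for Wikimedia Commons.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def title_from_commons_url(url):
    """Return the 'File:...' page title from a Commons file page URL."""
    path = unquote(urlparse(url).path)
    prefix = "/wiki/"
    if not path.startswith(prefix):
        raise ValueError("not a Commons page URL: " + url)
    # Page URLs use underscores where titles use spaces.
    title = path[len(prefix):].replace("_", " ")
    if not title.startswith("File:"):
        raise ValueError("not a file page: " + title)
    return title


def imageinfo_query_url(title):
    """Build the API URL asking for the image path and its metadata."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "imageinfo",
        "iiprop": "url|extmetadata",
        "titles": title,
    }
    return COMMONS_API + "?" + urlencode(params)


def extract_image_info(response):
    """Pull the image URL and licence name out of a decoded JSON response."""
    # Results are keyed by page id; we asked for one title, so take the first.
    page = next(iter(response["query"]["pages"].values()))
    info = page["imageinfo"][0]
    licence = info.get("extmetadata", {}).get("LicenseShortName", {}).get("value")
    return {"title": page["title"], "url": info["url"], "licence": licence}
```

Fetching the URL returned by imageinfo_query_url, decoding the JSON and passing it to extract_image_info would give the direct image path plus a licence string, which is what Open Plaques would need to check Commons compatibility.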
After Jez left, I tried to work out what to do next. I decided it might be useful to have an API library in my trendier-than-thou functional programming language of choice, Scala. On the bus home, I made a few notes about the general design of such a library. After a bit of noodling with Maven, I actually got to the point where I could type mvn compile.
That I have to play these silly games every time I start a new Scala project is profoundly depressing. I basically either need to stop being a wimp and fully embrace SBT 0.10/0.11, or I need to write myself a new Maven archetype that does Scala properly and has all the libraries I want, and then wrap said archetype in a command line alias called something like "scala-new-project". Scala is supposed to be a fun, pragmatic and functional (in the sense of not-dysfunctional) language. Choosing between SBT and Maven is basically a choice between a build system designed by hipsters and a build system designed by enterprise people. Whichever you choose, you will regret it.
My build still has one major issue: mvn scala:console gives me errors, and JLine still buggers up my shell after I exit the damn console.
Databinder Dispatch is a breath of fresh air. It is intimidating at first to use a library where 70% of it seems to be punctuation, but the design decisions make sense. For using the MediaWiki API, it's actually very easy: you can basically do each different thing as a series of layered objects. Firstly, you construct a Request object that points to the API endpoint (MediaWiki isn't a RESTful API), then you use the <:< pseudo-operator to add all the relevant headers (specifically User-Agent), and then you can do it again to add in the authentication token (cookies rather than OAuth, but the principle is the same). Then, finally, you can simply provide a Map of the query you are sending, and then you supply an inline response handler.
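The layering described above is not specific to Dispatch. As a rough illustration, here is the same shape in Python using only the standard library: endpoint first, then headers, then cookies, then the query map, then a response handler. This is a sketch, not the Scala library and not Dispatch's API; build_request, handle_response and the example User-Agent string are all hypothetical names chosen for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Single endpoint for everything: MediaWiki's API is not RESTful.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_request(params, user_agent, cookies=None):
    """Layer endpoint, query map, headers and cookies into one Request."""
    query = dict(params, format="json")
    req = Request(API_ENDPOINT + "?" + urlencode(query))
    # Layer 1: identifying headers, most importantly User-Agent.
    req.add_header("User-Agent", user_agent)
    # Layer 2: authentication via session cookies (no OAuth here).
    if cookies:
        cookie_header = "; ".join(f"{k}={v}" for k, v in cookies.items())
        req.add_header("Cookie", cookie_header)
    return req


def handle_response(body):
    """Response handler: decode the JSON text the API returned."""
    return json.loads(body)
```

Sending it would be a matter of passing the request to urllib.request.urlopen and feeding the body to handle_response. In Dispatch each of those layers is an operator applied to the request, which is where all the punctuation comes from.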
I haven't yet gotten anywhere particularly interesting with the Scala library, but I've got some good ideas. It's currently in a closed repo. When it sucks less, I'll release it. That might not be for some time, sadly.
VMs, vagrants and maintenance/dev/
Over the weekend, I mentioned to one of the experienced MediaWiki developers one of the things that puts me off MediaWiki hacking: dependencies. Hacking on MediaWiki generally means having a working MySQL install and a working Apache install, and so on. This can be a giant pain-in-the-ass on OS X, as the bazillions of tutorials on how to set them all up in the right way have shown. There's a reason why people are using things like Vagrant and other VM-based systems. The same sort of quasi-VM strategy seems to be happening with RVM, the Ruby Version Manager, which the Ruby community nicked from Python's virtualenv. Obviously, if you are in Javaland, you are running in a VM... the JVM.
But PHP is a bit more of a pain: this is the downside of being built around the very reasonable use case of "I want to be able to FTP it onto my server and have it work". What you lose in the development stage, you more than make up for in deployment.
At this point, someone pointed me to this discussion on wikitech-l.
The current bleeding-edge MediaWiki basically has a script/server, and a built-in SQLite. It's almost like... Rails! You simply run maintenance/dev/install.php and it downloads PHP 5.4, and sets you up a development version of MediaWiki that uses PHP 5.4's built-in web server. You can then just run maintenance/dev/start.sh and it'll boot up on port 4881.
There was one slight hiccup: the PHP 5.4 build script didn't like the fact that I had a space in the name of one of the parent directories of the SVN checkout of MediaWiki, and borked. I quickly renamed it to remove the space and it compiled fine. With the initial compile done, I can now boot up a new, clean MediaWiki install in a few seconds. This makes it dramatically easier to start hacking on MediaWiki.
Although I didn't do any MediaWiki hacking over the weekend, I've just assigned myself an issue. Lemme see if I can get some of my code running on Wikipedia...
[1] After I took that photo, they finished work on a really tough bug involving GeSHi.