Link-Map-Reduce in Riak: an example from inagist.com
My last post felt a little incomplete without some code backing it up. I'm following it up with a code sample of how exactly this map reduce is wired up.
I will walk through how we do the "Popular Replies" section on the conversation page. Again, here is a @BarackObama tweet with more than 500 replies. Popular replies extracts only those replies which have been further replied to, have been re-tweeted, or are a reply from the author of the tweet itself. Right now it has picked out 1 of these 500+ replies.
Responses to a tweet are captured in a bucket of their own, <<"tweet_responses_bucket">>. Each tweet is keyed by its tweet id as a 128-bit binary, <<TweetId:128>>. Response details are not stored directly on this resource but on linked values in a bucket called "tweet_responses_subkeys_bucket". Responses are stored as links on a resource keyed as <<TweetId:128, (ResponseId rem 10):8>> in this bucket, so the responses to one tweet are spread over at most ten sub-resources. Each such resource is added as a link on the {<<"tweet_responses_bucket">>, <<TweetId:128>>} resource and tagged as <<"tweet_response">>. A reply is recorded as a link of the form {{<<ResponseId:128>>, <<ResponseAuthorId:128>>}, <<"reply">>}. A link is normally represented as {{Bucket, Key}, Tag}; the reply link does not point to a valid bucket, key pair but is purely for our own interpretation.
Here is how it would look:
<<"tweet_responses_bucket">>
----------------------------
|----------------------------------------|
|----------------------------------------|
| {{<<"tweet_responses_subkeys_bucket">>,|
| <<20337776197:128,0:8>>}, |
| <<"tweet_response">>}, |
| {{<<"tweet_responses_subkeys_bucket">>,|
| <<20337776197:128,1:8>>}, |
| <<"tweet_response">>}, |
|----------------------------------------|
|----------------------------------------|
<<"tweet_responses_subkeys_bucket">>
------------------------------------
|----------------------------------------|
| <<20337776197:128,0:8>> |
|----------------------------------------|
|{{<<20339861590:128>>,<<18035803:128>>},|
|----------------------------------------|
|----------------------------------------|
|----------------------------------------|
| <<20337776197:128,1:8>> |
|----------------------------------------|
|{{<<20337857101:128>>,<<82294968:128>>},|
|----------------------------------------|
|----------------------------------------|
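The key and link layout described above can be sketched as a few small helper functions. This is only an illustration built from the description in this post: the module name tweet_keys and the function names are made up for the example and are not taken from the inagist code.

```erlang
-module(tweet_keys).
-export([tweet_key/1, subkey/2, subkey_link/2, reply_link/2]).

%% Key of the top-level resource in <<"tweet_responses_bucket">>.
tweet_key(TweetId) ->
    <<TweetId:128>>.

%% Key of the sub-resource that holds a given response. The trailing
%% byte is ResponseId rem 10, so one tweet has at most ten subkeys.
subkey(TweetId, ResponseId) ->
    <<TweetId:128, (ResponseId rem 10):8>>.

%% Link stored on the top-level resource, pointing at a subkey resource.
subkey_link(TweetId, ResponseId) ->
    {{<<"tweet_responses_subkeys_bucket">>, subkey(TweetId, ResponseId)},
     <<"tweet_response">>}.

%% Link stored on a subkey resource. The "bucket" and "key" slots carry
%% the response id and its author id instead of a real bucket/key pair.
reply_link(ResponseId, ResponseAuthorId) ->
    {{<<ResponseId:128>>, <<ResponseAuthorId:128>>}, <<"reply">>}.
```

With the ids from the diagram, tweet_keys:subkey(20337776197, 20339861590) gives <<20337776197:128,0:8>>, because 20339861590 rem 10 is 0, and the response 20337857101 lands on the subkey ending in 1.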
And now here is the piece of code that does the extraction of the popular replies. The function gives a sorted list of {TweetId, AuthorId} tuples which are then looked up and served.
https://gist.github.com/510070
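For readers who cannot open the gist, here is a rough sketch of the shape such a job might take. It is not the code from the gist: the module name popular_replies_sketch and all function names are hypothetical, the phase tuples follow my understanding of Riak's native Erlang map/reduce terms ({link, Bucket, Tag, Keep}, {map, {qfun, Fun}, Arg, Keep}, {reduce, {qfun, Fun}, Arg, Keep}), and the actual "popular" filter is left out because its details live in the gist.

```erlang
-module(popular_replies_sketch).
-export([build_query/3, collect_replies/1, sort_replies/1]).

%% Build the inputs and phases for one tweet. MapFun is expected to be a
%% fun of arity 3 and ReduceFun a fun of arity 2; both are supplied by
%% the caller, so nothing here talks to a Riak node.
build_query(TweetId, MapFun, ReduceFun) ->
    Inputs = [{<<"tweet_responses_bucket">>, <<TweetId:128>>}],
    Phases = [%% Walk the <<"tweet_response">> links to the subkey resources.
              {link, <<"tweet_responses_subkeys_bucket">>,
               <<"tweet_response">>, false},
              %% Run MapFun on every subkey resource reached by the walk.
              {map, {qfun, MapFun}, none, false},
              %% Combine the map output and keep only this phase's result.
              {reduce, {qfun, ReduceFun}, none, true}],
    {Inputs, Phases}.

%% Decode <<"reply">> links into {ResponseId, AuthorId} pairs. Links that
%% do not match the reply shape are skipped by the generator pattern.
collect_replies(Links) ->
    [{ResponseId, AuthorId}
     || {{<<ResponseId:128>>, <<AuthorId:128>>}, <<"reply">>} <- Links].

%% Sort and de-duplicate the pairs in ascending term order. The ordering
%% used in the gist may differ; this is an assumption.
sort_replies(Pairs) ->
    lists:usort(Pairs).
```

In a real map phase the links would be read from the metadata of the Riak object handed to the map function, and the reduce step would apply the popularity rules before sorting.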
Hopefully the code is self-explanatory. Of interest is make_local_fun, which creates a function reference that can be passed over to a remote node without the remote node having a copy of this compiled code in its path.
Feel free to comment on anything I have overlooked or that could be done better :)
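One common way to get such a portable fun, shown here only as a sketch, is to build it from source text with erl_scan, erl_parse and erl_eval. A fun created this way is backed by erl_eval, which ships with Erlang's stdlib, rather than by a user-compiled module. Whether the make_local_fun in the gist works like this is an assumption on my part; the module name local_fun_sketch is made up for the example.

```erlang
-module(local_fun_sketch).
-export([make_local_fun/1]).

%% Turn the source text of a fun expression, e.g. "fun(X) -> X * 2 end.",
%% into a callable fun. The trailing full stop is required by the parser.
make_local_fun(Source) ->
    %% Tokenise the source string.
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    %% Parse the tokens into a list of abstract-format expressions.
    {ok, Exprs} = erl_parse:parse_exprs(Tokens),
    %% Evaluate the expressions; the value of the last one is the fun.
    {value, Fun, _Bindings} = erl_eval:exprs(Exprs, erl_eval:new_bindings()),
    Fun.
```

For example, Double = local_fun_sketch:make_local_fun("fun(X) -> X * 2 end.") followed by Double(21) evaluates to 42. Any function the fun body calls must still exist on the node that runs it, so it is safest to stick to fully qualified calls into modules that node is known to have.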