Ryan Kennedy @rckenned - Tumblr Blog

Dropwizard TechEmpower Benchmark - Part 2: URI Parsing, Guava, and Exceptions

This is part 2 of (I hope) a many-part series of posts exploring Dropwizard's TechEmpower benchmarks. When we left off, I'd managed to get the benchmarks running locally and wasn't impressed by what I saw.

Where to Begin?

Where you start when things are slow (or even broken) is important. We begin with relatively little information to go on. Simple HTTP requests to http://localhost:9090/json have much higher latency and latency variability than I would expect. While the benchmark weighs in at a mere 485 lines of Java (cloc is a wonderful utility), Dropwizard weighs in at a much healthier 38,235 lines of Java (although that includes all of the modules, whether used or not). In addition, Dropwizard pulls many libraries in via Maven.

Consequently, it's unlikely that we're going to solve this problem by simply doing a code review (I mean, you might…but I'm not). So we need to find another way to get started. In times like this I reach for a profiler of some sort. YourKit has always been a popular choice if you have the budget. JConsole can work in a pinch and ships with the JDK. Lately I've been hearing more about Java Mission Control, so I gave that a spin. I won't get into too many details, but Java Mission Control and its companion, Java Flight Recorder, are pretty excellent and they both come with the Oracle JDK.

How to Profile?

Possessing my new toy, I fired up the TechEmpower Dropwizard program to be benchmarked, started Java Flight Recorder, and proceeded to run a script I'd constructed by dissecting what traffic the benchmark was generating. The script runs wrk, an HTTP load generation tool (a newer ApacheBench for those of you who know that tool). The script executes a short "priming" run where it's looking to make sure the server being tested is actually working. It then executes an intensive "warmup" run where it's attempting to load code, prime caches, and get the program through any JIT compilation stages (important for benchmarking a Java server). Once that's finished, it runs the benchmark stages: 15 second tests with 2 traffic driving threads at 8, 16, 32, 64, 128, and 256 concurrent connections. When the script finishes, I stop Java Flight Recorder and dig into the results.
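To make the shape of that script concrete, here is a rough sketch that only prints the wrk command lines for each stage. The measured stages (2 threads, 15 seconds, 8 through 256 connections) come from the description above. The priming and warmup settings, the --latency flag, and the class and method names are assumptions made for illustration, not the actual script.

```java
import java.util.ArrayList;
import java.util.List;

final class WrkStages {
    // Endpoint under test, as named earlier in the post.
    static final String URL = "http://localhost:9090/json";

    // Builds one wrk invocation: -t threads, -c connections, -d duration.
    static String wrk(int threads, int connections, int seconds) {
        return "wrk --latency -t " + threads + " -c " + connections
                + " -d " + seconds + "s " + URL;
    }

    static List<String> commands() {
        List<String> cmds = new ArrayList<>();
        cmds.add(wrk(2, 8, 5));    // priming run: settings are assumed
        cmds.add(wrk(2, 256, 15)); // warmup run: settings are assumed
        for (int connections : new int[] {8, 16, 32, 64, 128, 256}) {
            cmds.add(wrk(2, connections, 15)); // measured stages from the post
        }
        return cmds;
    }

    public static void main(String[] args) {
        commands().forEach(System.out::println);
    }
}
```

Printing the commands rather than running them keeps the sketch harmless to execute. Piping the output to a shell would run the stages in order.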

What I'm Looking For

Now that I've collected my profile, it's time to see what was going on while the benchmark was running. Given how erratic the response times are in the original benchmarking runs, I suspect the JVM garbage collector is at play. Depending on what garbage collector you choose (I hope to do an entire post on that in the future), many of your collections are going to run without too much impact to your process. Major collections, however, are stop-the-world affairs. In these situations, the JVM stops all application threads while the garbage collector scours and scrubs memory in an attempt to reclaim memory. The JVM doesn't care what your application is doing when it decides to do one of these. Your threads get stopped, even if they're in the middle of servicing a user request. Externally, the benchmark script has no idea this is happening. It just notices that requests stall momentarily. This periodic slowing of requests is really good at providing the latency distributions we're seeing: nice, low latency numbers in the lower percentiles and ugly, higher latency numbers in the upper percentiles.
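A toy simulation shows why periodic stalls produce exactly that shape of distribution. Every number here is made up for illustration (a 1ms base latency, and one request in fifty stalled for an extra 20ms), and the names are hypothetical. Nothing in it is measured from the benchmark.

```java
import java.util.Arrays;

final class PauseLatencyDemo {
    // Nearest-rank percentile over an array that is already sorted ascending.
    static double percentile(double[] sorted, double pct) {
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Builds `count` latencies of `baseMs`; every `pauseEvery`-th request
    // is stalled for an extra `pauseMs`, standing in for a GC pause.
    static double[] simulate(int count, double baseMs, int pauseEvery, double pauseMs) {
        double[] latencies = new double[count];
        for (int i = 0; i < count; i++) {
            boolean stalled = (i % pauseEvery) == 0;
            latencies[i] = stalled ? baseMs + pauseMs : baseMs;
        }
        Arrays.sort(latencies);
        return latencies;
    }

    public static void main(String[] args) {
        double[] sorted = simulate(10_000, 1.0, 50, 20.0);
        System.out.println("p50 = " + percentile(sorted, 50) + " ms");
        System.out.println("p90 = " + percentile(sorted, 90) + " ms");
        System.out.println("p99 = " + percentile(sorted, 99) + " ms");
    }
}
```

With only 2% of requests stalled, the median and 90th percentile stay at the base latency while the 99th percentile lands squarely on the stalled requests.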

Java Flight Recorder has an entire section of the UI dedicated to memory. You can see how much is used over time. You can see the garbage collections over time, both minor and major, and how long they take (pausing your application in the process). Even more useful, like any other profiler Java Flight Recorder records all of the object allocations including the number of allocations, the size of the allocations, and the stack trace showing where the allocations are made. I quickly scan the list of classes:

There's no real smoking gun just from looking at the object types. Byte arrays aren't really surprising for a server that's reading and writing bytes to sockets. Character arrays and Strings aren't very surprising, either…you can easily make a lot of Strings. Integer arrays and Object arrays might be interesting. It kind of depends on what they're getting used for.

Curious to dig a bit, I select java.lang.Object[] and look at the stack traces where they're being generated:

Whoa…stop right there. Do you see the third entry: java.lang.Throwable.fillInStackTrace(int)? Throwable is part of Java's Exception hierarchy. I didn't see any failed requests during the run, so what's throwing so many exceptions that I'm generating 1.39GB of java.lang.Object[] just to fill in stack traces?

Oh…Hey, Guava

Expanding the java.lang.Throwable.fillInStackTrace(int) stack trace reveals a few things. First, Jersey (the reference implementation of JAX-RS, which Dropwizard uses for REST) has repackaged and included its own copy of Google's Guava (a popular collection of common code) internally (you can see it in the jersey.repackaged.… classes in the call stack). The exception being thrown is created in com.google.common.net.InetAddresses.forUriString(String), which is called by com.google.common.net.InetAddresses.isUriInetAddress().

Looking more closely at the code, this is a common anti-pattern where exceptions are used as flow control. In this case, isUriInetAddress() calls forUriString(), which throws an IllegalArgumentException if the host string given isn't a valid IP address. isUriInetAddress() catches the exception and returns false. In this particular case, the exception isn't necessary. It's immediately caught and discarded…including the java.lang.Object[] that was allocated to fill in the stack trace.
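The pattern is easier to see in miniature. The sketch below is not Guava's code: parseOctet and isOctet are hypothetical stand-ins with the same structure, a throwing parser wrapped by a boolean check that catches and discards the exception. The capturedFrames helper exists only to show that a stack trace was filled in for an exception nobody ever looks at.

```java
final class FlowControlDemo {
    // Hypothetical throwing parser: accepts "0".."255", rejects everything else.
    static int parseOctet(String s) {
        if (s.isEmpty() || s.length() > 3) {
            throw new IllegalArgumentException("not an octet: " + s);
        }
        int value = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("not an octet: " + s);
            }
            value = value * 10 + (c - '0');
        }
        if (value > 255) {
            throw new IllegalArgumentException("not an octet: " + s);
        }
        return value;
    }

    // The anti-pattern: an exception is created only to be turned into `false`.
    static boolean isOctet(String s) {
        try {
            parseOctet(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Number of stack frames recorded for the discarded exception (0 if none thrown).
    static int capturedFrames(String s) {
        try {
            parseOctet(s);
            return 0;
        } catch (IllegalArgumentException e) {
            return e.getStackTrace().length;
        }
    }

    public static void main(String[] args) {
        System.out.println(isOctet("127"));
        System.out.println(isOctet("localhost"));
        System.out.println(capturedFrames("localhost") > 0);
    }
}
```

Every call with an invalid input pays for the exception object, its message string, and the captured stack frames, even though the caller only wanted a boolean.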

Following the stack trace a bit farther, we see org.glassfish.jersey.servlet.ServletContainer.service(ServletRequest, ServletResponse). This means that the exception throwing code in Guava is being called on every HTTP request. Given that we're trying to run a benchmark, we know we're trying to do a very high rate of requests. That causes us to dump a high rate of unused Object arrays onto the heap for the garbage collector to deal with.

Allocation is expensive enough as it is, but you pay a double cost when allocating memory with languages like Java because of the garbage collector. So it's best not to allocate memory unless you really need to.

This is also a case where the cost of allocation is largely hidden from the caller. The developer thinks they're just throwing an Exception. Under the hood, however, is a relatively expensive operation. This particular case is even more costly because the code generating the message for the exception, formatIllegalArgumentException(String, Object...), is executing java.lang.String.format(Locale, String, Object...), which creates even more throwaway memory, not to mention unnecessarily burning CPU cycles.

Guava, Could You Not?

Fortunately, the solution is pretty straightforward. isUriInetAddress() needs a version of forUriString() that doesn't throw an IllegalArgumentException. Maybe one that just returns null instead, so isUriInetAddress() can simply return forUriString(ipString) != null. I made the change locally, rebuilt the server, and ran a new benchmark against the original benchmark code and my newly enhanced Guava-based code.
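The shape of that change can be sketched as follows. This is an illustration of the idea, not the patch itself: the octet parser and the names parseOctetOrNull, parseOctet, and isOctet are hypothetical. The failure path returns null, the throwing method is kept for callers that want an exception, and the boolean check never allocates one.

```java
final class NoThrowDemo {
    // Returns the parsed value, or null when the input is not "0".."255".
    // Nothing is thrown on the failure path, so no stack trace is filled in.
    static Integer parseOctetOrNull(String s) {
        if (s.isEmpty() || s.length() > 3) {
            return null;
        }
        int value = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return null;
            }
            value = value * 10 + (c - '0');
        }
        if (value > 255) {
            return null;
        }
        return value;
    }

    // Throwing variant for callers that treat bad input as an error.
    static int parseOctet(String s) {
        Integer value = parseOctetOrNull(s);
        if (value == null) {
            throw new IllegalArgumentException("not an octet: " + s);
        }
        return value;
    }

    // The check becomes a plain null comparison.
    static boolean isOctet(String s) {
        return parseOctetOrNull(s) != null;
    }

    public static void main(String[] args) {
        System.out.println(isOctet("127"));
        System.out.println(isOctet("localhost"));
    }
}
```

Keeping the throwing method as a thin wrapper means existing callers see no behavior change, while the hot path that only asks a yes-or-no question stops paying for exceptions.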

This time instead of running the TechEmpower benchmark I ran a simpler benchmark that executed a fixed number of requests (500,000). With a fixed number of requests we can more accurately compare memory profiles. If we fixed the time, instead, it's possible that a faster server would execute more requests and since the memory use we're targeting is based on the number of requests being made, the memory profiles would be more difficult to compare.

Once again, I recorded both runs with Java Flight Recorder. I then held them side by side. The "base" version of the code executed 281 minor garbage collections (0 major) while the "enhanced" version executed 268 minor garbage collections (also 0 major). I'm not entirely sure how to reconcile the small difference here. Used, reserved, and committed heap size appears pretty similar in both tests. You may be able to see in the screenshot that I've bounded the time considered to just when the benchmark was running. You can also see that the bounded time for the "base" test was about 42 seconds while the bounded time for the "enhanced" test was about 36.5 seconds. So even though the collections were comparable, the "enhanced" run was about 13% shorter.

The major sign of improvement comes from looking at the Thread Local Allocation Buffer (TLAB) statistics. For the "base" version, there was a TLAB count of 185,187 and 16.70GB of total memory allocated. For the "enhanced" version, there was a TLAB count of 155,059 (>16% reduction) and 12.91GB of total memory allocated (>22% reduction). One thing that's incredible about this is that the "base" profile showed only 130.77MB of java.lang.Object[] allocated as a result of Guava throwing exceptions. We shaved 3.79GB off the total allocations, however. So it's obvious that while Object arrays are what got our attention, they aren't the only thing that was being allocated by the thrown exceptions.

I suspect the lack of improvement in GC counts and time spent is mostly due to GC tuning. I did a separate run with -XX:NewRatio=1, which grew the "young" generation from a third of the heap to half of it: the default is -XX:NewRatio=2, and the flag sizes the young generation at 1 / (1 + NewRatio) of the heap. Comparing -XX:NewRatio=2 to -XX:NewRatio=1 with the Guava enhancement yields a drop from 268 to 178 minor collections and a drop in overall GC time from 685ms to 458ms. Nothing in this run affected how much garbage was being generated or how much needed cleaning, only the ratio of the size of the young generation to the tenured generation.
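The 1 / (1 + NewRatio) rule is quick to check with a few lines of arithmetic. The class and method names below are made up for the example, and the result is a fraction of the heap rather than a size in bytes, since the post doesn't state the heap size.

```java
final class NewRatioMath {
    // Fraction of the heap given to the young generation for a given -XX:NewRatio.
    static double youngFraction(int newRatio) {
        return 1.0 / (1 + newRatio);
    }

    public static void main(String[] args) {
        double byDefault = youngFraction(2); // -XX:NewRatio=2
        double tuned = youngFraction(1);     // -XX:NewRatio=1
        System.out.println("NewRatio=2 young fraction: " + byDefault);
        System.out.println("NewRatio=1 young fraction: " + tuned);
        System.out.println("growth factor: " + (tuned / byDefault));
    }
}
```

Going from one third of the heap to one half is a 1.5x larger young generation, which lines up with the drop from 268 to 178 minor collections (about 1.5x fewer).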

This makes sense since most of the objects are created during a request, which we've seen can complete in about a millisecond. A larger young generation means more space for these short-lived objects. One thing that's slightly unintuitive to me, however, is that although the collections are less frequent, they take about as long while collecting more garbage. A quick look at the verbose GC output shows a minor collection with -XX:NewRatio=2 collecting 69,976K in 0.0014054 seconds and a -XX:NewRatio=1 minor collection wrangling 104,965K in 0.0019387 seconds. That comes out to 49,790,806K/s and 54,141,950K/s respectively. That's an 8.7% improvement in collection speed for doing nothing more than re-arranging our heap.
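For anyone who wants to redo the arithmetic, the collection speed is just kilobytes collected divided by seconds spent. The inputs below are the two verbose GC samples quoted above. The class and method names are made up for the example.

```java
final class GcThroughput {
    // Collection speed in K per second.
    static double kbPerSecond(double collectedKb, double seconds) {
        return collectedKb / seconds;
    }

    public static void main(String[] args) {
        double ratio2 = kbPerSecond(69_976, 0.0014054);  // -XX:NewRatio=2 sample
        double ratio1 = kbPerSecond(104_965, 0.0019387); // -XX:NewRatio=1 sample
        System.out.printf("NewRatio=2: %.0f K/s%n", ratio2);
        System.out.printf("NewRatio=1: %.0f K/s%n", ratio1);
        System.out.printf("improvement: %.1f%%%n", (ratio1 / ratio2 - 1) * 100);
    }
}
```

Two single samples are a thin basis for a trend, so this only confirms the arithmetic, not the underlying cause.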

This is where my garbage collection knowledge begins to run out, so I'll leave explaining what's going on there for another day. In the meantime, I'll also make a mental note to adjust the young-to-tenured ratio for the official benchmark.

But Does the Benchmark Improve?

Now the important question, did the benchmark improve at all? I'll preface this by saying I don't have a strong benchmark setup locally just yet. So some of my numbers aren't the best. I think there's contention between a few processes that needs to be sorted out by setting up a proper benchmark environment.

With that said, I moved the code to an EC2 instance (a t2.micro, which I realize is horribly under-powered). With the code there, I re-ran the benchmarks and the tl;dr is yes, the JSON benchmark is quite a lot better.

At 8 concurrent connections average latency goes from 2.40ms to 1.86ms with the standard deviation reducing from 4.90ms to 1.91ms. 99th percentile latency drops from 25.33ms to 6.90ms. Overall throughput climbs from 4,546 requests per second to 4,763 requests per second.

We see similar gains at 16 concurrent connections as well. Average latency drops from 6.87ms to 3.13ms. Standard deviation drops from 15.82ms to 3.65ms. 99th percentile latency drops from 80.92ms to 13.71ms. Throughput shoots from 4,746 requests per second to 5,742 requests per second.

We continue seeing gains at 32, 64, 128, and 256 concurrent connections. However, the gains aren't nearly as strong. This may be partly due to the t2.micro instance not having the horsepower to keep up (not enough CPU cores for so many concurrent requests). Future development is going to require a better benchmark environment. Hopefully something closer to whatever TechEmpower is using.

Output for the base and enhanced Guava versions of the benchmark is available here.

So, Guava…

Right, so none of this helps if we don't get the patches in the hands of the right people. As luck would have it, my very first contribution to Guava was recently accepted! Now everyone can check host strings for IP addresses without fear of over-taxing their garbage collector.

A few things need to happen, however. First, remember how I mentioned that Jersey packages their own Guava? Yeah…they're going to have to upgrade the version of Guava that they bundle to include my changes. Then they're going to have to release a new Jersey with the new, bundled Guava. Once that's done, Dropwizard needs to pick up the new Jersey (preferably the new Guava as well since Dropwizard has its own dependency on not-bundled-with-Jersey Guava). Once that's done, all of the Dropwizard users need to update to the new Dropwizard with the new Jersey with the new Guava (I make it sound so simple).

Of course, if you're impatient you can patch your own JAR files. That's how I tested all of this before submitting any patches. 😎

We're Done, Then?

Goodness no, we're not done with the TechEmpower benchmark by a long shot. This is just the first thread I pulled. In the course of looking for this leak I found a few other areas to explore and not all of them are memory-related (tease). Before we get there, however, I need to get a proper benchmark environment set up and maybe re-run the analysis from the Guava fix to establish new, better baselines for comparison.

If you've enjoyed this post and you aren't already following me on Twitter, follow me there to be notified of the next post. I'm also interested in feedback, questions, suggestions, emoji (preferably not 💩, keep that to yourself), cat pics, and reactions (GIFs are more than acceptable). Send those via Twitter.

If you're an intrepid Dropwizard user and you patch your service before new Guava, Jersey, and Dropwizard libraries are released I'd love to hear how it goes. If you want help figuring out how to do that, hit me up on Twitter. I'd love to see if this positively impacts real world applications. Same goes for Guava users who aren't also Dropwizard users. I'm curious to know if I've made life measurably better for you as well.

Next post in about a week (I hope). As I said, I already have some other fun areas for exploration in mind.

If you've read this far, know that I appreciate you. 💖

Dropwizard TechEmpower Benchmark - Part 1: Let’s Begin

Some of you may have heard of the TechEmpower Web Framework Benchmarks. As it turns out, the open source project that I’m involved with (Dropwizard) is featured in the benchmark. Dropwizard hasn’t fared terribly well in the benchmark, however, which is surprising because I’ve seen it in production. It’s been pretty snappy.

Determined to get to the bottom of the results (maybe I should rephrase that), I’ve forked and locally cloned the benchmark repo, located Dropwizard in the frameworks/Java/dropwizard directory, and followed the README.md instructions to run the benchmark locally.

The results aren’t pretty. Granted, the benchmark runs inside a VirtualBox instance alongside the Dropwizard app and a MySQL database. It’s not the raw latency that concerns me so much as the standard deviation and the long tail latency (i.e. latency at the 99th percentile). These are typically indications of scaling issues. Even at 8 concurrent connections with 2 driving threads, long tail latency for a simple JSON endpoint is alarmingly high:

Running 15s test @ http://127.0.0.1:9090/json
  2 threads and 8 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.08ms    2.36ms  55.03ms   95.36%
    Req/Sec     5.81k   626.98     7.28k    82.67%
  Latency Distribution
     50%  602.00us
     75%  732.00us
     90%    1.20ms
     99%   12.42ms
  173528 requests in 15.01s, 27.47MB read
Requests/sec:  11561.92

As you can see, 75% of requests finish in under 1 millisecond. Even at 90% we’re still finishing in just over 1 millisecond. The slowest 1% of requests, however, take more than 10 milliseconds. As the concurrency maxes out at 256 connections and 2 threads, the median response time is unacceptably slow for such a basic endpoint (even for an underpowered virtual machine) and the 99th percentile is heading towards 1 second:

Running 15s test @ http://127.0.0.1:9090/json
  2 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    74.24ms  130.63ms   1.28s    88.12%
    Req/Sec     5.45k     0.94k    8.24k    72.00%
  Latency Distribution
     50%   15.04ms
     75%   85.84ms
     90%  233.41ms
     99%  620.43ms
  162873 requests in 15.01s, 25.78MB read
Requests/sec:  10848.85

Equipped with the fabulous IntelliJ IDEA, the indispensable Java Mission Control, insatiable curiosity, and a desire to see Dropwizard move up the ranks a bit I’m going to dive into the benchmark, into Dropwizard, and into Dropwizard’s dependencies to see what I can learn and what I can improve. I plan to do a series of posts showing my methods and remedies in the hope that it proves useful or educational to others.

If that sounds interesting to you, you can follow me on Twitter for updates. I’ve already identified a few improvements, which I hope are as entertaining as they are educational.

#dropwizard #techempower #benchmark #java

Lossless Emoji 🗣

The Open Source Bridge proposal deadline is, once again, upon us. Thanks to some polite nudging, I put together a proposal: Lossless Emoji - Doing Emoji Right (tl;dr I'd like you to go favorite my talk so it's accepted).

Some of you may remember that I gave my Fear Driven Development talk at Open Source Bridge 2015. In addition to being very emoji-heavy, I was also giving that talk while working on Magic Vibes…a now (or very soon to be) defunct social application with a heavy emphasis on emoji.

Late night backend deploys pic.twitter.com/JYb3igk0zC— Ryan Kennedy (@rckenned) August 14, 2015

In the course of building the app, I may have run across a few bugs in our emoji handling.

I learned a little about Unicode today but in so doing discovered how much more I don’t know…so I’m net more stupid than I started out— Ryan Kennedy (@rckenned) September 10, 2015

Like any inquisitive and determined engineer, I dug into the problem and discovered that the tools of our trade were failing us.

JavaScript (via nodejs) not handling unicode well for substring operations demonstrates my day well pic.twitter.com/fSdftJEper— Ryan Kennedy (@rckenned) September 11, 2015

Through a bit of research, coding, and testing, however, I managed to handle most of the prickly cases.

I don't want to say I conquered Unicode and Emoji today, but my tests pass and I now know many places probably handle them very wrong 🔥— Ryan Kennedy (@rckenned) September 11, 2015

It's not all 🌈s and 🦄s, though. A lot of the industry still struggles to do emoji correctly. We do things like turn your 🙋🏿 into 🙋 (oh dear…et tu, Tumblr?) or ⃞.

Tweetbot > Twitter web app, evidently pic.twitter.com/NBiJyFBozu

— Ryan Kennedy (@rckenned) June 23, 2015

But with a little bit of help, we can all have products that are 💯. So please favorite my talk and then get a ticket to Open Source Bridge (in lovely Portland, OR) so you can learn to make your apps more 💰 and less 💩.

My Favorite Debug Ever

One of my favorite things is debugging problems. I don't know why, but I genuinely enjoy it and I think I'm actually good at it. This is the story of one of my favorite debugging sessions ever.

I joined Yahoo! Mail back in late 2004. After doing mostly Java in my undergrad and then nothing but Java professionally afterwards, I was doing C++ and PHP at Yahoo!. A bunch of the PHP had to wrap underlying backend C++ libraries. C++ and I…were not friends in the least. I was used to the JVM hiding pointers and memory management from me. Nevertheless, after much bellyaching, I managed to get things working.

During internal testing we were getting sporadic complaints of file uploads failing. No HTTP errors from the server…the connection would just close itself. My more experienced coworkers told me this was typically the behavior seen when an Apache process would crash. The evidence would be found in core files on the affected machines. Sure enough, once I'd figured out where these mysterious artifacts could be found (having been a Java programmer for most of my life, core dumps were new to me) I quickly located quite a number of very large core files.

I had to figure out which of the core files (if any) were related to the problem I was investigating, which meant needing to learn enough GDB to load a core dump along with all the necessary symbols to be able to make sense of where things had gone wrong. At this point I had only a basic understanding of GDB (I had an unconventional undergrad experience having blown through a Computer Science degree in 2 years after spending 3 years as a Physics/Chemistry major), but I quickly figured out how to load the core dump and at least get a back trace. None of the back traces had anything to do with file uploads…they were segmentation faults all over the place. I started looking at the code indicated, but nothing looked out of place. I was completely unable to find any code that could be causing a segmentation fault.

About this time one of our frontend engineers caught the upload failure as it happened and called me over. He showed me again and again how the server would drop the connection. I asked for a copy of the attachment and went back to my desk. I sent the attachment to my own local development instance and watched, happily, as my process also crashed. This was the first breakthrough…a reproducible case. I put Apache into single process mode, attached GDB, and ran the request again. GDB caught the segmentation fault and dumped the stack trace. Unfortunately it was in an incredibly bizarre location. I had literally no idea what was going on.

The problem had a certain smell, however. It reminded me of something I'd seen in a previous job. I worked in Java at that job, but we had JNI wrappers for a vendor supplied library. I modified the wrapper once and it blew up in my face in a really non-obvious way (stack traces pointing to bizarre locations). A much more experienced engineer told me it sounded like I was "smashing the stack." I had an array on the stack and I was writing off the end of it, blowing up bits of the stack along the way.

Determined I was encountering the same issue, I started wondering how on earth one finds memory corruption like this. Yahoo! Mail was an enormous codebase…I couldn't just go spelunking for the problem. I needed help. During college, my senior project advisor had lent me a copy of Linux Application Development (I'm not sure why I never gave it back). On a whim, I flipped through it until I found a section on memory. In there, it talked about a tool called Electric Fence. Electric Fence replaced the system allocator, erecting barriers on either side of the allocated memory to detect buffer underflows and overflows.

I excitedly got back on the computer and began looking for it. I found a copy for FreeBSD, plugged it into my Apache module, restarted Apache, connected GDB, sent the doomed upload, and watched it fail exactly the same way: SIGSEGV instead of the expected SIGBUS Electric Fence ought to throw when an overflow occurred. "What the heck?", I thought. I spent some time looking at the fine print in the documentation and noticed that by default Electric Fence would allocate a full page and set the barrier on the next page. So small overflows wouldn't trigger Electric Fence. I found the setting (EF_ALIGNMENT) that put the barrier on the very next byte after what was requested in allocation, re-did the setup, and BOOM…SIGBUS. I ran the backtrace and found myself in the portion of code that was constructing the MIME body part, copying in the contents of the attachment provided.

It turned out that the underlying library could be called in different orders to construct a MIME message. Old Yahoo! Mail called it one order and New Yahoo! Mail (the one I was building) called it in another order. The order I was calling it in caused the buffer used to hold the attachment not to be properly initialized. As a result, attachments of a certain type and size (I remember it being nuanced, which is why it didn't happen all the time) could overflow the buffer into undetermined space. I filed a bug against the team owning the library, updated my code to work around the ordering problem, and re-ran my test successfully.

This was 5 years into what is now a 15 year career and it is still one of the best, if not the best, bugs I've ever tracked down and fixed. Mostly I think I liked it because I had to learn so many new things to figure it out. So solving the problem felt like a tremendous accomplishment.

Thanks to Bruce Perens for his wonderful tool, Dr. Emilia Villareal for lending me the book (I owe you a copy of the new edition), and the inspiring Julia Evans for asking me to write this up.

My Experience Conducting Successful Internal Hack Days

Every so often I’m asked to talk about hack day. Sometimes in the context of my time at Yahoo!. Other times it’s Yammer’s hack days. If you want to understand the full story, however, you need to know about both because my experiences at Yahoo! influenced hack days at Yammer and shaped the way I think about hack days.

Hack Day at Yahoo!

Yahoo!’s hack days can largely be traced back to Chad Dickerson, who organized the first Yahoo! hack day. Chad (now the CEO of Etsy, who has their own hack week) was influenced by organizations like Atlassian with their FedEx Day (now known as ShipIt Day, evidently).

The first hack day, which was mostly isolated to the search and marketplace teams, was so successful that Chad was able to get enough support to do a larger event for the entire Sunnyvale campus. This second Yahoo! hack day was my first ever and I was instantly hooked. I attended every hack day after that and also spoke and volunteered at our open hack days.

Hack days at Yahoo! were a huge deal, frequently drawing 100+ projects, hundreds of employees, and executive judges from across the company to the final presentations.

The Problem With Yahoo! Hack Days

http://techcrunch.com/2012/08/09/techcrunch-disrupt-sf-hackathon-judges-announced/

While I was loving hack day, there was a problem. At some point hack day became about opening up innovation. While some awards were tongue-in-cheek, like the “Most Likely to be Shut Down By Legal” award, some were given ambitious names like the “Ship It Now Award”. My friend Kent won that award three times. Sadly, none of those award winning hacks ever shipped. This was a recurring story at Yahoo! hack days. Judges (mostly VP level and above execs) would stand in front of the audience during the awards ceremony and talk about how excited they were about the hacks, how proud they were of the hackers, and how much they wanted to see the hacks go into production. It almost never happened, though. If you managed to eventually ship a hack to production you were an outlier. This became a considerable point of frustration for many hack day participants.

In addition, some hack days were giving out expensive prizes (an iPad or a Wii were especially popular back in the day). Sometimes it was straight up cash (I have a vague memory of a $25,000 prize for one event). Suddenly engineers were arguing (often in private, sometimes in public) about their hack being better than another winning hack or about another team “cheating” (typically by showing something they’d been working on previously instead of only during the 24 hour hack period). As you might imagine, this created a bit of tension among employees.

Between dissatisfaction over unfulfilled desires to ship and feeling cheated out of prizes, hack day was feeling pretty tainted. What was supposed to be an activity born from curiosity and passion had become a cause of infighting.

Hacking at Yammer

I left Yahoo! in 2009 and spent a year and a half kicking around inside Netflix before taking a call from a recruiter in late 2010 about a startup named Yammer. Yammer wasn’t doing hack days but they were interested in discussing them during my interview because it was a prominent feature of my resume. I was hired a week later and immediately jumped into the staging Yammer network to talk with my soon-to-be-colleagues. A topic I brought up frequently was hack day. “Hack days are great for exploring things like that.” “Oh…that reminds me of a hack day project I did/saw.”

Eventually the CTO cornered me in the office and told me, “shut up and give us a hack day.” I wouldn’t call it a statement of enthusiasm. More like, “if you think they’re so cool, prove it.” Fortunately, engineers took to it immediately and by the time I left the company in October 2014 we’d successfully run more than a dozen hack days in our San Francisco and London offices.

Measuring the Success of a Hack Day

First off, know why you want to have a hack day. Don’t just have a hack day because it’s trendy or because all the other companies are doing it. Second, figure out how you might measure (even subjectively) whether your hack day delivered what you wanted. Lastly, get your hack day organizing team (you have one of those, right?) together after the event and figure out whether it was successful in the ways you wanted it to succeed (you should also discuss the unexpected things that happened and things that didn’t go the way you wanted them to).

At Yammer I was looking for three things: engagement, excitement, and bonding.

Engagement, put simply, was a measure of how many people are participating. Yammer was consistently at 90%+ participation in the engineering organization, which included developers, operations, QA, product management, and designers (apologies to any teams I left off the list). Larger organizations will have a difficult time reaching 90%. Yahoo! never came close to it, but it was also a 10,000+ person organization by the time the first hack day was held. Smaller organizations can expect a higher turnout. Larger organizations may want to start small with a single team or business unit before expanding. Yammer did this after the acquisition by inviting other teams to come hack with us, eventually bringing in teams from SharePoint, Exchange, Xbox, and more.

By excitement I mean is everyone smiling? Are people asking when the next hack day will be? Are announcements for the next hack day quickly followed by people starting to form teams and ideas? Excitement is the fuel that drives these events. Indifference causes people to work on their day-to-day projects or simply surf and play games, which is not great for engagement. If you can’t get people excited about 24 (or more) hours of free time to build whatever they want, you probably have some other organizational issues that need addressing first.

Bonding was my ultimate goal for hack day at Yammer, although I don’t think I realized that until we’d been doing them for about a year. When we first started hack days at Yammer, engineering was only about 40 people. It was easy to know one another and even to have worked closely with one another. As we grew, however, that was harder and harder. Hack day was essential for keeping engineering close knit through these growth phases. It was a rare occasion for me to walk down the hall and see a face I hadn’t already seen presenting at the hack day podium. And what better way to quickly get to know someone than to see them presenting their hard work on stage? Hack days enabled Yammer to still feel like a small organization while we were growing rapidly. After the acquisition, hack day also helped us to bond with other teams within Microsoft.

Unexpected Benefits

In addition to the benefits we were aiming for, there were also unexpected benefits.

The high participation rates among engineering meant that almost every employee was in the room during presentations. One of the greatest presentations I’ve ever seen involved a single engineer presenting some mind-blowing integration between Yammer and Exchange. During the Q&A session after the presentation, a product manager on the judging panel said the PMs always wanted to do something like this but figured it was going to be way too complicated to do in a reasonable amount of time. Hack day provided the perfect, risk-free opportunity for someone to go out on a limb and prove that we should be more ambitious in product planning.

Hack days also influenced the way Yammer organized and operated. Out of necessity, successful hack day teams tend to organize into the most efficient teams possible for the problem at hand. When you only have 24-48 hours to complete a project, you need a team that can rapidly iterate between design, development, and testing. Otherwise you end up with a broken project or nothing at all to present. That turns out to be really useful in day-to-day operations as well. Consequently, the way Yammer operated through my time there was with small teams that looked and worked similarly to how hack day teams worked.

How Yammer Hack Days Were So Successful

I’ve often used panspermia to describe how hack day was successfully transplanted from Yahoo! to Yammer:

pan·sper·mi·a

noun

the theory that life on the earth originated from microorganisms or chemical precursors of life present in outer space and able to initiate life on reaching a suitable environment.

The key component is a “suitable environment.” I was very fortunate that Yammer management was hugely supportive of hack day. Management was responsible for scheduling hack days, which requires clearing project calendars. Employees also knew they had support from management to take time away from their projects so they could hack without worrying that it might reflect poorly at review time. This is absolutely critical. Most employees I spoke with at Yahoo! and Microsoft who did not participate in hack day programs cited worry about negative perceptions of their time spent hacking in reviews as the reason for not participating. If you want to have a wildly successful hack day program, it must start with management support.

Low expectations also played a large part in our success. Engineers were told up front not to expect anything to ship to production. We also kept Yahoo!’s tradition of silly names for awards in an effort to demonstrate that the awards were meaningless beyond some amount of recognition and novelty. Beyond some truly ugly, yet somehow highly coveted, animal trophies we never gave out awards of any physical value. Judges further drove home the point by frequently describing the judging process as “arbitrary and corrupt.”

We also had no expectations or rules around what people would/could build. We had the typical new features, better developer tools, and more that you see at every hack day. But people also built games, bots to track the status of various coffee machines, and even office package delivery tracking. Perhaps most surprising (and delightful) was when music videos became their own genre of hack.

Lastly, the rules were kept simple. Build whatever you want with whoever you want and then present it to the audience…no PowerPoint allowed. The PowerPoint rule is a nod to the only rule at Yahoo! hack days, which was meant to keep hack day focused on building things, not on explaining ideas. If you could get to an idea, you should try to form a team to build at least a prototype. Ideas are great, but you learn very little from them. In contrast, you can learn a lot from prototypes. You learn what’s going to be difficult to build and what obstacles will get in the way, and you develop a sense for what a full-blown implementation might require (time, people, resources, etc.). If you want to see a hack day project eventually make it to production, that information is going to be essential for justifying why that project should be prioritized in the product pipeline.

In Closing

Looking back, I’m proud to have been a part of something so core to the experience of having worked at Yammer. There are some things I wish we’d done better. Hack day was not always the most professional environment, but then neither was Yammer.

At some point I’d like to sit down and write a FAQ and something like a hack day playbook to share all of the tips, checklists, and more that I have in my head that might benefit a group getting started or struggling with an existing program.

Since my experience at Yammer, a number of individuals and organizations have asked to speak with me about my experiences and to share my thoughts on their programs. If your organization wants to get started with hack days or if you’re simply interested in talking more about them, feel free to reach out to me (@ mention me or DM, I’ve opened my DMs for a bit). I’m happy to have short, casual conversations for free. If you want something more, we might be able to work out some kind of consulting arrangement.

Leaving Yammer and raising money for women in computing

I’ve decided to leave Yammer in October. I don’t currently have a next thing in mind. I’m taking some time off to hang out with my family, take on more of the parenting, volunteer, work on side projects, and get back in shape.

I’m using some of this time to do fundraising for the Ada Initiative. I’m donating $4,096 and I’m using Microsoft’s matching program to double that to $8,192. I’m reaching out to friends and colleagues to support the Ada Initiative’s work and mission. I was introduced to the Ada Initiative by some colleagues and spent time learning more about who they are and what they do. “The Ada Initiative helps women get and stay involved in open source, open data, open education, and other areas of free and open technology and culture.” If you're already convinced, you can donate at this link - otherwise keep reading.

http://supportada.org/

There’s a lot of talk about diversity in technology. So much of it is focused on recruiting more women. But did you know 56% of women will leave tech within 10 years? That’s twice the attrition rate of men and it’s driven primarily by harassment and discrimination. See the history on Zoe Quinn and Anita Sarkeesian. Those are much more public cases than most, but they’re not rare occurrences by any means. If we focus only on recruiting more women in addressing diversity problems, we may achieve a greater diversity of new recruits but they will still drop out when they discover the same toxic, inhospitable environment that’s been there for years.

The Ada Initiative does a lot of work to improve the environment. Their Ally Skills Workshop, which I’ve attended twice, takes people who already want to help and makes them more aware of unhealthy situations and how they can handle them. Their Impostor Syndrome Training helps individuals who suffer from a sense that they’re unqualified to do the work they’re already doing. They’ve had an incredible 2014. But they’re a non-profit and they rely upon caring donors to fund their vital work. Please join me in supporting the Ada Initiative by making a donation here:

http://supportada.org/

Thanks for reading this. Lunch/coffee/whatever is on me if you donate. :)

#adainitiative #diversity
