You may think, as I did, that analyzing the Linux kernel is like venturing through a dark dungeon: without the addition of advanced tracers like SystemTap, there's much that can't be seen, and can only be inferred. However, I've recently found hidden switches that turn on some bright lights, strategically placed by Steven Rostedt and others since the 2.6.27 release. These are the ftrace profilers. I haven't even tried all the switches yet, but I'm stunned at what I've seen so far, and I'm having to rethink what I previously believed about Linux kernel performance analysis.
Recently at Netflix (where I work), a Cassandra database was performing poorly after a system upgrade, and disk I/O inflation (a massive increase in the number of I/O operations submitted) was suspected. There can be many causes for this: a worse cache-hit ratio, record-size inflation, readahead inflation, I/O from other applications, or even asynchronous kernel tasks (such as file system background scrubs). The question was: which one, and how do we fix it?
