Concurrency, Parallelism and Strong Concurrency Models
Introduction
Last time, we spoke about good languages for distributed systems programming (by the way, sorry Erlang bros! I totally forgot you! You deserve consideration too!).
We have also covered null-safety. Both subjects matter because you have to be smart about how you develop your systems, keeping them scalable without sacrificing sophistication.
If you want to develop software in 2018, hopefully with a business behind it, you are probably thinking of a SaaS. When you develop a SaaS, your customers consume your services through the "gateways" that you establish. On the modern web, these can be RESTful APIs that mobile apps, websites, CLI programs, etc., interface with through HTTP requests, and they should be highly available, resilient and fast. Concurrency helps us get there because one of the main premises behind SaaS is handling large volumes of traffic, bringing in revenue by selling on volume rather than quality.
So, since availability is paramount to make it in this field, you will need to learn how to develop non-blocking systems, and for that you should use concurrency and parallelism. Concurrency is a big asset for developing great SaaS offerings.
In practice, concurrency and parallelism are often conflated because both terms refer to similar concepts: the idea that you, as a programmer, should be able to work on more than one computing operation at the same time.
We'll stick to this broad definition for the remainder of the article, but look at it from the point of view of the abstractions that our beloved programming languages give us nowadays to run tasks in parallel. Such a set of abstractions is what I call a concurrency model.
In a modern world where pretty much every processor is multi-core, from the most modest budget Chinese cellphones around to the latest 32-core behemoths running in datacenters, it is of great importance to be able to program concurrently.
So let's set some ground rules, following Rob Pike, and distinguish between concurrency and parallelism. Concurrency is about dealing with many things at once: it is concerned with the way you build things, structuring a program as independent tasks (distributed in threads, for example). Parallelism is about doing many things at once: it is concerned with the way you run things, such as executing those tasks simultaneously on several cores. Good concurrent languages are those that give you great features to make your job easier at the moment of actually building your systems.
It is worth noting that some language implementations, like Ruby MRI, lack true parallelism: a global interpreter lock lets only one thread execute Ruby code at any given time. Its concurrency options may still suit a lot of users, though, since its concurrency abstractions are easy to use and still let you structure your programs in a non-blocking way, which is what we are mainly concerned about. A language that doesn't run your code on several hardware threads at once may not be for everybody, so keep this in mind while researching your tools.
Examples
But let's backtrack a bit: why is it desirable to be proficient in concurrency? Well, sometimes, particularly when you develop mission-critical systems where latency and efficiency are of the essence, you need to ensure that computationally "expensive" tasks run in parallel, so your system can respond in a non-blocking way. Besides, this is also a great practice for using resources efficiently.
For example, if you were bringing up a web server, you'd want to make it so it can serve incoming requests from clients as swiftly as possible without blocking.
Imagine if a web server had to "queue" every incoming request and process them one by one. What if it's 12 PM, the lunch rush, and your Cool Disruptive Startup™ that sells lunch through an app makes users wait longer just because they requested their lunch seconds, or even milliseconds, later than the previous user?
If you make a SaaS user wait too long to get service, you usually lose them forever and they will walk away angry that your service is so shitty. That's no good! We're trying to make money here!
Another example: you develop the systems for a rideshare company, and your business process requires that every time a user requests a cab, their app talks to your service from their mobile device. To get them a cab, you need to make a very heavy query to a database which stores all the data on all the available drivers, including their position as geolocation data. This is a huge dataset and the query is complex, so queries to this database are expensive. (Let's assume you don't have a cache in place, which would be a terrible idea, but bear with me for the sake of the experiment. Not to mention that I/O as a rule is expensive, be it over the network or to/from disk.) When your service performs a query to the database server, the request has to travel over the wire; the database server has to do its thing by understanding the request that came in over the network, spinning up disks, finding the data and sending it back over the wire; your service then has to deserialize it into, perhaps, some kind of business object or another data structure; and even then, the result of whatever operation the backend did has to travel over a cellular network to your client. Each layer of networking that your information has to go through adds latency on top of the latency that your service already has to begin with.
Imagine if you had to block your system for so long whenever you get a request?
So: blocking = bad. Think of the human body. Our body is highly parallel and built concurrently: one can easily talk to their friends while driving down the road while listening to music. But what if one sneezes? Sneezing is the only "blocking" task in the human body, one which blocks all other tasks without question. If you sneeze while at the wheel, you lose control of the vehicle for a couple of seconds, you drop your conversation and you are unable to enjoy the music in the background. This is bad design, and we want to write code that's as un-sneezy as possible. If your code is highly sneezy, it will fall short eventually.
So what will we do now?
Well, you should respond to your users' requests from your web server in a non-blocking way. It's better that all of your users wait 50 ms on average to receive a response from your API than to have the first user wait 10 ms and the rest wait 400 ms, and longer still as time passes. That is unacceptable. Again: for a SaaS, high latencies usually mean death.
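To make the lunch-rush scenario a bit more tangible, here is a minimal sketch in Go (a language we'll get to further down). Everything in it is made up for illustration: handle is a hypothetical stand-in for real request handling and simply sleeps for 50 ms to simulate I/O, and the two serve functions are just names I picked.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// handle is a hypothetical stand-in for real request handling: it just
// sleeps to simulate I/O (a database query, a network call, etc.).
func handle(id int) string {
	time.Sleep(50 * time.Millisecond)
	return fmt.Sprintf("response %d", id)
}

// serveSequentially processes requests one by one, so each request waits
// for every request that arrived before it.
func serveSequentially(n int) []string {
	out := make([]string, n)
	for i := 0; i < n; i++ {
		out[i] = handle(i)
	}
	return out
}

// serveConcurrently gives each request its own goroutine, so all of them
// wait on their simulated I/O at the same time.
func serveConcurrently(n int) []string {
	out := make([]string, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			out[i] = handle(i) // each goroutine writes only to its own slot
		}(i)
	}
	wg.Wait()
	return out
}

func main() {
	start := time.Now()
	serveSequentially(8)
	fmt.Println("sequential:", time.Since(start).Round(10*time.Millisecond))

	start = time.Now()
	serveConcurrently(8)
	fmt.Println("concurrent:", time.Since(start).Round(10*time.Millisecond))
}
```

With eight simulated requests, the sequential version should take somewhere around eight times the per-request delay, while the concurrent one should stay close to a single delay. The exact timings will vary from machine to machine.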
Now, this is not news at all. Back in the 90s we already had the C10K problem, which was a tough frontier to crack at the time. The idea behind C10K is that systems on the Internet had become so prevalent, and the number of people converging on the Internet so massive, that web servers needed to be efficient enough to serve ten thousand clients concurrently as if it was nothing. A lot of interesting papers were produced on the subject, leveraging the then-prevalent programming languages and techniques to achieve concurrency, and I urge you to read them.
You'll find that, fortunately, a lot of progress has been made in the field since then and a lot of great ways to handle this kind of problem have now been devised, some of which I'll explain next.
Concurrency models in different languages
Pitfalls
What's the problem with concurrency, though? Mainly, problems arise when threads need to operate on the same area of memory or share information with each other. What if you have ten different threads doing different things, but they all need to know, say, the value of a global variable such as a player's HP in a game? What if the player's HP is 30 and the thread that handles the game's enemies decreases it to 29, but another thread, say the one that handles spells, was expecting to see 29 and, because it runs before the enemy thread completes, still sees 30? What if two threads both try to decrease the player's HP from 30 to 29 at the same time and you end up with 28, and this is unintended? How can we ensure that memory is seen correctly between threads?
Since the differences between the moments in time at which threads act on memory can be so small, things are not always truly deterministic when two threads are operating at the same time. It then becomes necessary to implement extra steps to guarantee that no race conditions or heap corruption happen when operating on shared state (or, preferably, do NOT do shared state, stat. More on this later).
A Race Condition is a situation where the result of a computation is incorrect because it depends on the order or timing in which threads happen to run, and Heap Corruption is a situation where a program's memory is 'damaged' by the action of threads, leading to a bad computation at best and absolute chaos at worst.
How do you avoid these kinds of situations? You coordinate access to memory by using primitives such as locks, semaphores and mutexes (short for mutual exclusion). Race Conditions and Heap Corruption are incredibly hard to debug, by the way. Here be dragons. AKA, if it is possible to not do concurrency this way, don't. Absolutely don't. You have been warned. Not to mention, doing malloc() and free() in 2018? Dang, son! Party like it's 1972, will you! Has any progress been made on this front in the last (ahem) 50 years? What do garbage-collected languages offer us in this realm?
C
But let's start with the basics: C, as it usually does, lets you handle threads manually, which is incredibly powerful but can also be incredibly burdensome and dangerous. Using the pthread.h header file, available on POSIX systems, you can make use of the pthread_create and pthread_join functions to manage threads manually. There is also a pthread_t type which represents thread IDs: pthread_create takes a pointer to a pthread_t and fills it in, and pthread_join takes that pthread_t value to know which thread to wait for:
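    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>
    #include <unistd.h>  /* for sleep() */

    /* Runs on the new thread. */
    void *func(void *vargp)
    {
        sleep(1);
        printf("Hello World!\n");
        return NULL;
    }

    int main(void)
    {
        pthread_t thread_id;
        /* Start func on a new thread; this call returns immediately. */
        pthread_create(&thread_id, NULL, func, NULL);
        /* Block until that thread has finished. */
        pthread_join(thread_id, NULL);
        exit(0);
    }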
pthread_create creates a new thread and starts running it without blocking the caller, whereas pthread_join blocks until the thread passed to it has finished its task.
It is worth mentioning here that C++ has a variety of other resources that are also useful for doing concurrency well. One very popular example is the Boost collection of libraries for C++, which a lot of developers across the world leverage.
Javascript
JavaScript, no matter your criticisms of the language, has a pretty cool concept baked right into the language in the form of the Callback pattern, which you can use together with Promises to easily achieve concurrency.
In JavaScript, you can assign functions to variables (please notice that I said functions, not only values such as those returned by evaluating functions, mind you), and thus you can pass functions as arguments to other functions. Many codebases use this to set up the instructions that should be executed upon successful completion of a given function. That's why we call these kinds of functions "callbacks" in the context of JavaScript. It calls back to you.
For example, in JavaScript you can print a message after a timeout with:
setTimeout(function(){ console.log("Hello World!"); }, 1000);
Notice how the call to the setTimeout function (which is included in most JavaScript environments) takes an anonymous function as its first parameter, before the number of milliseconds to wait. That function is the callback: setTimeout itself returns right away, and the callback executes once the number of milliseconds given in the second parameter has elapsed.
Now, let's try Promises.
A promise encapsulates a non-blocking, async computation, its result, and its status at runtime, and it is in exactly one of three states at any given time. Quite literally, and redundancy be damned, a Promise is a value that we promise will be there in memory after we finish computing it. Thus, your program should handle each of the possible outcomes of a Promise.
A promise can have three different states:
Pending
Fulfilled
Rejected
The advantage of having Promises as an idiomatic feature is that it lets you pack a lot of complexity into very few lines of code. It also spares you from many of the race conditions and other async shenanigans that come with sharing memory between threads, although the order in which callbacks run can still surprise you.
Converting the previous example so it runs with promises, it becomes:
    var promise = new Promise(function(resolve, reject) {
        // Schedule the greeting for one second from now; this does not block.
        setTimeout(function() {
            console.log("Hello World!");
        }, 1000);

        var solved = true;
        if (solved) {
            resolve("Resolved!");
        } else {
            reject("Welp, something broke");
        }
    });

    promise.then(
        function(result) { console.log(result); },
        function(err) { console.log(err); }
    );

By using the then() function of a Promise, you can declaratively specify what should happen if your promise succeeds or fails. The cool thing is that at runtime, by inspecting the promise, you'll be able to see how it is fulfilled over time. For example, if you run the previous code on a JavaScript console, you'll see that the console first shows a promise in pending state:
Promise { <state>: "pending" }
Then once it completes it will do whatever was specified on the then() call:
    Resolved!
    Hello World!
You should see Resolved! being output to the console first, no matter that it's later on in the code, and then a second later, the output of the test print statement, Hello World!.
And if you inspect it now:
Promise { <state>: "fulfilled", <value>: "Resolved!" }
Pair this with sane functional programming principles, such as an aversion to sharing state, and with the declarative programming style that an expressive functional language gives you, and suddenly concurrent programming becomes really easy in JavaScript.
Scala
Promises are also included in other languages like Scala as the Future abstraction, which encapsulates an async computation.
A Future is a read-only placeholder that represents a value which should eventually exist, but doesn't yet, since computing it is delegated to an async pipeline.
Converting the previous example to Scala, you'll see:
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.Future

    val msg = "Hello World!"
    val f: Future[Unit] = Future {
      Thread.sleep(1000)
      println(msg)
    }

This runs the block we pass to Future without blocking the caller, and eventually prints the value: Hello World!. If inspected before completing, the future will show up like this:
f: scala.concurrent.Future[Unit] = Future(<not completed>)
And after completing:
res2: scala.concurrent.Future[Unit] = Future(Success(()))
Future objects can also be created through Scala Promises, which, similar to JavaScript promises, provide the means to mark a future as failed or fulfilled, set a return value, and handle its different possible outcomes. Given the functional nature of Scala, Scala Futures and Promises together end up being really similar to JavaScript Promises in their usage:
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.{Future, Promise}

    val p = Promise[String]()
    val msg = "Hello World!"

    val producer = Future {
      // Kick off the slow greeting on another thread...
      val greeting = Future[Unit] { Thread.sleep(1000); println(msg) }
      val succeeded = true
      // ...and complete the promise right away, WITHOUT BLOCKING!
      // Do you understand why this is great?
      if (succeeded) p.success("Succeeded!")
      else p.failure(new IllegalStateException)
    }

    val f = p.future
    val consumer = Future {
      f.foreach { res => println(res) }
    }

Here, I use the names producer and consumer to make it a bit clearer that there's a relationship between the Future that fulfills the promise and the Future that consumes the promise's result. Notice how a Promise has a future attribute, accessible through p.future. Then, by using the success and failure methods on a promise, we set the state of the async computation going out.
You should see Succeeded! being output to the console first, no matter that it's later on in the code, and then a second later, the output of the test print statement, Hello World!. Sound familiar?
Scala Futures and Promises provide us with a lot of features to handle the result of an asynchronous computation in a thread-safe way. If you run this example in a Scala environment, you will get non-blocking output on standard output, and the ability to inspect the status of the computation. What I want you to take out of this small Scala explanation, if nothing else, is that structuring your programs by placing your logic inside Futures gives you non-blocking async computation. This is expressive, simple to grok, simple to code and debug, and easy to reason about.
Java
Java, since Java 5, has a Future interface. However, Java 8 brought a very popular implementation of the Future contract to the table in the form of the CompletableFuture type. Think of it as a Future with additional logic to indicate completion.
CompletableFuture follows the same school of thought that JavaScript and Scala follow, borrowing heavily, of course, from them.
This is pretty cool because... I'm not going to lie, doing traditional concurrency patterns in Java, with their synchronized atomic primitives and whatnot, is... not really pleasant to work with to say the least!
The canonical example we've been using so far, in Java 8, turns into:
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    class FutureTest {
        static Future<String> printMessageAsync() {
            CompletableFuture<String> future = new CompletableFuture<>();
            boolean completed = true;

            // Print the greeting from a pool thread, one second from now.
            ExecutorService executor = Executors.newCachedThreadPool();
            executor.submit(() -> {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
                System.out.println("Hello World!");
            });
            // Accept no new tasks; the one already submitted still runs.
            executor.shutdown();

            if (completed)
                future.complete("Completed!");
            else
                future.completeExceptionally(new IllegalStateException("Welp, something broke"));
            return future;
        }

        public static void main(String[] args) throws InterruptedException, ExecutionException {
            Future<String> future = printMessageAsync();
            System.out.println(future.get());
        }
    }
Again, as with every other example above, you should see output on the standard output stream being presented in a non-blocking fashion:
    Completed!
    Hello World!
Go
And now for something completely different: Go.
Go is the beautiful brainchild of all the old-school Unix people coming together and making new, more elegant stuff, learning from their mistakes and from the parts that made C and C++ brittle and harder to handle. Its concise syntax, beautiful constructs and web-centric design make it an ultra-fast, low-overhead language for the modern web, and Everything's all Right with the World.
The Go model for concurrency is the goroutine. A goroutine, somewhat similarly to Futures and Promises but not quite the same, is a simple abstraction that encapsulates a unit of computation work. Calling a function asynchronously, without blocking, is as simple as putting the reserved word go in front of the function call.
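As a small sketch of what that looks like, the snippet below starts several goroutines with the go keyword and waits for them with a sync.WaitGroup from the standard library. The function names square and squaresConcurrently are made up for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// square is an ordinary function; nothing about it is goroutine-specific.
// It stores i*i in its own slot of results and reports that it is done.
func square(results []int, i int, wg *sync.WaitGroup) {
	defer wg.Done()
	results[i] = i * i
}

// squaresConcurrently computes 0*0 .. (n-1)*(n-1), one goroutine per value.
func squaresConcurrently(n int) []int {
	results := make([]int, n)
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go square(results, i, &wg) // "go" runs the call in a new goroutine
	}
	wg.Wait() // block until every goroutine has called Done
	return results
}

func main() {
	fmt.Println(squaresConcurrently(5)) // prints [0 1 4 9 16]
}
```

Each goroutine writes to a different element of the slice, so no locking is needed here. The WaitGroup is only there so that the caller knows when all of them have finished.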
To handle the problem of passing information between goroutines, Go has channels: a channel is a typed, thread-safe pipeline for values.
Goroutines, thus, shovel values into channels and pull values out of them. A channel lives for as long as something in the running program still references it, waiting for you to send it values.
The good thing about goroutines is that they are much cheaper than managing threads manually: the Go runtime multiplexes many goroutines onto a small number of operating system threads (by default, roughly as many run Go code at once as you have CPU cores), splitting processor time between the goroutines on each thread. If a goroutine blocks, for example on a system call, the runtime keeps the other goroutines moving on another thread. And they have a really low memory footprint too!
Converting our running Hello World example into Go:
    package main

    import (
        "fmt"
        "time"
    )

    func f() {
        time.Sleep(1000 * time.Millisecond)
        fmt.Println("Hello World!")
    }

    func main() {
        var input string
        go f()
        fmt.Println("Completed!")
        fmt.Scanln(&input)
    }

Note that here I am blocking by using an auxiliary variable and asking the user for input, but that variable isn't used for anything. This keeps the program alive long enough for you to see the asynchronous outputs:
    Completed!
    Hello World!
It is worth mentioning that there are Promise implementations available in Go, but they aren't part of the standard library. Not to mention, the built-in concurrency model is already incredibly sturdy and robust, so it should cover most people's use cases and let them expand as needed. It is also worth noting that the code is incredibly compact and easy to read.
Conclusion
We have now covered strong concurrency models for five different mainstream languages. Which is your favorite? Have you implemented more traditional concurrency patterns in other languages?
I'd love to follow up with a part 2 covering the Actor model, a very interesting approach to concurrency that languages such as Erlang use, or that you can get on the JVM with the Akka libraries. This is a pretty good base to run on for the time being though. Let me know your thoughts!