Saturday, December 4, 2010

Why multithreading is difficult

It is common wisdom that programming with Java threads is difficult (and even more so in other languages). But have you ever tried to figure precisely why this is so ?

The problem with threads (PDF) is a nicely written research report that tries to answer this question. In short, a thread is a notion that makes sense at the hardware level, when you think in terms of registers and instruction pointers. But this hardware level notion should never have made its way up to the source code level. Just like you wouldn't mix Java source code and assembly language instructions.

Let us try to go deeper: why exactly is the notion of thread ill-suited as a programming abstraction ? The core of the problem can be expressed in one single word: composability.

Consider for example arithmetic expressions. You can take two numbers and compose them to form an addition:


The result of the addition is itself a number, so you can compose it again with yet another number:
(1+2)+3 or 3+(1+2)

When there's no ambiguity, you can remove the parentheses altogether:

In other words, arithmetic expressions can be composed using + as a composition operator (there are many other).

The exact same process works for Java statements using sequential composition:

{ a; b; }
{ a; b; }; c;
c; { a; b; }
a; b; c;

What about threads ? Can you take two threads and combine them in order to obtain a thread ? So that it can itself be combined with yet another thread ? No, threads do not compose.

So why is composability important ?

Composability provides a method for building larger pieces out of smaller ones: just compose. This is how you can build large programs by building up on smaller ones.

You want to make sure that the program behaves as expected ? First start by making sure that each of its pieces behaves as expected. This is what unit testing is about.

Composability also provides a method for understanding what is going on: just decompose. If a divide-by-zero exception is thrown by { a; b; }, then it must have been thrown either from a or from b. Decomposition is our standard and intuitive way to debug a program.

Now what about a multithreaded program ? Imagine a program with two threads that throws an exception. Can you split the program in two pieces such that the exception comes from either one or the other ? No way !

This is precisely why multithreading is difficult. Multithreaded programs are impossible to test and debug, and making sure that they work properly requires a thorough analysis of the whole code, including pieces you didn't even know they existed.

Parallel programming itself does not need to be difficult, the problem is that mainstream programming languages do not provide a parallel composition operator as they do for sequential composition. Ateji's technological offering consists precisely in retrofitting languages with such an operator.

Let me stress this once again: multithreaded programming is difficult because of the lack of composability. Not because because of parallelism, non-determinism, or the other usual suspects.

This is true of Java threads, but also of all structures built upon threads without providing their own composition mechanism, such as tasks (the Java Executor framework) or concurrent collections (the Java concurrency framework).