Tuesday, March 23, 2010

Matrix multiplication - 12.5x speedup with a single "||"

Agreed, matrix multiplication is not a very sexy piece of code, but it serves as a standard example and benchmark for data-parallel computations (code that applies the same operation to many elements in parallel).

The matrix multiplication white-paper is available from http://www.ateji.com/multicore/whitepapers.html. It shows how we achieved a 12.5x speedup on a 16-core server, simply adding one single "||" operator to an existing sequential Java code. Raw performance is pretty good as well, on par with linear algebra libraries.

Here is the parallel code. Note the "||" operator right after the first 'for' keyword, this is the only difference between sequential and parallel version of the code.


for||(int i : I) {
    for(int j : J) {
        for(int k : K) {
            C[i][j] += A[i][k] * B[k][j];
        }
    }
}



Performance is pretty impressive, on par with dedicated linear algebra librairies:



The part that I find really interesting is the comparison with the same algorithm using plain Java threads. Even if you have a general knowledge about threads, you need to see actual code before you can imagine the amount of small details that need to be taken into account.

They include adding many final keywords, copying local variables, computing indices, managing InterruptedException. 27 lines vs. 7 lines. And we haven't even returned values or thrown exception from within threads! The problem is not so much verbosity itself, but the fact that programmer's intent gets hidden behind a lot of irrelevant details.


Enjoyed the article?
Share your interest by voting up this article on social sites!