The matrix multiplication white paper is available from http://www.ateji.com/multicore/whitepapers.html. It shows how we achieved a 12.5x speedup on a 16-core server simply by adding a single "||" operator to existing sequential Java code. Raw performance is good as well, on par with linear algebra libraries.
Here is the parallel code. Note the "||" operator right after the first 'for' keyword: it is the only difference between the sequential and parallel versions of the code.
for||(int i : I) {
    for(int j : J) {
        for(int k : K) {
            C[i][j] += A[i][k] * B[k][j];
        }
    }
}
Performance is pretty impressive, on par with dedicated linear algebra libraries.
The part that I find really interesting is the comparison with the same algorithm using plain Java threads. Even with a general knowledge of threads, you need to see actual code to appreciate the number of small details that must be taken into account.
These include adding many final keywords, copying local variables, computing indices, and handling InterruptedException: 27 lines vs. 7 lines. And we haven't even returned values or thrown exceptions from within threads! The problem is not so much the verbosity itself as the fact that the programmer's intent gets hidden behind a lot of irrelevant details.
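To give a feel for those details, here is a rough sketch of what a plain-threads version might look like. It is not the code from the white paper: the class name ThreadedMatMul, the multiply method, and the choice to split the rows of C evenly across a fixed number of threads are all assumptions made for illustration.

```java
// Hypothetical sketch: row-partitioned matrix multiplication with plain Java threads.
// Names and the partitioning scheme are illustrative, not taken from the white paper.
final class ThreadedMatMul {

    // Computes C = A * B, splitting the rows of C across nThreads worker threads.
    static double[][] multiply(final double[][] a, final double[][] b, final int nThreads) {
        final int n = a.length;      // rows of A and C
        final int m = b.length;      // columns of A, rows of B
        final int p = b[0].length;   // columns of B and C
        final double[][] c = new double[n][p];
        final Thread[] workers = new Thread[nThreads];

        for (int t = 0; t < nThreads; t++) {
            // Compute this worker's row range and copy it into final locals,
            // because the anonymous class below can only capture final variables.
            final int from = (int) ((long) t * n / nThreads);
            final int to = (int) ((long) (t + 1) * n / nThreads);
            workers[t] = new Thread(new Runnable() {
                @Override
                public void run() {
                    // Each worker writes only rows [from, to) of c, so no locking is needed.
                    for (int i = from; i < to; i++) {
                        for (int j = 0; j < p; j++) {
                            for (int k = 0; k < m; k++) {
                                c[i][j] += a[i][k] * b[k][j];
                            }
                        }
                    }
                }
            });
            workers[t].start();
        }

        // Wait for every worker; join() is what forces us to deal with InterruptedException.
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the interrupt flag
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        // prints [[19.0, 22.0], [43.0, 50.0]]
        System.out.println(java.util.Arrays.deepToString(multiply(a, b, 2)));
    }
}
```

The three nested loops are still in there, but they are now surrounded by index arithmetic, final copies, thread bookkeeping and exception handling, none of which says anything about matrix multiplication.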
In which Java version/dialect is this --for || () { } -- available?
Why add such a strange structure? Couldn't you just check for the presence of a label "nonsynchronized" on the loop instead?
i.e.
nonsynchronized:
for (...)
@Anonymous: Ateji Parallel Extensions is implemented by a source-to-source translation to Java 6.
@db: The for|| syntax is actually a very special case of a much larger concept. I'll show the whole set of parallel constructs in future posts, stay tuned.