Wednesday, March 31, 2010

Robin Milner

Professor Robin Milner, one of the pioneers of computer science, passed away on March 20: http://www.timesonline.co.uk/tol/comment/obituaries/article7081867.ece.

He will be remembered, among many other achievements, as the inventor of pi-calculus, the theoretical foundation behind Ateji Parallel Extensions.

Though I never had a chance to actually meet him, he inspired my career all along, as a researcher and as an entrepreneur.

Saturday, March 27, 2010

Ateji PX vs Java's ParallelArray

The ParallelArray API is a proposal for integrating data parallelism into the Java language. Here is an overview by Brian Goetz: http://www.ibm.com/developerworks/java/library/j-jtp11137.html

The canonical example for ParallelArray shows how to compute an average of student grades in parallel. Note the declaration of students as a ParallelArray rather than an array, the introduction of a filter 'isSenior' and a selector 'selectGpa', and the use of an ExecutorService 'fjPool'.


ParallelArray students = new ParallelArray(fjPool, data);
double bestGpa = students.withFilter(isSenior)
                         .withMapping(selectGpa)
                         .max();


static final Ops.Predicate isSenior = new Ops.Predicate() {
    public boolean op(Student s) {
        return s.graduationYear == Student.THIS_YEAR;
    }
};


static final Ops.ObjectToDouble selectGpa = new Ops.ObjectToDouble() {
    public double op(Student student) {
        return student.gpa;
    }
};



Closures may help in making this code less verbose. Here is an example based on the BGGA proposal:


ParallelArray students = new ParallelArray(fjPool, data);
double bestGpa = students.withFilter({Student s => (s.graduationYear == THIS_YEAR) })
                         .withMapping({ Student s => s.gpa })
                         .max();



However, what we're interested in with this use case is expressing so-called "reduction operations" over arrays using generators, filters and selectors, and run them in parallel. Making Java a functional language is an interesting but different topic.

Here is the syntax used by Ateji Parallel Extensions for expressing reduction operations (we actually call them "comprehensions", as in "list comprehension"). This syntax should look familiar, as it inspired by the set notation used in high school mathematics.


Student[] students = ...;
double bestGpa = `+ for{ s.gpa | Student s : students, if s.graduationYear == THIS_YEAR };



Note that students remains a classical Java array. This code is sequential, it becomes parallel when a parallel operator is inserted right after the for keyword:


Student[] students = ...;
double bestGpa = `+ for||{ s.gpa | Student s : students, if s.graduationYear == THIS_YEAR };



The parallel operator on reduction expressions provides almost linear speedup on multi-core processors, as soon as the amount of computation for each array element is more than a couple of additions.

Quiz:
Using the comprehension notation, how would you express the number of students graduating this year ? The average of their grades ?

Tuesday, March 23, 2010

Matrix multiplication - 12.5x speedup with a single "||"

Agreed, matrix multiplication is not a very sexy piece of code, but it serves as a standard example and benchmark for data-parallel computations (code that applies the same operation to many elements in parallel).

The matrix multiplication white-paper is available from http://www.ateji.com/multicore/whitepapers.html. It shows how we achieved a 12.5x speedup on a 16-core server, simply adding one single "||" operator to an existing sequential Java code. Raw performance is pretty good as well, on par with linear algebra libraries.

Here is the parallel code. Note the "||" operator right after the first 'for' keyword, this is the only difference between sequential and parallel version of the code.


for||(int i : I) {
    for(int j : J) {
        for(int k : K) {
            C[i][j] += A[i][k] * B[k][j];
        }
    }
}



Performance is pretty impressive, on par with dedicated linear algebra librairies:



The part that I find really interesting is the comparison with the same algorithm using plain Java threads. Even if you have a general knowledge about threads, you need to see actual code before you can imagine the amount of small details that need to be taken into account.

They include adding many final keywords, copying local variables, computing indices, managing InterruptedException. 27 lines vs. 7 lines. And we haven't even returned values or thrown exception from within threads! The problem is not so much verbosity itself, but the fact that programmer's intent gets hidden behind a lot of irrelevant details.


Enjoyed the article?
Share your interest by voting up this article on social sites!