Saturday, March 27, 2010

Ateji PX vs Java's ParallelArray

The ParallelArray API is a proposal for integrating data parallelism into the Java language. Here is an overview by Brian Goetz: http://www.ibm.com/developerworks/java/library/j-jtp11137.html

The canonical example for ParallelArray shows how to compute an average of student grades in parallel. Note the declaration of students as a ParallelArray rather than an array, the introduction of a filter 'isSenior' and a selector 'selectGpa', and the use of an ExecutorService 'fjPool'.


ParallelArray students = new ParallelArray(fjPool, data);
double bestGpa = students.withFilter(isSenior)
                         .withMapping(selectGpa)
                         .max();


static final Ops.Predicate isSenior = new Ops.Predicate() {
    public boolean op(Student s) {
        return s.graduationYear == Student.THIS_YEAR;
    }
};


static final Ops.ObjectToDouble selectGpa = new Ops.ObjectToDouble() {
    public double op(Student student) {
        return student.gpa;
    }
};



Closures may help in making this code less verbose. Here is an example based on the BGGA proposal:


ParallelArray students = new ParallelArray(fjPool, data);
double bestGpa = students.withFilter({Student s => (s.graduationYear == THIS_YEAR) })
                         .withMapping({ Student s => s.gpa })
                         .max();



However, what we're interested in with this use case is expressing so-called "reduction operations" over arrays using generators, filters and selectors, and run them in parallel. Making Java a functional language is an interesting but different topic.

Here is the syntax used by Ateji Parallel Extensions for expressing reduction operations (we actually call them "comprehensions", as in "list comprehension"). This syntax should look familiar, as it inspired by the set notation used in high school mathematics.


Student[] students = ...;
double bestGpa = `+ for{ s.gpa | Student s : students, if s.graduationYear == THIS_YEAR };



Note that students remains a classical Java array. This code is sequential, it becomes parallel when a parallel operator is inserted right after the for keyword:


Student[] students = ...;
double bestGpa = `+ for||{ s.gpa | Student s : students, if s.graduationYear == THIS_YEAR };



The parallel operator on reduction expressions provides almost linear speedup on multi-core processors, as soon as the amount of computation for each array element is more than a couple of additions.

Quiz:
Using the comprehension notation, how would you express the number of students graduating this year ? The average of their grades ?

2 comments:

Jan Van Besien said...

Interesting approach! Do you use Fork/Join under the hood, or have you developed your own libraries to support your language extensions?

Patrick Viry said...

Most of the work is done at compile-time. We also have our own library for supporting the translated code, but this remains invisible to the programmer. It is quite different from fork/join anyway.