Friday, July 30, 2010

Integrating π (pi) in parallel

A simple way to compute the constant π (pi) is to measure the area under a curve. In more algebraic terms, this amounts to integrating y = 4/(1+x*x) between 0 and 1. In programming terms, it means incrementing x in small steps and summing the corresponding values of y, scaled by the step width (the smaller the steps, the more accurate the result).
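For readers who want the formula spelled out, here is a short derivation of why this integral equals π, together with the midpoint approximation that the code in this article computes. The symbols N (playing the role of numSteps) and x_i are introduced here for illustration only.

```latex
\int_0^1 \frac{4}{1+x^2}\,dx
  = 4\,\bigl[\arctan x\bigr]_0^1
  = 4\cdot\frac{\pi}{4}
  = \pi,
\qquad
\pi \approx \frac{1}{N}\sum_{i=0}^{N-1}\frac{4}{1+x_i^2},
\quad x_i = \frac{i+\tfrac12}{N}.
```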

Tim Mattson in his blog entry "Writing Parallel Programs: a multi-language tutorial introduction" explores available tools for coding this algorithm in parallel, namely OpenMP, MPI and Java threads.

Here we will stick to the Java universe, and compare Java sequential and multi-threaded code with their Ateji PX equivalent. Impatient readers may readily jump to the Ateji PX version at the end of the article.

The sequential Java code, inspired by Tim's sequential C version, is as follows:


  static final int numSteps = 100000;
  static final double step = 1.0/numSteps;

  public static void main(String[] args) {
    double sum = 0.0;
    // Sample the curve at the midpoint of each of the numSteps intervals
    for(int i=0; i< numSteps; i++) {
      double x = (i+0.5)*step;
      sum = sum + 4.0/(1.0+x*x);
    }
    double pi = step * sum;
    System.out.println(pi);
    System.out.println(Math.PI);  // reference value for comparison
  }



Try changing the value of numSteps to see the effect on precision.
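One way to run that experiment is sketched below. It wraps the same midpoint summation in a helper and prints the absolute error against Math.PI for increasing step counts. The class name PiPrecision and the method estimatePi are hypothetical names chosen for this illustration, not part of the original code.

```java
public class PiPrecision {
    // Midpoint-rule estimate of the integral of 4/(1+x*x) over [0, 1].
    // estimatePi is a hypothetical helper name used for this sketch.
    static double estimatePi(int numSteps) {
        double step = 1.0 / numSteps;
        double sum = 0.0;
        for (int i = 0; i < numSteps; i++) {
            double x = (i + 0.5) * step;   // midpoint of interval i
            sum += 4.0 / (1.0 + x * x);
        }
        return step * sum;
    }

    public static void main(String[] args) {
        // Multiply the number of steps by 10 each time and report the error.
        for (int n = 10; n <= 1_000_000; n *= 10) {
            double error = Math.abs(estimatePi(n) - Math.PI);
            System.out.printf("numSteps=%d error=%.3e%n", n, error);
        }
    }
}
```

The error should shrink as numSteps grows, until floating-point rounding starts to dominate.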

Tim parallelizes this code using threads as follows (slightly edited to make it look more Java-ish):


static int nProcs = Runtime.getRuntime().availableProcessors(); 

static class PIThread extends Thread 
{ 
  final int partNumber; 
  double sum = 0.0; 

  public PIThread(int partNumber) { 
    this.partNumber = partNumber; 
  } 

  public void run() { 
    for (int i = partNumber; i < numSteps; i += nProcs) { 
      double x = (i + 0.5) * step; 
      sum += 4.0 / (1.0 + x * x); 
    } 
  } 
} 

public static void main(String[] args) { 
  PIThread[] part_sums = new PIThread[nProcs]; 
  for(int i = 0; i < nProcs; i++) { 
    (part_sums[i] = new PIThread(i)).start(); 
  } 
  double sum = 0.0; 
  for(int i = 0; i < nProcs; i++) { 
    try { 
      part_sums[i].join(); 
    } catch (InterruptedException e) {
    } 
    sum += part_sums[i].sum; 
  } 
  double pi = step * sum; 
  System.out.println(pi); 
} 



Pretty verbose, isn't it? The core of the algorithm is hidden behind a lot of irrelevant details.

Verbosity also makes it all too easy to overlook potential problems. In this code, the InterruptedException is caught and silently ignored, which is wrong and may lead to very nasty bugs when the code is put in the context of a larger application. This is not to blame Tim: honestly, who understands the precise meaning and usage rules of InterruptedException?
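As an illustration of what more careful handling could look like, here is a minimal sketch of one commonly recommended pattern: when join() is interrupted, restore the thread's interrupt status and stop waiting, so that callers can still see the interruption. The class JoinExample and the helper joinAll are hypothetical names for this sketch. It is one possible approach under these assumptions, not the fix prescribed by Tim's article or by Ateji PX.

```java
public class JoinExample {
    // Waits for every worker to finish.
    // Returns false if the calling thread was interrupted while waiting;
    // in that case the interrupt status is restored for the caller.
    static boolean joinAll(Thread[] workers) {
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                // Do not swallow the interruption: re-set the flag and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Thread worker = new Thread(() -> System.out.println("working"));
        worker.start();
        System.out.println("all joined: " + joinAll(new Thread[] { worker }));
    }
}
```

With this shape, the caller can decide whether a partial result is acceptable instead of silently summing values from threads that may still be running.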

In contrast, let us code the integration of π using Ateji PX, an extension of Java. First of all, the mathematical expression used in the integration is a typical example of a comprehension, for which Ateji PX provides an intuitive syntax. Here is the sequential code:


public static void main(String[] args) {
  double sum = `+ for{ 4.0/(1.0+x*x) | int i : numSteps, double x = (i+0.5)*step };
  double pi = step * sum;
  System.out.println(pi);
}



The second line, computing sum, is very close to the standard big-sigma notation in mathematics. Having this notation available as an extension of Java makes the expression of many mathematical formulas concise and intuitive, almost like what you've learned in high school.

It also makes the code closer to the programmer's intent. In the first sequential version, using a for loop, it takes some thinking before realizing that the code is actually computing a sum. This has a strong impact on code readability and maintenance.

But what's really interesting is how this code can be parallelized. Simply add a parallel bar ("||") right after the for keyword, and Ateji PX will perform the computation in parallel using all available cores.


public static void main(String[] args) {
  double sum = `+ for||{ 4.0/(1.0+x*x) | int i : numSteps, double x = (i+0.5)*step };
  double pi = step * sum;
  System.out.println(pi);
}



In the OpenMP community, this is called a parallel reduction. Compare this code to the OpenMP version and the multi-threaded version.
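For comparison, the same parallel reduction can be sketched in plain Java, assuming Java 8 or later and its streams library, which the code in this article does not use. The class name PiParallelStream and the method estimatePi are hypothetical names for this illustration.

```java
import java.util.stream.IntStream;

public class PiParallelStream {
    // Parallel reduction: each index is mapped to one term of the sum,
    // and the stream library combines the partial sums across threads.
    static double estimatePi(int numSteps) {
        final double step = 1.0 / numSteps;
        double sum = IntStream.range(0, numSteps)
            .parallel()
            .mapToDouble(i -> {
                double x = (i + 0.5) * step;
                return 4.0 / (1.0 + x * x);
            })
            .sum();
        return step * sum;
    }

    public static void main(String[] args) {
        System.out.println(estimatePi(100000));
        System.out.println(Math.PI);
    }
}
```

Because floating-point addition is not associative, the last digits of the result may differ slightly from the sequential version depending on how the work is split.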

Comprehension expressions in Ateji PX are not limited to summation. They can express aggregate operations such as product, logical or, count and average, but also bulk data manipulation such as SQL-like queries and list or set comprehensions (the set of all ... such that ...), and even operate on user-defined operations.

Monday, July 19, 2010

Ateji PX gems: Non-local exits in parallel branches

This article is the first of a series that will explore all the lesser known gems of Ateji PX.

Non-local exits are all the statements that take the flow of control "away" from the current local scope. In Java, they are


  • return,
  • throw,
  • break,
  • continue

Other than Ateji PX, I do not know of any parallel programming framework that properly handles the combination of parallelism and non-local exits.

Which is a pity, because this combination proves very useful. For one, it makes it possible to parallelize existing code without having to rewrite all the control flow, a long and error-prone operation that makes code difficult to read.

It also makes it possible to design interesting algorithms specifically making use of this combination.

A good example is speculative parallelism. "Speculative" in this context means starting work before being totally sure that it is needed. This is a way to make the best use of idle cores.

You can use speculative parallelism to put different algorithms in competition, and take the result from the first one that terminates. Here is a try at speculatively sorting an array in Ateji PX:


    [
      || return bubbleSortAlgorithm(array);
      || return insertionSortAlgorithm(array); 
    ]

This code runs the two algorithms in parallel, takes the result from the first algorithm that terminates and returns it as the global result, stopping all remaining branches.

Here the interesting work is done by the return statements enclosed within a parallel block. As expected, a return statement returns from the enclosing method, stopping all existing branches as necessary. Without the return statements, the program would wait until both branches have terminated.

Properly stopping other threads/tasks/processes is one of the trickiest parts of parallel programming. If you've ever tried it, you know what I'm talking about. With Ateji PX, you simply add a return (or break, or continue) statement inside a branch.
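To give a feel for the bookkeeping involved, here is a rough plain-Java sketch of the same speculative sort using ExecutorService.invokeAny, which returns the result of the first task to complete successfully and cancels the others. The class SpeculativeSort and its sort methods are hypothetical stand-ins for the bubbleSortAlgorithm and insertionSortAlgorithm above. As an assumption of this sketch, each task sorts its own copy of the array so the two algorithms do not interfere, and each polls the interrupt flag so that cancellation actually stops it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SpeculativeSort {
    static int[] bubbleSort(int[] input) throws InterruptedException {
        int[] a = input.clone();
        for (int end = a.length - 1; end > 0; end--) {
            // Cooperative cancellation: stop if the other branch already won.
            if (Thread.interrupted()) throw new InterruptedException();
            for (int i = 0; i < end; i++) {
                if (a[i] > a[i + 1]) {
                    int tmp = a[i]; a[i] = a[i + 1]; a[i + 1] = tmp;
                }
            }
        }
        return a;
    }

    static int[] insertionSort(int[] input) throws InterruptedException {
        int[] a = input.clone();
        for (int i = 1; i < a.length; i++) {
            if (Thread.interrupted()) throw new InterruptedException();
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
        return a;
    }

    // Runs both algorithms in competition and returns the first result.
    static int[] speculativeSort(final int[] array)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Callable<int[]>> candidates = new ArrayList<>();
            candidates.add(() -> bubbleSort(array));
            candidates.add(() -> insertionSort(array));
            return pool.invokeAny(candidates);
        } finally {
            pool.shutdownNow(); // interrupt whichever branch is still running
        }
    }

    public static void main(String[] args) throws Exception {
        int[] sorted = speculativeSort(new int[] { 5, 3, 8, 1 });
        System.out.println(java.util.Arrays.toString(sorted));
    }
}
```

Even with library support, the thread pool, the cancellation checks and the exception plumbing all have to be written by hand, which is the overhead the two-branch parallel block avoids.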

Thursday, July 15, 2010

Dr Dobbs

For years I have been an enthusiastic reader of Dr. Dobbs Journal, a software magazine where programmers talk to programmers. DDJ has always been a reference for me, providing accurate and timely information from a practitioner's point of view.

I learned a lot from DDJ about technology itself and the way to apply it. I used to subscribe to the paper edition (probably sent via sail-mail, since it always took about 3 months to reach Paris...), before it became web-only.

This is why I take special pride in having made the headline of Dr. Dobbs Update with Ateji PX in an article by Jon Erickson, DDJ's editor in chief, titled "Think Parallel, Think Java".

Monday, July 5, 2010

Easy multi-core programming for all: Ateji PX is launched

I am proud to announce the first public release of Ateji PX, our new language extension and programming environment for multi-core and parallel programming in Java.

Ateji PX went through a one-year technology preview program with selected customers, which confirmed our initial claims about ease of use, ease of learning, and compatibility with existing code and tools.

You can now download the public release and discover the powerful language constructs offered by Ateji PX. Read the documentation and play with the samples for a quick introduction.

Here is the official press release.

Enjoy!