Fork Join Framework in Java7

Tags:

Java7 introduced interesting framework called fork join. It’s a mapreduce like lightweight parallel programming library.

The basic idea is to split the work into multiple small parts. Process them, and merge the results into final return value.

I’ve written a sample code that computes sum of integers in ArrayList:

import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;


// Use RecursiveTask for task that returns value. Use RecursiveAction, otherwise.
public class SumTask extends RecursiveTask<BigDecimal>{
  
  private static final long serialVersionUID = -5398793510775248257L;
  private ArrayList<Integer> data;
  private int start;
  private int end;

  public SumTask(ArrayList<Integer> data, int start, int end) {
    super();
    this.data = data;
    this.start = start;
    this.end = end;
  }
  
  private static boolean isCheapComputation(int start, int end) {
    return (end - start) < 10000;
  }
  
  @Override
  protected BigDecimal compute() {
    // If the task is small, do it in a single thread.
    if (isCheapComputation(start, end)) {
      BigDecimal result = BigDecimal.ZERO;
      for (int i = start; i < end; ++i) {
        result = result.add(BigDecimal.valueOf(data.get(i)));
      }
      return result;
    }
    // If the task is big, split it into multiple subtasks. They can be subdivided recursively.
    final int mid = start + (end - start) / 2;
    ForkJoinTask<BigDecimal> t1 = new SumTask(data, start, mid);
    ForkJoinTask<BigDecimal> t2 = new SumTask(data, mid + 1, end);
    invokeAll(t1, t2);  // Fork t1 and t2, then wait until both of them finish.
    return t1.join().add(t2.join());
  }
  
  public static void main(String[] args) {
    Random r = new Random();
    ArrayList<Integer> data = new ArrayList<>();
    for (int i = 0; i < 10000000; i++) {
      data.add(r.nextInt());
    }
    ForkJoinPool pool = new ForkJoinPool();
    SumTask sumTask = new SumTask(data, 0, data.size());
    // Call pool.compute() and wait until completion.
    pool.invoke(sumTask);
    System.out.println(sumTask.join());
  }
}

Fork-Join framework is interesting in some ways:
1) It removed tedious (or sometimes error prone) boilerplate that was necessary for thread join and condition variable. Now, I can just fork and join.
2) Work stealing. As described by A Java Fork/Join Framework, Doug Lea, a thread can steal other thread’s task as a way of smart workload balancing.

Another interesting tutorial is Fork and Join: Java Can Excel at Painless Parallel Programming Too! though it lacks in-depth discussion/comparisons between Java 5/6’s executor service based approach VS Fork/Join framework.

And here’s critiques on F/J framework: A Java™ Fork-Join Calamity. I myself found the library is pretty much limited, say, for IO. (Read API doc carefully before you start using the library. It noted some limitations in the comments.)

Sadly, ParallelArray was not included in Java7. If it had, we could have written like the below for multiprocessor parallel programming:

// Taken from http://www.ibm.com/developerworks/java/library/j-jtp03048/index.html
ParallelArray<Student> students = new ParallelArray<Student>(fjPool, data);
double bestGpa = students.withFilter(isSenior)
                         .withMapping(selectGpa)
                         .max();

Anyway, I think F/J is a good move (as long as it won’t get too complex in the next version of Java). And F/J framework is good enough for no-so-serious parallel tasks.

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *