Fork Join Framework in Java7

Java7 introduced interesting framework called fork join. It’s a mapreduce like lightweight parallel programming library.

The basic idea is to split the work into multiple small parts. Process them, and merge the results into final return value.

I’ve written a sample code that computes sum of integers in ArrayList:

import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;

// Use RecursiveTask for task that returns value. Use RecursiveAction, otherwise.
public class SumTask extends RecursiveTask<BigDecimal>{

  private static final long serialVersionUID = -5398793510775248257L;
  private ArrayList<Integer> data;
  private int start;
  private int end;

  public SumTask(ArrayList<Integer> data, int start, int end) {
    super();
    this.data = data;
    this.start = start;
    this.end = end;
  }

  private static boolean isCheapComputation(int start, int end) {
    return (end - start) < 10000;
  }

  @Override
  protected BigDecimal compute() {
    // If the task is small, do it in a single thread.
    if (isCheapComputation(start, end)) {
      BigDecimal result = BigDecimal.ZERO;
      for (int i = start; i < end; ++i) {
        result = result.add(BigDecimal.valueOf(data.get(i)));
      }
      return result;
    }
    // If the task is big, split it into multiple subtasks. They can be subdivided recursively.
    final int mid = start + (end - start) / 2;
    ForkJoinTask<BigDecimal> t1 = new SumTask(data, start, mid);
    ForkJoinTask<BigDecimal> t2 = new SumTask(data, mid + 1, end);
    invokeAll(t1, t2);  // Fork t1 and t2, then wait until both of them finish.
    return t1.join().add(t2.join());
  }

  public static void main(String[] args) {
    Random r = new Random();
    ArrayList<Integer> data = new ArrayList<>();
    for (int i = 0; i < 10000000; i++) {
      data.add(r.nextInt());
    }
    ForkJoinPool pool = new ForkJoinPool();
    SumTask sumTask = new SumTask(data, 0, data.size());
    // Call pool.compute() and wait until completion.
    pool.invoke(sumTask);
    System.out.println(sumTask.join());
  }
}

Fork-Join framework is interesting in some ways:
1) It removed tedious (or sometimes error prone) boilerplate that was necessary for thread join and condition variable. Now, I can just fork and join.
2) Work stealing. As described by A Java Fork/Join Framework, Doug Lea, a thread can steal other thread’s task as a way of smart workload balancing.

Another interesting tutorial is Fork and Join: Java Can Excel at Painless Parallel Programming Too! though it lacks in-depth discussion/comparisons between Java 5/6′s executor service based approach VS Fork/Join framework.

And here’s critiques on F/J framework: A Java™ Fork-Join Calamity. I myself found the library is pretty much limited, say, for IO. (Read API doc carefully before you start using the library. It noted some limitations in the comments.)

Sadly, ParallelArray was not included in Java7. If it had, we could have written like the below for multiprocessor parallel programming:

// Taken from http://www.ibm.com/developerworks/java/library/j-jtp03048/index.html
ParallelArray<Student> students = new ParallelArray<Student>(fjPool, data);
double bestGpa = students.withFilter(isSenior)
                         .withMapping(selectGpa)
                         .max();

Anyway, I think F/J is a good move (as long as it won’t get too complex in the next version of Java). And F/J framework is good enough for no-so-serious parallel tasks.

Similar Posts:

Scrypt – follow up on moderan password hashing algorithm

This is the follow up on the previous article: modern password hashing. In case you didn’t read it, bcrypt is slow hashing algorithm which is not vulnerable to rainbow table as it has built-in salt. Also, as it can be slowed down as much as you want, it can’t be broken even if computers get faster.

Recently, I realized that bcrypt is designed so that it’s difficult to attack it even with GPU. Also, there’s scrypt which has more ram requirement than bcrypt, effectively making it much more difficult to attack with hardware(say, gpu). Scrypt is used by chromium according to the design doc.

Similar Posts:

Introduction to Information Retrieval 무료 ebook

http://nlp.stanford.edu/IR-book/

정말 좋은 무료 ir book. 예를들어 ‘stopword는 색인안할것이다’같은 저의 오해를 깔끔히 깨주었습니다.

Similar Posts:

SWIRL 2012 – Future of IR

http://www.cs.rmit.edu.au/swirl12/discussion.php

IR 관련 방향을 제시하는 중요논문들을 모아놓은 페이지입니다. 참가자들에게 숙제로 3개씩 추천을 받았나보네요. 다른데 볼거없이 이것만 읽어봐도 좋을듯.

Similar Posts:

Online regular expression testing

http://regexpal.com/

Nice online tool for testing regexp. Still, I wish it had the ability to print captured groups.

Similar Posts:

RoR zeroday

https://gist.github.com/1978249

Github를 덮친 ror zeroday attack입니다.RoR쓰시면 체크해보셔야 할듯.

Similar Posts:

NoSQL Data Modeling

http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/

Nosql 데이터 모델링 기법들. 같은 사이트에 mapreduce패턴정리도 올려져 있습니다. 이런쪽 기술 정리하는데 타고나신듯.

Similar Posts:

Pulse architecture

http://eng.pulse.me/scaling-to-10m-on-aws/

Pulse (well known RSS reader) system architecture.

Similar Posts:

Cookie syncing

http://www.adopsinsider.com/ad-exchanges/ssp-to-dsp-cookie-synching-explained/

A way to share cookies between two sites. Simply pass my cookie as a parameter of request to another site. Then the callee can pair that passed cookie with its own cookie.

Similar Posts:

STL for extra large datasets

http://stxxl.sourceforge.net/

요즘은 이런 툴도 나오네요. 결국은 기존 프로그램이 프로그래머의 노력없이 자동으로 병렬화되는 것이 최종 목표겠죠.

Similar Posts: