I am impressed with Intel Threading Building Blocks. I like that I write tasks rather than thread code, and I like how it works under the hood, as far as my limited understanding goes: tasks sit in a pool, so there won’t be 100 threads on 4 cores; a task is not guaranteed to run right away, because it isn’t on its own thread and may be far back in the pool; but it may run alongside another related task, so you can’t get away with typical thread-unsafe code.
I wanted to know more about writing tasks. I like the ‘Task-based Multithreading – How to Program for 100 cores’ video here http://www.gdcvault.com/sponsor.php?sponsor_id=1 (currently the second-to-last link. WARNING: it isn’t ‘great’). My favorite part was ‘solving the maze is better done in parallel’, which is around the 48-minute mark (you can click the link on the left side; that part is really all you need to watch, if any).
However, I would like to see more code examples and some of the API for writing tasks. Does anyone have a good resource? I have no idea how a class or piece of code looks once it has been pushed onto a pool, how awkward the code gets when you need to make a copy of everything, or how much of everything gets pushed onto the pool.
Java has a parallel task framework similar to Threading Building Blocks, called the Fork-Join framework. It is available for use with the current Java SE 6 and is to be included in the upcoming Java SE 7.
There are resources available for getting started with the framework, in addition to the javadoc class documentation. The jsr166 page mentions that
“There is also a wiki containing additional documentation, notes, advice, examples, and so on for these classes.”
The fork-join examples, such as matrix multiplication, are a good place to start.
I used the fork-join framework in solving some of Intel’s 2009 threading challenges. The framework is lightweight and low-overhead: mine was the only Java entry for the Knight’s Tour problem, and it outperformed other entries in the competition. The Java sources and writeup are available for download from the challenge site.
EDIT:
You can make your own task by subclassing one of the ForkJoinTask subclasses, such as RecursiveTask. Here’s how to compute the Fibonacci sequence in parallel. (Taken from the RecursiveTask javadocs; comments are mine.)
You then run this task and get the result.
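A minimal sketch of both steps, closely following the Fibonacci example in the RecursiveTask javadoc, could look like the code below. It assumes Java 7 or later, where the classes live in java.util.concurrent (on Java SE 6 they come from the separate jsr166y package instead). The wrapper class FibonacciDemo and the helper fib are names made up for this illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical wrapper class, used only for this illustration.
class FibonacciDemo {

    // A RecursiveTask returns a result; here, the n-th Fibonacci number.
    static class Fibonacci extends RecursiveTask<Integer> {
        final int n;

        Fibonacci(int n) {
            this.n = n;
        }

        @Override
        protected Integer compute() {
            if (n <= 1) {
                return n; // base case: fib(0) = 0, fib(1) = 1
            }
            Fibonacci f1 = new Fibonacci(n - 1);
            f1.fork(); // hand fib(n-1) to the pool to run asynchronously
            Fibonacci f2 = new Fibonacci(n - 2);
            // Compute fib(n-2) in the current thread, then wait for fib(n-1).
            return f2.compute() + f1.join();
        }
    }

    // Hypothetical helper: run the task in a pool and return its result.
    static int fib(int n) {
        ForkJoinPool pool = new ForkJoinPool(); // defaults to one thread per core
        try {
            return pool.invoke(new Fibonacci(n)); // blocks until the result is ready
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fib(20));
    }
}
```

Note that the task object only carries its own input (n). Nothing else is copied when it is forked.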
This is a trivial example to keep things simple. In practice, performance would not be good, since the work done by each task is trivial compared to the overhead of the task framework. As a rule of thumb, a task should perform some significant computation: enough to make the framework overhead insignificant, yet not so much that you end up with one core running one large task at the end of the problem while the other cores sit idle. Splitting large tasks into smaller ones keeps more cores busy, as long as the tasks are not so small that they do no real work.
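One common way to apply this rule of thumb is a sequential cutoff: a task splits itself only while its range is larger than some threshold, and does the work directly below it. Here is a rough sketch that sums an array this way. The class names and the THRESHOLD value of 10000 are assumptions made for the example; a good cutoff depends on the workload and should be found by measuring.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example class illustrating a sequential cutoff.
class ThresholdSumDemo {

    // Assumed cutoff: ranges at or below this size are summed directly.
    static final int THRESHOLD = 10000;

    static class SumTask extends RecursiveTask<Long> {
        private final long[] data; // shared between tasks, only ever read
        private final int lo;      // inclusive start of this task's range
        private final int hi;      // exclusive end of this task's range

        SumTask(long[] data, int lo, int hi) {
            this.data = data;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) {
                // Small enough: do real work instead of creating more tasks.
                long sum = 0;
                for (int i = lo; i < hi; i++) {
                    sum += data[i];
                }
                return sum;
            }
            // Too large: split in half so idle cores can pick up a piece.
            int mid = lo + (hi - lo) / 2;
            SumTask left = new SumTask(data, lo, mid);
            SumTask right = new SumTask(data, mid, hi);
            left.fork();
            long rightSum = right.compute();
            return left.join() + rightSum;
        }
    }

    static long parallelSum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[1000000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data));
    }
}
```

With a very small threshold this degenerates into the Fibonacci situation (mostly overhead); with a threshold close to the array length it degenerates into one core doing everything.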
Only the tasks themselves are pushed into a pool. Ideally you don’t want to be copying anything: to avoid interference and the need for locking, which would slow down your program, your tasks should ideally be working with independent data. Read-only data can be shared amongst all tasks, and doesn’t need to be copied. If tasks need to co-operate in building some large data structure, it’s best they build the pieces separately and then combine them at the end. The combining can be done as a separate task, or each task can add its piece of the puzzle to the overall solution. This often does require some form of locking, but it’s not a considerable performance issue if the work of the task is much greater than the work of updating the solution. My Knight’s Tour solution takes this approach to update a common repository of tours on the board.
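To make the "build the pieces separately, then combine" idea concrete, here is a small sketch in which each task fills its own private list and the parent merges the two halves, so no locking is needed. This is not the Knight’s Tour code; the prime-collecting problem, the class names and the THRESHOLD value are all made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example: each task builds its own list; parents merge them.
class CombineDemo {

    static final int THRESHOLD = 1000; // assumed sequential cutoff

    // Simple trial-division primality test; touches no shared state.
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    static class PrimesTask extends RecursiveTask<List<Integer>> {
        private final int lo; // inclusive
        private final int hi; // exclusive

        PrimesTask(int lo, int hi) {
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected List<Integer> compute() {
            if (hi - lo <= THRESHOLD) {
                // This list is private to the task, so no locking is required.
                List<Integer> found = new ArrayList<Integer>();
                for (int n = lo; n < hi; n++) {
                    if (isPrime(n)) {
                        found.add(n);
                    }
                }
                return found;
            }
            int mid = lo + (hi - lo) / 2;
            PrimesTask left = new PrimesTask(lo, mid);
            PrimesTask right = new PrimesTask(mid, hi);
            left.fork();
            List<Integer> rightPart = right.compute();
            List<Integer> result = left.join();
            // Combine step: runs only after both halves are complete.
            result.addAll(rightPart);
            return result;
        }
    }

    static List<Integer> primesBelow(int limit) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new PrimesTask(0, limit));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(primesBelow(50));
    }
}
```

The alternative mentioned above, where every task adds to one shared collection, would need a lock or a concurrent collection around the update. That is fine as long as the update is cheap compared to the work that produced the piece.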
Working with tasks and concurrency is quite a paradigm shift from regular single-threaded programming. There are often several designs possible to solve a given problem, but only some of these will be suitable for a threaded solution. It can take a few attempts to get the feel for how to recast familiar problems in a multi-threaded way. The best way to learn is to look at the examples, and then try it for yourself. Always profile, and measure the effects of varying the number of threads. You can explicitly set the number of threads to use in the pool constructor. When the work divides evenly into independent tasks, you can expect near-linear speedup as the number of threads increases, up to the number of available cores.
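As a starting point for that kind of measurement, the sketch below runs the same job in pools of different sizes using the ForkJoinPool(int parallelism) constructor and prints the wall-clock time for each. The workload (total Collatz steps over a range), the class names and the THRESHOLD value are arbitrary choices for the illustration, and single-run System.nanoTime timings are only a crude indication, not a proper benchmark.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example: the same job timed with different pool sizes.
class PoolSizeDemo {

    static final int THRESHOLD = 5000; // assumed sequential cutoff

    // Number of Collatz steps needed to reach 1 from n (for n >= 1).
    static int collatzSteps(long n) {
        int steps = 0;
        while (n != 1) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
            steps++;
        }
        return steps;
    }

    // Sums collatzSteps(n) for every n in [lo, hi).
    static class StepsTask extends RecursiveTask<Long> {
        private final int lo;
        private final int hi;

        StepsTask(int lo, int hi) {
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) {
                long total = 0;
                for (int n = lo; n < hi; n++) {
                    total += collatzSteps(n);
                }
                return total;
            }
            int mid = lo + (hi - lo) / 2;
            StepsTask left = new StepsTask(lo, mid);
            StepsTask right = new StepsTask(mid, hi);
            left.fork();
            long rightTotal = right.compute();
            return left.join() + rightTotal;
        }
    }

    // Total steps for starting values 1 .. limit-1, using the given thread count.
    static long totalSteps(int limit, int threads) {
        ForkJoinPool pool = new ForkJoinPool(threads); // explicit parallelism level
        try {
            return pool.invoke(new StepsTask(1, limit));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        for (int threads = 1; threads <= cores; threads *= 2) {
            long start = System.nanoTime();
            long result = totalSteps(1000000, threads);
            long millis = (System.nanoTime() - start) / 1000000;
            System.out.println(threads + " thread(s): " + millis + " ms, result " + result);
        }
    }
}
```

The result should be identical for every pool size; only the elapsed time should change. The first iteration also pays for JIT warm-up, so repeat the runs before drawing conclusions.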