What is the difference in OpenMP between:
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            fct1();
        }
        #pragma omp section
        {
            fct2();
        }
    }
and:
    #pragma omp parallel
    {
        #pragma omp single
        {
            #pragma omp task
            fct1();
            #pragma omp task
            fct2();
        }
    }
I’m not sure that the second code is correct…
The difference between tasks and sections is in the time frame in which the code will execute. Sections are enclosed within the sections construct and (unless the nowait clause was specified) threads will not leave it until all sections have been executed. Suppose N threads encounter a sections construct with two sections, the second taking more time than the first. The first two threads execute one section each. The other N-2 threads simply wait at the implicit barrier at the end of the sections construct.

Tasks are queued and executed whenever possible at the so-called task scheduling points. Under some conditions, the runtime is allowed to move tasks between threads, even in the middle of their lifetime. Such tasks are called untied: an untied task might start executing in one thread and then, at some scheduling point, be migrated by the runtime to another thread.
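To make the implicit barrier at the end of the sections construct concrete, here is a minimal sketch in C. The names work_a, work_b and count_threads_seeing_incomplete_results are made up for illustration and do not come from the question. Every thread in the team, including those that ran no section, checks the results right after the construct.

```c
/* Hypothetical stand-ins for the two units of work. */
static int work_a(void) { return 1; }
static int work_b(void) { return 2; }

/* Returns how many threads got past the sections construct
   before both sections had finished (expected: none). */
int count_threads_seeing_incomplete_results(void)
{
    int a = 0, b = 0, incomplete = 0;

    #pragma omp parallel shared(a, b, incomplete)
    {
        #pragma omp sections
        {
            #pragma omp section
            a = work_a();
            #pragma omp section
            b = work_b();
        } /* implicit barrier: no thread continues until both sections are done */

        /* Executed by every thread in the team, not only by the two
           that happened to run a section. */
        if (a != 1 || b != 2) {
            #pragma omp atomic
            incomplete++;
        }
    }
    return incomplete;
}
```

With a nowait clause on the sections construct the check would no longer be safe, because idle threads could reach it while a section is still running. The code also compiles without OpenMP support, in which case the pragmas are ignored and everything runs serially.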
Still, tasks and sections are in many ways similar. For example, a sections construct containing two sections achieves essentially the same result as a single nowait construct that creates two tasks and is followed by a taskwait.
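As a sketch of two such equivalent fragments, assuming two placeholder functions foo() and bar() (here they only count their calls, and the wrapper names with_sections and with_tasks are invented for this example):

```c
/* Call counters so the effect of each variant can be observed. */
int foo_calls = 0;
int bar_calls = 0;

/* Placeholder work items; each one is executed by exactly one thread. */
static void foo(void) { foo_calls++; }
static void bar(void) { bar_calls++; }

/* Variant 1: worksharing with sections. */
void with_sections(void)
{
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            foo();
            #pragma omp section
            bar();
        } /* implicit barrier */
    }
}

/* Variant 2: one thread creates two tasks, any thread may run them. */
void with_tasks(void)
{
    #pragma omp parallel
    {
        #pragma omp single nowait
        {
            #pragma omp task
            foo();
            #pragma omp task
            bar();
        }
        /* Explicit scheduling point; the creating thread also waits
           here for its two child tasks. */
        #pragma omp taskwait
    }
}
```

In both variants foo() and bar() have each run exactly once by the time the parallel region ends. What differs is which thread runs them and when.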
taskwait works very much like barrier, but for tasks: it pauses the current execution flow until the tasks it has created (its child tasks) have completed. It is a scheduling point, i.e. it allows threads to process tasks. The single construct is needed so that the tasks are created by one thread only. Without the single construct, each task would be created num_threads times (once per thread in the team), which might not be what one wants. The nowait clause on the single construct tells the other threads not to wait until the single construct has been executed (i.e. it removes the implicit barrier at the end of the single construct), so they hit the taskwait immediately and start processing tasks.

taskwait is an explicit scheduling point, used here for clarity. There are also implicit scheduling points, most notably inside barrier synchronisation, whether explicit or implicit. Therefore, the tasks version could also be written more simply by dropping the taskwait and relying on an implicit barrier instead (for example, the one at the end of a single construct without nowait).

Here is one possible scenario of what might happen with three threads: thread 0 executes the single construct and queues both tasks, thread 1 executes foo() and thread 2 executes bar().
At the scheduling point (either the taskwait directive or the implicit barrier), threads 1 and 2 suspend what they are doing and start processing tasks from the queue. Once all tasks have been processed, the threads resume their normal execution flow. Note that threads 1 and 2 might reach the scheduling point before thread 0 has exited the single construct, so the threads do not necessarily enter the scheduling point at the same time.

It might also happen that thread 1 finishes processing the foo() task and requests another one before the other threads are able to request tasks. So both foo() and bar() might get executed by the same thread.

It is also possible that the singled-out thread executes the second task itself if thread 2 arrives too late.
In some cases the compiler or the OpenMP runtime might even bypass the task queue completely and execute the tasks serially, directly in the thread that creates them.
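Which of these scenarios actually happens on a given run can be observed by recording thread ids. A rough sketch, assuming the two tasks simply store the id of the thread that runs them (the helper current_thread and the function trace_task_placement are invented for this example; omp_get_thread_num is the standard OpenMP runtime call):

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns the calling thread's id, or 0 when compiled without OpenMP. */
static int current_thread(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Records which thread created the tasks and which threads ran them. */
void trace_task_placement(int *creator, int *foo_thread, int *bar_thread)
{
    #pragma omp parallel
    {
        #pragma omp single
        {
            *creator = current_thread();

            #pragma omp task
            *foo_thread = current_thread();  /* stands in for foo() */

            #pragma omp task
            *bar_thread = current_thread();  /* stands in for bar() */
        } /* implicit barrier: both tasks have completed past this point */
    }
}
```

The three recorded ids may all differ, two may coincide, or all three may be equal (the fully serial case). None of these outcomes is wrong, and the result can change from run to run.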
If no task scheduling points are present inside the region’s code, the OpenMP runtime might start the tasks whenever it deems appropriate. For example, it is possible that all tasks are deferred until the barrier at the end of the parallel region is reached.
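A practical consequence is that code which needs a task's result has to wait for it explicitly. A minimal sketch, with an invented function name and made-up values:

```c
/* Returns x + y, where x and y are each produced by a separate task. */
int sum_with_taskwait(void)
{
    int x = 0, y = 0, sum = 0;

    #pragma omp parallel shared(x, y, sum)
    {
        #pragma omp single
        {
            #pragma omp task shared(x)
            x = 20;

            #pragma omp task shared(y)
            y = 22;

            /* Without this taskwait both tasks might still be deferred,
               and x and y could still be 0 when they are read below. */
            #pragma omp taskwait

            sum = x + y;
        }
    }
    return sum;
}
```

Here the taskwait is executed by the thread that created the tasks, so it waits for exactly those two child tasks before computing the sum.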