OpenMP Directives

1. Directives Format

1.1. Fortran Directives Format

Format: (not case sensitive)

sentinel directive-name [clause ...]

All Fortran OpenMP directives must begin with a sentinel. The accepted sentinels depend on the type of Fortran source. Possible sentinels are:

!$OMP
C$OMP
*$OMP

Example:

!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(BETA,PI)

Fixed Form Source:

Free Form Source:

General Rules:

!$OMP directive
    [ structured block of code ]
!$OMP end directive

1.2. C / C++ Directives Format

Format:

#pragma omp directive-name [clause, ...] newline

A valid OpenMP directive name must appear after the pragma and before any clauses. Clauses can be placed in any order and repeated as necessary, unless otherwise restricted. The directive must precede the structured block to which it applies.

Example:

#pragma omp parallel default(shared) private(beta,pi)

General Rules:

* Case sensitive

* Directives follow conventions of the C/C++ standards for compiler directives.

* Only one directive-name may be specified per directive.

* Each directive applies to at most one succeeding statement, which must be a structured block.

* Long directive lines can be "continued" on succeeding lines by escaping the newline character with a backslash ("\") at the end of a directive line.
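As an illustration of the continuation rule, the following minimal sketch splits one directive over three lines with trailing backslashes. The function name sum_array is hypothetical and chosen only for this example; when the compiler is run without OpenMP support the pragma is ignored and the loop runs serially.

```c
#include <stddef.h>

/* Hypothetical helper: sums the n elements of a. */
double sum_array(const double *a, size_t n)
{
    double total = 0.0;

    /* One directive, continued over three lines with backslashes. */
    #pragma omp parallel for \
            default(shared) \
            reduction(+:total)
    for (long i = 0; i < (long)n; i++) {
        total += a[i];
    }
    return total;
}
```

Calling sum_array on an array holding 1.0, 2.0, 3.0 and 4.0 returns 10.0 regardless of how many threads take part.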

PARALLEL Region Construct

Purpose: A parallel region is a block of code that will be executed by multiple threads. This is the fundamental OpenMP parallel construct.

Format:

Fortran

!$OMP PARALLEL [clause ...]

IF (scalar_logical_expression)

PRIVATE (list)

SHARED (list)

DEFAULT (PRIVATE | FIRSTPRIVATE | SHARED | NONE)

FIRSTPRIVATE (list)

REDUCTION (operator: list)

COPYIN (list)

NUM_THREADS (scalar-integer-expression)

block

!$OMP END PARALLEL

C/C++

#pragma omp parallel [clause ...] newline

if (scalar_expression)

private (list)

shared (list)

default (shared | none)

firstprivate (list)

reduction (operator: list)

copyin (list)

num_threads (integer-expression)

structured_block

Notes:

- When a thread reaches a PARALLEL directive, it creates a team of threads and becomes the master of the team. The master is a member of that team and has thread number 0 within that team.

- Starting from the beginning of this parallel region, the code is duplicated and all threads will execute that code.

- There is an implicit barrier at the end of a parallel region. Only the master thread continues execution past this point.

- If any thread terminates within a parallel region, all threads in the team will terminate, and the work done up until that point is undefined.

How Many Threads?

The number of threads in a parallel region is determined by the following factors, in order of precedence:

1. Evaluation of the IF clause

2. Setting of the NUM_THREADS clause

3. Use of the omp_set_num_threads() library function

4. Setting of the OMP_NUM_THREADS environment variable

5. Implementation default - usually the number of CPUs on a node, though it could be dynamic.

Threads are numbered from 0 (master thread) to N-1.
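The precedence between the NUM_THREADS clause and omp_set_num_threads() can be sketched as follows. The helper name team_size_with_clause is hypothetical, and the serial stand-ins under #else are an assumption added so that the sketch still builds when the compiler is run without OpenMP support. The runtime may still deliver fewer threads than requested.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption: building without OpenMP support). */
static void omp_set_num_threads(int n) { (void)n; }
static int omp_get_num_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical helper: returns the team size observed inside a region
   that asks for `requested` threads through the NUM_THREADS clause. */
int team_size_with_clause(int requested)
{
    int size = 0;

    omp_set_num_threads(4);   /* lower precedence than the clause below */

    #pragma omp parallel num_threads(requested)
    {
        /* Only thread 0 (the master) records the team size. */
        if (omp_get_thread_num() == 0) {
            size = omp_get_num_threads();
        }
    }
    return size;
}
```

With requested set to 2, the team has at most 2 threads even though omp_set_num_threads(4) was called first.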

Dynamic Threads:

Use the omp_get_dynamic() library function to determine if dynamic threads are enabled. If supported, the two methods available for enabling dynamic threads are:

1. The omp_set_dynamic() library routine;

2. Setting of the OMP_DYNAMIC environment variable to TRUE.
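A minimal sketch of the first method, assuming a hypothetical helper named try_enable_dynamic. The stand-ins under #else are an assumption so that the code also builds without OpenMP support, in which case dynamic threads are reported as disabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption: building without OpenMP support). */
static void omp_set_dynamic(int flag) { (void)flag; }
static int omp_get_dynamic(void) { return 0; }
#endif

/* Hypothetical helper: asks the runtime to enable dynamic adjustment of
   the number of threads and reports whether the request took effect
   (1) or not (0). */
int try_enable_dynamic(void)
{
    omp_set_dynamic(1);
    return omp_get_dynamic() != 0;
}
```

A return value of 0 means the request did not take effect, for example because the implementation does not support dynamic adjustment.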

Nested Parallel Regions:

Use the omp_get_nested() library function to determine if nested parallel regions are enabled. The two methods available for enabling nested parallel regions (if supported) are:

1. The omp_set_nested() library routine

2. Setting of the OMP_NESTED environment variable to TRUE

If not supported, a parallel region nested within another parallel region results in the creation of a new team, consisting of one thread, by default.

Clauses:

IF clause: If present, it must evaluate to .TRUE. (Fortran) or non-zero (C/C++) in order for a team of threads to be created. Otherwise, the region is executed serially by the master thread.
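The effect of the IF clause can be sketched with a hypothetical helper, region_team_size, that reports the team size of a region guarded by a size threshold. Both the helper and the threshold parameter are assumptions made for this example, as are the serial stand-ins under #else.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption: building without OpenMP support). */
static int omp_get_num_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical helper: the region is parallel only when n exceeds
   threshold; otherwise it is executed serially by a single thread. */
int region_team_size(int n, int threshold)
{
    int size = 0;

    #pragma omp parallel if(n > threshold)
    {
        if (omp_get_thread_num() == 0) {
            size = omp_get_num_threads();
        }
    }
    return size;
}
```

region_team_size(10, 1000) returns 1 because the IF expression is zero. With n above the threshold, the result depends on how many threads the runtime provides.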

Restrictions:

A parallel region must be a structured block that does not span multiple routines or code files. It is illegal to branch into or out of a parallel region. Only a single IF clause is permitted. Only a single NUM_THREADS clause is permitted.

Example: Parallel Region - Simple "Hello World" program

- Every thread executes all code enclosed in the parallel region

- OpenMP library routines are used to obtain thread identifiers and total number of threads

Fortran - Parallel Region Example

      INTEGER NTHREADS, TID, OMP_GET_NUM_THREADS,
     +        OMP_GET_THREAD_NUM

C     Fork a team of threads with each thread having a private TID variable
!$OMP PARALLEL PRIVATE(TID)

C     Obtain and print thread id
      TID = OMP_GET_THREAD_NUM()
      PRINT *, 'Hello World from thread = ', TID

C     Only master thread does this
      IF (TID .EQ. 0) THEN
        NTHREADS = OMP_GET_NUM_THREADS()
        PRINT *, 'Number of threads = ', NTHREADS
      END IF

C     All threads join master thread and disband
!$OMP END PARALLEL

      END

C / C++ - Parallel Region Example

#include <omp.h>
#include <stdio.h>

int main(void) {
    int nthreads, tid;

    /* Fork a team of threads with each thread having a private tid variable */
    #pragma omp parallel private(tid)
    {
        /* Obtain and print thread id */
        tid = omp_get_thread_num();
        printf("Hello World from thread = %d\n", tid);

        /* Only master thread does this */
        if (tid == 0) {
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
    } /* All threads join master thread and terminate */

    return 0;
}

General rules of directives:

- They follow the standards and conventions of the C/C++ or Fortran compilers;

- They are case sensitive in C/C++ (Fortran directives are not);

- In a directive, only one directive name can be specified;

- A directive applies only to the statement following it, which must be a structured block;

- In C/C++, "long" directives can be continued on the next line by adding a backslash (\) at the end of the line.

 

2. The OpenMP Directives

PARALLEL region: a block of code that is executed in parallel by a team of threads, whose size can be set, for example, through the OMP_NUM_THREADS environment variable. It is the fundamental construct in OpenMP.

Work-sharing constructs:

DO/for - divides the iterations of a loop among the threads of the team (data parallelism);
SECTIONS - splits the work into separate sections, each executed by one thread (functional parallelism);
SINGLE - serialises a code section: it is executed by only one thread of the team.
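The three work-sharing constructs above can be combined in one parallel region, as in the following minimal sketch. The function scale_and_range and its parameters are hypothetical; it assumes n is at least 1. Without OpenMP support the pragmas are ignored and the code runs serially with the same result.

```c
/* Hypothetical helper: writes out[i] = 2 * in[i], then stores the
   smallest and largest doubled value in *lo and *hi. Assumes n >= 1. */
void scale_and_range(const int *in, int *out, int n, int *lo, int *hi)
{
    int min_v = 0, max_v = 0;   /* shared by the whole team */

    #pragma omp parallel
    {
        /* for: the loop iterations are divided among the threads. */
        #pragma omp for
        for (int i = 0; i < n; i++) {
            out[i] = 2 * in[i];
        }
        /* Implicit barrier: out[] is complete before the sections run. */

        /* sections: each section is executed by one thread. */
        #pragma omp sections
        {
            #pragma omp section
            {
                min_v = out[0];
                for (int i = 1; i < n; i++)
                    if (out[i] < min_v) min_v = out[i];
            }
            #pragma omp section
            {
                max_v = out[0];
                for (int i = 1; i < n; i++)
                    if (out[i] > max_v) max_v = out[i];
            }
        }

        /* single: only one thread stores the results. */
        #pragma omp single
        {
            *lo = min_v;
            *hi = max_v;
        }
    }
}
```

For the input 3, -1, 4, 1, 5 the output array becomes 6, -2, 8, 2, 10, with *lo set to -2 and *hi set to 10.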

Synchronization constructs:

MASTER - only the master thread will execute the region of code;
CRITICAL - the region of code will be executed by only one thread at a time;
BARRIER - all threads of the team synchronize: none continues until all have reached the barrier;
ATOMIC - a specific memory location will be updated atomically - a lightweight form of critical section;
FLUSH - identifies a synchronization point at which each thread's view of memory must be made consistent;
ORDERED - the iterations of the enclosed loop will be executed in the same order as in the corresponding serial execution;
THREADPRIVATE - makes global variables private to each thread, with values that persist across several parallel regions (a data-scoping directive rather than a synchronization construct).
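A minimal sketch of two of the constructs listed above, ATOMIC and CRITICAL, protecting updates to shared variables inside a parallel loop. The function count_and_max is hypothetical and assumes n is at least 1. It is written for illustration rather than speed, since a reduction would usually be the better tool for both results.

```c
/* Hypothetical helper: returns how many elements of v are even and
   stores the largest element in *max_out. Assumes n >= 1. */
int count_and_max(const int *v, int n, int *max_out)
{
    int evens = 0;     /* shared counter */
    int best = v[0];   /* shared running maximum */

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (v[i] % 2 == 0) {
            /* atomic: a single memory update, done indivisibly. */
            #pragma omp atomic
            evens++;
        }

        /* critical: the compare-and-store block is executed by only
           one thread at a time. */
        #pragma omp critical
        {
            if (v[i] > best) best = v[i];
        }
    }

    *max_out = best;
    return evens;
}
```

For the input 4, 7, 10, 3, 8, 1 the function returns 3 and stores 10 in *max_out.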

Clauses to set the context:

These clauses matter in a shared-memory programming model because they determine how each variable is scoped among the threads. They are used together with the PARALLEL, DO/for and SECTIONS directives.

PRIVATE - the variables from the list are private in every thread;
SHARED - the variables from the list are shared by the threads of the current team;
DEFAULT - it allows the user to set the default "PRIVATE", "SHARED" or "NONE" for all the variables from a parallel region;
FIRSTPRIVATE - it combines the functionality of the PRIVATE clause with automatic initialization of the variables in the list: each private copy is initialized with the value the original variable had before entering the construct;
LASTPRIVATE - it combines the functionality of the PRIVATE clause with a copy back to the original variable of the value from the last loop iteration or the last section;
COPYIN - it assigns the master thread's value of the THREADPRIVATE variables to the corresponding variables of all the threads in the team;
REDUCTION - it performs a reduction on the variables that appear in the list, using a specified operator (for example +, *, -, &, |, ^, &&, ||).
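The FIRSTPRIVATE, LASTPRIVATE and REDUCTION clauses can be seen together in the following minimal sketch. The function clause_demo and its parameters are hypothetical, and n is assumed to be at least 1 so that a last iteration exists.

```c
/* Hypothetical helper: returns the sum of (i + offset) for i in [0, n)
   and stores the index of the last iteration in *last_index.
   Assumes n >= 1. */
long clause_demo(int n, int offset, int *last_index)
{
    long sum = 0;
    int last = -1;

    /* firstprivate: every thread gets its own copy of offset,
       initialized with the value it had before the loop.
       lastprivate: after the loop, last holds the value assigned in
       the sequentially final iteration (i == n - 1).
       reduction: per-thread partial sums are combined with +. */
    #pragma omp parallel for \
            firstprivate(offset) \
            lastprivate(last) \
            reduction(+:sum)
    for (int i = 0; i < n; i++) {
        last = i;
        sum += i + offset;
    }

    *last_index = last;
    return sum;
}
```

With n = 5 and offset = 10 the function returns 60 and sets *last_index to 4.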