
Unit 4.2.3 Hello World!

We will illustrate some of the basics of OpenMP via the old standby, the "Hello World!" program:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
  printf( "Hello World!\n" );
}

Homework 4.2.3.1.

In Week4/C/ compile HelloWorld.c with the command

gcc -o HelloWorld.x HelloWorld.c

and execute the resulting executable with

export OMP_NUM_THREADS=4 
./HelloWorld.x

Solution

The output is

Hello World!

even though we indicated that four threads are available for the execution of the program.

Homework 4.2.3.2.

Copy the file HelloWorld.c to HelloWorld1.c. Modify it to add the OpenMP header file:

#include "omp.h"

at the top of the file. Compile it with the command

gcc -o HelloWorld1.x HelloWorld1.c

and execute it with

export OMP_NUM_THREADS=4 
./HelloWorld1.x

Next, recompile and execute with

gcc -o HelloWorld1.x HelloWorld1.c -fopenmp 
export OMP_NUM_THREADS=4 
./HelloWorld1.x

Pay attention to the -fopenmp, which tells the compiler to recognize the OpenMP directives and to link the OpenMP runtime library. What do you notice?

(You don't need to export OMP_NUM_THREADS=4 every time you execute. We set it explicitly so that you know exactly how many threads are available.)

Solution

In all cases, the output is

Hello World!

None of what you have tried so far resulted in any parallel execution because an OpenMP program uses a "fork and join" model: Initially, there is just one thread of execution. Multiple threads are deployed when the program reaches a parallel region, initiated by

#pragma omp parallel 
{
  <command>
}

At that point, multiple threads are "forked" (initiated), each of which then performs the command <command> given in the parallel region. The parallel region here is the section of code immediately after the #pragma directive, bracketed by "{" and "}", which C views as a single command (one that may itself be composed of multiple commands). Notice that the "{" and "}" are not necessary if the parallel region consists of a single command, as the sketch below illustrates. At the end of the region, the threads are synchronized and "join" back into a single thread.
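As a minimal sketch (not one of the course's HelloWorld files), the following contrasts the two forms: braces turn multiple commands into one compound command, while a single command needs no braces.

#include <stdio.h>
#include "omp.h"

int main(int argc, char *argv[])
{
  /* With braces: the block is a single compound command, so the
     parallel region may contain multiple commands. */
#pragma omp parallel
  {
    printf( "inside the braced region\n" );
    printf( "still inside the braced region\n" );
  }

  /* Without braces: only the single command that immediately follows
     the directive is executed by every thread. */
#pragma omp parallel
  printf( "single-command region\n" );
}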

Homework 4.2.3.3.

Copy the file HelloWorld1.c to HelloWorld2.c. Before the printf statement, insert

#pragma omp parallel

Compile and execute:

gcc -o HelloWorld2.x HelloWorld2.c -fopenmp
export OMP_NUM_THREADS=4
./HelloWorld2.x

What do you notice?

Solution

The output now should be

Hello World! 
Hello World! 
Hello World! 
Hello World!

You are now running identical programs on four threads, each of which prints an identical message. Execution starts with a single thread that forks four threads, each of which prints a copy of the message. Obviously, this isn't very interesting yet, since the threads don't collaborate to speed up a computation that was previously performed by one thread.

Next, we introduce three routines with which we can extract information about the environment in which the program executes and information about a specific thread of execution:

  • omp_get_max_threads() returns the maximum number of threads that are available for computation. It equals the number assigned to OMP_NUM_THREADS before executing a program.

  • omp_get_num_threads() returns the number of threads in the current team: The total number of threads that are available may be broken up into teams that perform separate tasks.

  • omp_get_thread_num() returns the index that uniquely identifies the thread that calls this function, among the threads in the current team. This index ranges from 0 to (omp_get_num_threads()-1). In other words, the indexing of the threads starts at zero.

In all our examples, omp_get_num_threads() equals omp_get_max_threads() in a parallel section.
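One detail worth keeping in mind, illustrated by this minimal sketch (not one of the homework files): in the sequential part of the program, outside of any parallel region, the team consists of a single thread, so omp_get_num_threads() returns 1 there.

#include <stdio.h>
#include "omp.h"

int main(int argc, char *argv[])
{
  /* Outside a parallel region, the current team has one thread. */
  printf( "outside: num_threads = %d\n", omp_get_num_threads() );

#pragma omp parallel
  {
    /* Inside the region, every thread sees the size of the full team;
       let only thread 0 report it to keep the output short. */
    if ( omp_get_thread_num() == 0 )
      printf( "inside:  num_threads = %d\n", omp_get_num_threads() );
  }
}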

Homework 4.2.3.4.

Copy the file HelloWorld2.c to HelloWorld3.c. Modify the body of the main routine to

int maxthreads = omp_get_max_threads();

#pragma omp parallel
{
  int nthreads = omp_get_num_threads(); 
  int tid = omp_get_thread_num(); 

  printf( "Hello World! from %d of %d max_threads = %d \n\n",
                                               tid, nthreads, maxthreads );
}

Compile it and execute it:

export OMP_NUM_THREADS=4  
gcc -o HelloWorld3.x HelloWorld3.c -fopenmp
./HelloWorld3.x

What do you notice?

Solution

The output:

Hello World! from 0 of 4 max_threads = 4 

Hello World! from 1 of 4 max_threads = 4 

Hello World! from 3 of 4 max_threads = 4 

Hello World! from 2 of 4 max_threads = 4

In the last exercise, there are four threads available for execution (since OMP_NUM_THREADS equals 4). In the parallel section, each thread assigns its index (rank) to its private variable tid. Each thread then prints out its copy of Hello World! Notice that these printfs may not appear in sequential order. This very simple example demonstrates how the work performed by a specific thread can be determined by its index within a team.
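To give a taste of how a thread's index can be used to divide work, here is a minimal sketch; it is not part of the course files, and the vector, its length N, and the blocking scheme are made up purely for illustration. Each thread initializes its own contiguous block of the vector.

#include <stdio.h>
#include "omp.h"

#define N 16   /* length of the (made-up) vector */

int main(int argc, char *argv[])
{
  double x[ N ];

#pragma omp parallel
  {
    int nthreads = omp_get_num_threads();
    int tid      = omp_get_thread_num();

    /* Split indices 0, ..., N-1 into nthreads roughly equal,
       contiguous blocks; thread tid handles block number tid. */
    int block = ( N + nthreads - 1 ) / nthreads;
    int start = tid * block;
    int end   = ( start + block < N ? start + block : N );

    for ( int i = start; i < end; i++ )
      x[ i ] = ( double ) tid;

    if ( start < end )
      printf( "thread %d initialized x[ %d ] through x[ %d ]\n",
              tid, start, end - 1 );
  }
}

Compiling with -fopenmp and running with OMP_NUM_THREADS=4 should show each thread reporting a different block, in some nondeterministic order.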