Build and run MPI (Message Passing Interface) enabled codes on ExCL

Hello World built with nvhpc

## mpi_hello_world.c

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();
}

Load the NVIDIA HPC SDK environment module

$ module load nvhpc-openmpi3

Verify the compiler path

$ which mpicc

Build the program

$ mpicc ./mpi_hello_world.c

Run the program with MPI

$ mpirun -np 4 -mca coll_hcoll_enable 0 ./a.out

[[63377,1],2]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: milan0

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
Hello world from processor, rank 2 out of 4 processors
Hello world from processor, rank 0 out of 4 processors
Hello world from processor, rank 1 out of 4 processors
Hello world from processor, rank 3 out of 4 processors
  • -np 4 specifies that 4 processes will be created, each running a copy of the mpi_hello_world program

  • -mca coll_hcoll_enable 0 disables HCOLL


InfiniBand and HCOLL

ExCL systems typically do not have InfiniBand set up, although it can be added if required. HCOLL (the HPC-X Collective Communication Library) requires an InfiniBand adapter and is enabled by default, so you may see HCOLL warnings or errors stating that no HCA device can be found. To disable HCOLL and suppress these messages, pass the -mca coll_hcoll_enable 0 flag, for example: mpirun -np 4 -mca coll_hcoll_enable 0 ./a.out.
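If typing the -mca flag on every run becomes tedious, the same setting can usually be supplied through the environment. The sketch below assumes an Open MPI based stack (as the nvhpc-openmpi3 module name suggests), where an MCA parameter can be set by exporting a variable named OMPI_MCA_<parameter>. The second variable corresponds to the btl_base_warn_component_unused parameter named in the OpenFabrics warning shown earlier; treat both as a starting point to verify on your system, not as a required configuration.

```shell
# Assumption: Open MPI reads MCA parameters from OMPI_MCA_<name> variables.

# Equivalent to passing "-mca coll_hcoll_enable 0" to mpirun
export OMPI_MCA_coll_hcoll_enable=0

# Equivalent to "-mca btl_base_warn_component_unused 0"; silences the
# "unable to find any relevant network interfaces" warning
export OMPI_MCA_btl_base_warn_component_unused=0

# List the MCA settings now present in the environment
env | grep '^OMPI_MCA_'
```

With these exported in the current shell, mpirun -np 4 ./a.out should behave as if both -mca flags had been given. The settings last only for that shell session unless they are added to a startup file.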

