I'm having trouble using MPI_Comm_split_type for in-node splits with the Open MPI-specific OMPI_COMM_TYPE_* split types. Everything works when I launch with mpirun, but the same code does not work when launched with srun. Is this supposed to work, or are there limitations I'm not aware of? I have tried Open MPI 4.0.3 and 4.0.5 on CentOS 7.7 with Slurm 19.05, the stock hwloc 1.11 (I also tried building Open MPI against hwloc 2.4.0), and PMIx 3.1.5.
This is a simple test app:
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm split_comm;
    int split_rank = -1, split_size = -1;

    MPI_Init(&argc, &argv);

    /* Split MPI_COMM_WORLD into one communicator per NUMA domain. */
    MPI_Comm_split_type(MPI_COMM_WORLD, OMPI_COMM_TYPE_NUMA, 0, MPI_INFO_NULL, &split_comm);
    MPI_Comm_rank(split_comm, &split_rank);
    MPI_Comm_size(split_comm, &split_size);

    /* Note: the rank printed here is the rank within split_comm. */
    fprintf(stderr, "rank %d >>> split size %d\n", split_rank, split_size);

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Comm_free(&split_comm);
    MPI_Finalize();
    return 0;
}
On our EPYC 7742 system I get this with mpirun:
mpirun -np 128 ./splittest
rank 0 >>> split size 16
and this with srun:
srun -n 128 ./splittest
rank 0 >>> split size 1
Essentially, I get a split size of 1 under srun regardless of which split type I use.