We were given a homework assignment to compute the factorial 16! using MPI. Everything is computed correctly until two MPI_Bcast calls start sending messages to all processes.

In theory, the function has the following parameters: MPI_Bcast(&mes, count, datatype, root, comm), where mes is the message buffer broadcast to all processes, count is the number of elements in the message (usually 1, or more than 1 for arrays), datatype is the data type of the message, root is the rank of the process that sends the data, and comm is the communicator.

I have two different MPI_Bcast calls:

MPI_Bcast(&fact4, 1, MPI_DOUBLE, 1, MPI_COMM_WORLD);

and

MPI_Bcast(&partFact9101112, 1, MPI_DOUBLE, 5, MPI_COMM_WORLD);

As you can see, the messages originate from different processes, namely 1 and 5, and are sent to completely different sets of processes that do not overlap. The first sends a message to processes 0-3, and the second to processes 4-7.

But there is a problem. Each MPI_Bcast sends 3 messages in total. Two of them reach the correct recipient, but one gets mixed up and is delivered to a recipient of the other MPI_Bcast.

Program output

In the screenshot, 5! and 13! are wrong, because the mixed-up messages were the ones intended for them. The process computing 5! should have received 4! from process 1, and the process computing 13! should have received 9 * 10 * 11 * 12 = 11880 from process 5. Instead, the two messages were swapped and arrived at each other's recipients.
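The expected values can be checked with a small sketch in plain C (no MPI involved). The helper name range_product is hypothetical and is not taken from the original program; it only assumes that each partial product is accumulated in a double, matching the MPI_DOUBLE used in the broadcasts.

```c
/* Product of the integers lo..hi inclusive, accumulated in a double.
   Hypothetical helper for illustration; 16! is about 2.09e13, which is
   below 2^53, so every intermediate value is represented exactly. */
double range_product(int lo, int hi)
{
    double p = 1.0;
    for (int i = lo; i <= hi; ++i)
        p *= i;
    return p;
}
```

Here range_product(1, 4) gives 24 (the 4! that process 1 broadcasts), and range_product(9, 12) gives 11880 (the value process 5 broadcasts).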

As I understand it, the error here is not in the code but in the specifics of message passing. After all, the messages did arrive correctly at the other processes.

  • "The first sends a message to processes 0-3, and the second to processes 4-7." What does that even mean? Do your MPI_Bcast calls even know that the first one should send to processes 0-3 and the second to processes 4-7? And if they do, how? - AnT

1 answer

I solved the problem by creating a separate communicator for each MPI_Bcast. That is, instead of MPI_COMM_WORLD, I used its local groups.
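MPI_Bcast is a collective operation: every process in the communicator passed to it takes part, and all of them must name the same root. Two broadcasts with different roots on MPI_COMM_WORLD therefore cannot be confined to processes 0-3 and 4-7. The following is only a minimal sketch of one way to build the separate communicators, assuming exactly 8 processes and using MPI_Comm_split. The original program may have built its groups differently, and the names color, group_comm and group_root are illustrative.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Assumption: 8 processes, grouped as world ranks 0-3 and 4-7. */
    if (world_size != 8) {
        if (world_rank == 0)
            fprintf(stderr, "run with exactly 8 processes\n");
        MPI_Finalize();
        return 1;
    }

    /* color selects the group: 0 for ranks 0-3, 1 for ranks 4-7.
       Using world_rank as the key keeps the original ordering, so
       world ranks 1 and 5 both become rank 1 in their own group. */
    int color = world_rank / 4;
    MPI_Comm group_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &group_comm);

    int group_rank;
    MPI_Comm_rank(group_comm, &group_rank);

    /* The root is given as a rank inside group_comm, not a world rank. */
    const int group_root = 1;

    double value = 0.0;
    if (group_rank == group_root)
        value = (color == 0) ? 24.0 : 11880.0;  /* 4! and 9*10*11*12 */

    /* Each broadcast now involves only the 4 processes of one group. */
    MPI_Bcast(&value, 1, MPI_DOUBLE, group_root, group_comm);

    printf("world rank %d (group %d, local rank %d) received %.0f\n",
           world_rank, color, group_rank, value);

    MPI_Comm_free(&group_comm);
    MPI_Finalize();
    return 0;
}
```

With a typical MPI installation this would be built with mpicc and started with 8 processes (for example via mpirun -np 8). Note that the root argument changes meaning after the split: world rank 5 is addressed as rank 1 of the second group.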