Tutorial: Detecting and Removing Unnecessary Serialization for Intel® Trace Analyzer and Collector
Analyze MPI process activity in your application.
To see the particular MPI functions called in the application, right-click on MPI in the Event Timeline and choose Ungroup Group MPI. This operation exposes the individual MPI calls.
After ungrouping the MPI functions, you see that the processes communicate with their direct neighbors using MPI_Sendrecv at the start of the iteration.
This data exchange has a disadvantage: process i does not exchange data with its neighbor i+1 until the exchange between i-1 and i is complete. In the Event Timeline, this delay appears as a staircase pattern in which each process waits for the one before it.
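The staircase can be illustrated with a small timing model. This is only a sketch, not MPI code and not the tutorial application: the function name `exchange_finish_times` and the uniform `exchange_cost` are assumptions made for illustration. It encodes the single rule stated above, that the exchange between i and i+1 starts only after the exchange between i-1 and i has finished.

```python
def exchange_finish_times(num_procs, exchange_cost=1.0):
    """Return the time at which each rank finishes its neighbor exchanges.

    Toy model (assumption): the exchange between ranks i and i+1 cannot
    start until the exchange between ranks i-1 and i has finished, and
    every exchange takes the same time `exchange_cost`.
    """
    finish = [0.0] * num_procs
    pair_done = 0.0  # completion time of the previous pair's exchange
    for i in range(num_procs - 1):
        # Pair (i, i+1) runs strictly after pair (i-1, i).
        pair_done += exchange_cost
        finish[i] = pair_done
        finish[i + 1] = pair_done
    return finish


if __name__ == "__main__":
    # Print a crude text "timeline": one bar per rank, longer bars finish later.
    for rank, t in enumerate(exchange_finish_times(6)):
        print(f"rank {rank}: " + "#" * int(t))
```

In this model the finish time grows with the rank number, which is the staircase shape. The total time spent in the exchange phase grows linearly with the number of processes even though each individual exchange is short.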
The MPI_Allreduce at the end of the iteration resynchronizes all processes; that is why this block has the reverse staircase appearance.
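The reverse staircase can be sketched with the same kind of toy model. The function name `allreduce_wait_times` and the sample arrival times are assumptions for illustration, and the model treats the collective as completing for all ranks at the moment the last rank enters it. That is a simplification, although no rank can obtain the reduced result before every rank has contributed.

```python
def allreduce_wait_times(arrival_times):
    """Return how long each rank waits inside the collective.

    Toy model (assumption): every rank leaves the collective when the
    last rank arrives, so a rank's wait is the gap between its own
    arrival and the latest arrival.
    """
    if not arrival_times:
        return []
    last_arrival = max(arrival_times)
    return [last_arrival - t for t in arrival_times]


if __name__ == "__main__":
    # Hypothetical arrival times shaped like a staircase: rank 0 arrives first.
    arrivals = [1.0, 2.0, 3.0, 4.0]
    for rank, wait in enumerate(allreduce_wait_times(arrivals)):
        print(f"rank {rank}: waits {wait}")
```

Ranks that finished their exchange early spend the longest time waiting in the collective, so the waiting blocks shrink as the rank number grows, mirroring the staircase from the exchange phase.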