Correctness Checker Configuration Options

The table below lists the environment variables that configure MPI correctness checking. Review them to understand their purpose; all of them are used in the examples in this tutorial.

Environment Variable

Value

Description

VT_DEADLOCK_TIMEOUT <delay>

<delay> - time threshold

Default: 1m

Examples:

VT_DEADLOCK_TIMEOUT 1m

VT_DEADLOCK_TIMEOUT 10s

If no progress is observed in any process for this amount of time, Intel Trace Collector assumes that a deadlock has occurred, stops the application, and writes a trace file.

TIP

For interactive use, set this variable to a small value such as “10s” to detect deadlocks quickly instead of waiting for the full default timeout.

VT_DEADLOCK_WARNING <delay>

<delay> - time threshold

Default: 5m

Examples:

VT_DEADLOCK_WARNING 5m

Displays a GLOBAL:DEADLOCK:NO_PROGRESS warning if the time spent by MPI processes in their last MPI call exceeds the specified threshold. This warning indicates a load imbalance or a deadlock that cannot be detected, which may occur when at least one process polls for progress instead of blocking inside an MPI call.

VT_CHECK_TRACING <on | off>

<on | off>

Default: off

When set to on, Intel Trace Collector records all events, including any MPI errors found during the run, and writes a trace file.

VT_CHECK_MAX_ERRORS <value>

<value> - maximum errors to detect

Default: 1

Number of errors that a process must reach before the application is aborted. A value of 0 disables the limit. Some errors are fatal and always cause an abort. Errors are counted per process to avoid communication among processes, whose drawbacks outweigh the advantage of a global counter.
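
As a minimal sketch of how the variables in the table might be combined for an interactive debugging session, the shell fragment below exports three of them and prints the resulting settings. The values are illustrative. The launch line is left commented out because it rests on assumptions: that the Intel MPI launcher is on the PATH and accepts the -check_mpi option, and that ./my_app (a hypothetical name) is your MPI executable.

```shell
#!/bin/sh
# Illustrative values only; adjust them to your own run.

# Short timeout so a suspected deadlock is reported quickly (default: 1m).
export VT_DEADLOCK_TIMEOUT=10s

# Record all events, including MPI errors, and write a trace file (default: off).
export VT_CHECK_TRACING=on

# 0 disables the per-process error limit (default: 1).
export VT_CHECK_MAX_ERRORS=0

# Assumption: Intel MPI launcher with correctness checking enabled;
# ./my_app is a hypothetical application name.
# mpirun -check_mpi -n 4 ./my_app

# Show the correctness checker settings that are now in effect.
env | grep '^VT_' | sort
```

Exporting the variables keeps the launch command short and lets the same settings apply to several runs in one shell session. Unset them, or start a new shell, to return to the defaults listed in the table.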