Intel® Trace Collector 9.1 Update 2 User and Reference Guide
Intel® Trace Collector provides correctness checking functionality, which addresses two different concerns:
Finding programming mistakes in the application. This includes potential portability problems and violations of the MPI standard that do not cause problems immediately, but might do so after a switch to different hardware or a different MPI implementation. In this case, correctness checking is most likely done interactively on a smaller development cluster, but it can also be included in automated regression testing.
Detecting errors in the execution environment. In this case, run the check on the hardware and software stack of the system that is to be checked.
When doing correctness checking, distinguish between error detection, which is done automatically by tools, and error analysis, which is done by the user to determine the root cause of an error and eventually fix it.
Error detection in Intel® Trace Collector is implemented in the libVTmc library, which checks the application at runtime. To cover both of the scenarios above, the library supports recording error reports for later analysis as well as interactive debugging at runtime.
Errors are printed to stderr as soon as they are found. Interactive debugging is done with the help of a traditional debugger: if the application is already running under debugger control, the debugger can stop a process when an error is found. For this to work, you must manually set a breakpoint in the function MessageCheckingBreakpoint(). This function and its debug information are contained in the Intel® Trace Collector library, so you can set the breakpoint and, after a process has been stopped, inspect the parameters of the function, which describe the error that occurred.
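The breakpoint workflow described above can be sketched as a GDB command file. This is a minimal sketch, not the documented procedure: it assumes GDB is the debugger, that the process being debugged has libVTmc linked in or preloaded, and that the file is saved under the illustrative name check.gdb. Only the function name MessageCheckingBreakpoint comes from this guide; its parameter names are not listed here, so the sketch prints whatever arguments the debug information exposes.

```
# check.gdb (hypothetical file name, used for illustration only)
# Assumption: the debugged process has libVTmc linked in or preloaded.

# Let GDB accept the breakpoint even if the library that contains the
# function has not been loaded yet.
set breakpoint pending on

# Stop the process whenever the correctness checker reports an error.
break MessageCheckingBreakpoint

# Start the program; the commands below run once the process stops.
run

# Print the parameters of MessageCheckingBreakpoint, which describe the error.
info args

# Show the call path that led from the application to the reported error.
backtrace
```

For a single process, such a file can be loaded with `gdb -x check.gdb` followed by the executable. How a debugger is started for, or attached to, the individual ranks of an MPI job depends on the MPI launcher and is not covered by this sketch.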
See the following topics on the usage of correctness checking: