Intel® Advisor Help
To force the compiler to vectorize a loop of your choice, add directives (pragmas) to your source code. Insert each pragma immediately before the loop it applies to; the pragma conveys information about that loop to the compiler.
Use Intel® Advisor to identify the loops that you can vectorize explicitly with compiler directives. Intel Advisor suggests specific pragmas depending on the type of issue your code has. For example, the Compiler does not vectorize a loop when it cannot vectorize or inline a user-defined function called inside the loop. To vectorize such a loop, add pragmas according to the following table:
Target | ICL/ICC/ICPC Directive | IFORT Directive |
---|---|---|
Source Loop | #pragma simd or #pragma omp simd | !DIR$ SIMD or !$OMP SIMD |
Inner function definition or declaration | #pragma omp declare simd | !$OMP DECLARE SIMD |
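As an illustration of the two table rows for C/C++, the following minimal sketch marks a user-defined function with `#pragma omp declare simd` and the loop that calls it with `#pragma omp simd`. The names `scale_and_shift` and `transform` are hypothetical and are not taken from any Intel sample; the sketch also assumes that the caller passes arrays for which vectorization is safe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-defined function called from the loop body.
   "declare simd" asks the compiler to also generate a vector version. */
#pragma omp declare simd
static float scale_and_shift(float x)
{
    return 2.0f * x + 1.0f;
}

/* Hypothetical caller. "omp simd" asks the compiler to vectorize the loop;
   the call inside it can then use the vector version of the function. */
void transform(const float *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL);
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        out[i] = scale_and_shift(in[i]);
    }
}
```

OpenMP SIMD pragmas typically take effect only when the corresponding compiler option is enabled, for example `-qopenmp-simd` with the Intel compilers or `-fopenmp-simd` with GCC and Clang. Without such an option the pragmas are usually ignored and the code still compiles as ordinary scalar C.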
To collect data about the application's loops, run the Survey Analysis, and then look at the Vector Issues column in the Survey Report view.
Now consider an example in which the Compiler assumes a dependency in the loop. In this case, open the Compiler Diagnostic Details tab in the Survey Report. Pay particular attention to the recommendation that Intel Advisor provides in addition to the Compiler diagnostics.
The Compiler found a potential issue in the following code snippet:
    for (i__ = 1; i__ <= i__2; ++i__) {
        k = i__ * (i__ + 1) / 2;
        i__3 = *n;
        // Assumed dependency
        for (j = i__; j <= i__3; ++j) {
            cdata_1.array[k - 1] += bb[i__ + j * bb_dim1];
            k += j;
        }
    }
One of the first things you can do in this case is to check if there are any dependencies inside the loop. To check the dependencies, run the Dependencies Analysis.
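To make the distinction concrete, here is a minimal sketch of a loop with a real loop-carried dependency next to a loop where a dependency can only be assumed. The function names `prefix_sum` and `add_arrays` are hypothetical and serve only as illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Real loop-carried dependency: iteration i reads a[i - 1], which
   iteration i - 1 has just written. Forcing vectorization of this loop
   could change the results. */
void prefix_sum(float *a, size_t n)
{
    assert(a != NULL);
    for (size_t i = 1; i < n; ++i) {
        a[i] += a[i - 1];
    }
}

/* No loop-carried dependency, provided dst and src do not overlap:
   each iteration touches only its own elements. The compiler cannot
   prove the absence of overlap from the signature alone, so it may
   assume a dependency. */
void add_arrays(float *dst, const float *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; ++i) {
        dst[i] += src[i];
    }
}
```

A loop like the second one is a candidate for a vectorization directive once you have confirmed that the arrays passed in never overlap; a loop like the first one is not.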
If the Dependencies Analysis reports no issues that would prevent vectorization, you can force the Compiler to vectorize the loop by adding a directive. For example, add a pragma that enables vectorization of the inner loop:
    for (i__ = 1; i__ <= i__2; ++i__) {
        k = i__ * (i__ + 1) / 2;
        i__3 = *n;
        #pragma omp simd
        for (j = i__; j <= i__3; ++j) {
            cdata_1.array[k - 1] += bb[i__ + j * bb_dim1];
            k += j;
        }
    }
Apply the same approach to other parts of your code that you want the Compiler to vectorize. Also consider following the recommendations in the Compiler Diagnostic Details tab of the Intel Advisor Survey Report to improve code quality and the overall performance of the target application.