Intel® Advisor Help
For the Intel® C/C++ and Fortran compilers, adding SIMD parallelism (also known as loop vectorization) means unrolling a loop so that packed SIMD instructions perform the same operation on multiple data elements with a single instruction. As a result, the loop can execute more efficiently.
Using the vec option enables vectorization at default optimization levels for both Intel® microprocessors and non-Intel microprocessors. Vectorization may call library routines that can result in a greater performance gain on Intel microprocessors than on non-Intel microprocessors. Vectorization can also be affected by certain other options, such as m or x.
You might interpret vectorization as execution of more than one consecutive iteration of the original loop at the same time. For processors supporting Streaming SIMD Extensions, this is usually more than 2 iterations. This leads to some restrictions on the types of loop that can be vectorized:
The loop must contain straight-line code (a single basic block). There should be no jumps or branches, but masked assignments are allowed, including if-then-else constructs that can be interpreted as masked assignments.
The loop must be countable, which means the number of loop iterations must be available before the loop execution begins.
There should be no backward loop-carried dependencies.
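The restrictions above can be illustrated with a few small loops. This is a minimal sketch, not taken from the Intel documentation: the function names are hypothetical, and whether a given compiler version actually vectorizes each loop depends on the target and options, so treat the comments as the expected outcome under the stated restrictions.

```c
#include <stddef.h>

/* Countable, single basic block, no loop-carried dependency:
   a typical candidate for automatic vectorization. */
void scale_add(float *restrict y, const float *restrict x, float a, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* The if-then-else here can be interpreted as a masked assignment,
   so the loop body still behaves like straight-line code. */
void clamp_negative(float *restrict out, const float *restrict in, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = (in[i] < 0.0f) ? 0.0f : in[i];
}

/* Backward loop-carried dependency: iteration i reads the value that
   iteration i - 1 has just written, so consecutive iterations cannot
   simply run at the same time. */
void prefix_sum(float *a, size_t n)
{
    for (size_t i = 1; i < n; ++i)
        a[i] = a[i] + a[i - 1];
}

/* Data-dependent exit: the number of iterations is not known before
   the loop starts, so the loop is not countable in the sense above. */
size_t find_first_zero(const float *a, size_t n)
{
    size_t i = 0;
    while (i < n && a[i] != 0.0f)
        ++i;
    return i;
}
```

The first two functions satisfy all three restrictions; the last two each violate exactly one of them, which makes them convenient test cases when reading a vectorization report.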
Vectorization is enabled with the Intel® compiler at optimization levels of O2 and higher. Many loops are vectorized automatically, but in cases where this doesn't happen, you may be able to vectorize loops by making simple code modifications.
You can use either of two methods to add vectorization to your source code:
To force the compiler to vectorize a loop of your choice, add directives (pragmas) to your source code. A pragma must be inserted immediately before the loop to convey certain information about that loop to the compiler. For example, you can do one of the following:
Add the simd pragma (with the right clauses) to the loop.
Directly vectorize at loop level by outlining the body of the loop into a vector-elemental function and using the simd pragma.
Strip-mine loop iterations and change each statement in the loop body to operate on the strip.
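The three approaches listed above might look roughly like the following. This sketch makes several assumptions: it uses the OpenMP spellings `#pragma omp simd` and `#pragma omp declare simd` rather than any compiler-specific form of the simd pragma, the function names are hypothetical, and the strip length of 8 is an arbitrary illustrative value. OpenMP SIMD pragmas usually need to be enabled with a compiler option (for example, `-qopenmp-simd` with the Intel compiler); without it the pragmas are ignored and the code still produces the same results.

```c
#include <stddef.h>

/* Loop body outlined into a vector-elemental function. The pragma asks
   the compiler to also generate a SIMD version of this function. */
#pragma omp declare simd
static float axpy_elem(float a, float x, float y)
{
    return a * x + y;
}

/* simd pragma on the loop: the programmer asserts that the iterations
   are independent (x and y must not overlap). */
void axpy_simd(float *y, const float *x, float a, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        y[i] = axpy_elem(a, x[i], y[i]);
}

/* Strip-mining: process the data in fixed-size strips so that each
   statement operates on one strip. STRIP is a hypothetical value. */
enum { STRIP = 8 };

void axpy_strip_mined(float *y, const float *x, float a, size_t n)
{
    size_t i = 0;

    /* Full strips: the inner loop has a constant trip count. */
    for (; i + STRIP <= n; i += STRIP) {
        for (size_t j = 0; j < STRIP; ++j)
            y[i + j] = a * x[i + j] + y[i + j];
    }

    /* Remainder iterations that do not fill a whole strip. */
    for (; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Both variants compute the same result as the plain scalar loop; the difference lies only in how much information about the loop is handed to the compiler.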
For more information about the pragmas, refer to the Intel compiler documentation.
Another way to get the loop vectorized is to remove whatever prevents the compiler from automatically vectorizing it. Normally, the compiler reports the loops that were not vectorized and explains why. The most direct way to enable automatic vectorization for the target loop is therefore to change the source code so that it no longer violates the restrictions the compiler reports.
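As one illustration of removing such an obstacle, consider possible pointer aliasing, a common reason for a compiler to assume a dependency between iterations. The example below is a sketch under that assumption and is not taken from the Intel documentation; the function names are hypothetical, and some compilers handle the first version with a run-time overlap check instead of refusing to vectorize it.

```c
#include <stddef.h>

/* Before: the compiler cannot prove that dst and src do not overlap,
   so it may have to assume that a store to dst[i] affects a later
   load from src. */
void add_offset_before(float *dst, const float *src, float offset, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] + offset;
}

/* After: the C99 restrict qualifier is a promise from the programmer
   that dst and src do not overlap, which removes the assumed
   dependency. Calling this with overlapping arrays is undefined. */
void add_offset_after(float *restrict dst, const float *restrict src,
                      float offset, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] + offset;
}
```

The loop body is unchanged; only the information available to the compiler differs.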
To generate a vectorization report, use the qopt-report compiler option, whose spelling is OS-specific (-qopt-report on Linux* and /Qopt-report on Windows*). After you add the option, the compiler produces a report that lists the loops that were not vectorized, along with the reason why the compiler did not vectorize each of them. For more information on hinting the compiler to vectorize, refer to the tutorial in the See Also section of this article.
See Also