Tutorial: Where to Add Parallelism with Intel® Advisor 2015 and a C/C++ Sample

Summary

This tutorial demonstrated an end-to-end workflow you can ultimately apply to your own applications.

Each step below gives a tutorial recap followed by the key tutorial take-aways.

Step 1: Prepare for the tutorial.

You chose an Intel Advisor sample application, built it in release mode, tested the resulting target to ensure it runs on your system outside the Intel Advisor, and created and configured a project to hold analysis results for the target.

  • A target is an executable file the Intel Advisor can analyze.

  • Applications compiled and linked in release mode using the following options produce the most accurate and complete Survey and Suitability analysis results:

    • Compiler/Additional include directory: -I${ADVISOR_XE_2015_DIR}/include

    • Compiler/Full debug information: -g

    • Compiler/Moderate optimization: -O2 or higher and -fno-inline-functions

    • Linker/Full debug information: -g

    • Linker/Dynamic loading: -ldl

Step 2: Discover parallel opportunities.

You ran a Survey analysis on the target to highlight hotspots that you subsequently explored.

  • Hotspots are code regions that consume a significant amount of runtime.

  • Loops are often the most time-consuming parts of an application.

  • Use the Advisor Workflow to:

    • Provide a roadmap for finding where to add parallelism.

    • Launch Intel Advisor analysis tools.

    • Provide links to relevant topics in Intel Advisor Help.

  • Use the Survey Report to locate the loops and functions where the target spends the most time.

  • Think of the Summary window as a dashboard to which the Intel Advisor adds more data each time you run Intel Advisor tools.

Step 3: Mark best parallel opportunities with annotations.

You marked the hotspots with parallel site and task annotations, and rebuilt the target in release mode.

  • Annotations are subroutine calls or macros that identify certain information for Intel Advisor analysis tools, such as the location of proposed parallel sites.

  • A parallel site is a region of code that contains one or more time-consuming tasks that may execute in parallel threads to distribute work.

  • Include annotation definitions in your source file(s) like so: #include "advisor-annotate.h".

  • Annotations are fully explained in Intel Advisor Help.
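
As a minimal sketch of what an annotated hotspot can look like, the following marks a loop as a parallel site and each iteration as a task. The function add_arrays and the names add_site and add_task are hypothetical and do not come from the tutorial sample. The no-op fallback macros are an assumption added only so the sketch compiles when advisor-annotate.h is not on the include path; consult Intel Advisor Help for the annotations your version supports.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Use the real annotation header when it is on the include path
// (for example via -I${ADVISOR_XE_2015_DIR}/include).
#if defined(__has_include)
#  if __has_include("advisor-annotate.h")
#    include "advisor-annotate.h"
#  endif
#endif

// Assumption: no-op stand-ins so the sketch builds without Intel Advisor.
#ifndef ANNOTATE_SITE_BEGIN
#  define ANNOTATE_SITE_BEGIN(site)
#endif
#ifndef ANNOTATE_ITERATION_TASK
#  define ANNOTATE_ITERATION_TASK(task)
#endif
#ifndef ANNOTATE_SITE_END
#  define ANNOTATE_SITE_END()
#endif

// Hypothetical hotspot: element-wise sum of two equally sized arrays.
void add_arrays(const std::vector<double>& a,
                const std::vector<double>& b,
                std::vector<double>& out) {
    assert(a.size() == b.size());
    out.resize(a.size());

    ANNOTATE_SITE_BEGIN(add_site);           // proposed parallel site
    for (std::size_t i = 0; i < a.size(); ++i) {
        ANNOTATE_ITERATION_TASK(add_task);   // each iteration is one task
        out[i] = a[i] + b[i];
    }
    ANNOTATE_SITE_END();                     // end of the parallel site
}
```

The annotated code still runs serially. The annotations only describe to the analysis tools where parallel execution is proposed.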

Step 4: Predict maximum parallel performance speedup.

You ran a Suitability analysis to predict the maximum parallel performance speedup based on the added annotations, and posed modeling (what-if) questions.

  • Use the Suitability Report to show the predicted maximum speedup for each parallel site and for the target as a whole.

  • Perform mathematical modeling to see how changing various parameters influences the Maximum Program Gain For All Sites and other values.
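
To build intuition for this kind of what-if modeling, the sketch below uses Amdahl's law as a simplified stand-in. It is not the model the Suitability tool uses, which also reflects factors this formula ignores, and the function name amdahl_speedup is hypothetical. It only shows how the parallel fraction of the runtime and the thread count bound the achievable speedup.

```cpp
#include <cassert>

// Simplified upper bound on speedup (Amdahl's law).
// parallel_fraction: share of serial runtime spent inside parallel sites (0..1).
// threads: number of threads assumed to run the parallel part.
double amdahl_speedup(double parallel_fraction, int threads) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(threads >= 1);
    const double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / threads);
}
```

For example, if 90% of the runtime is inside parallel sites, amdahl_speedup(0.9, 8) returns roughly 4.7, and no thread count can push the result past 10 because the remaining 10% stays serial.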

Step 5: Predict parallel data sharing problems.

You built the target in debug mode, changed Intel Advisor project properties, and ran a Correctness analysis that discovered parallel data sharing problems based on the added annotations.

  • A data race occurs when multiple tasks access the same memory location without coordinating those accesses and at least one of the accesses is a write. Data races can produce parallel execution errors that are difficult to detect and reproduce.

  • Applications compiled and linked in debug mode using the following options produce the most accurate and complete Correctness analysis results:

    • Compiler/Additional include directory: -I${ADVISOR_XE_2015_DIR}/include

    • Compiler/Full debug information: -g

    • Compiler/No optimization: -O0

    • Compiler/Multithreaded, dynamically linked libraries: -Bdynamic

    • Linker/Full debug information: -g

    • Linker/Dynamic loading: -ldl

  • Use the Correctness Report to predict parallel data sharing problems in the annotated target.

  • Reduce the input data set to minimize Correctness tool execution time.
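
The following sketch illustrates the kind of sharing pattern a Correctness analysis is meant to surface. Every task increments entries of the same bins array without coordination, which would become a data race if the tasks really ran in parallel. The function histogram and the site and task names are hypothetical, and the no-op fallback macros are an assumption so the sketch compiles without advisor-annotate.h.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__has_include)
#  if __has_include("advisor-annotate.h")
#    include "advisor-annotate.h"
#  endif
#endif

// Assumption: no-op stand-ins so the sketch builds without Intel Advisor.
#ifndef ANNOTATE_SITE_BEGIN
#  define ANNOTATE_SITE_BEGIN(site)
#endif
#ifndef ANNOTATE_ITERATION_TASK
#  define ANNOTATE_ITERATION_TASK(task)
#endif
#ifndef ANNOTATE_SITE_END
#  define ANNOTATE_SITE_END()
#endif

// Hypothetical annotated loop with a sharing problem: different tasks
// read and write the same bins[] elements without coordination.
void histogram(const std::vector<unsigned>& values, std::vector<long>& bins) {
    assert(!bins.empty());

    ANNOTATE_SITE_BEGIN(histogram_site);
    for (std::size_t i = 0; i < values.size(); ++i) {
        ANNOTATE_ITERATION_TASK(histogram_task);
        ++bins[values[i] % bins.size()];  // unsynchronized read-modify-write
    }
    ANNOTATE_SITE_END();
}
```

Because the annotated program still executes serially, it produces correct results. The risk appears only once the tasks run concurrently, which is why it helps to predict the problem before adding parallel framework code. Calling the function with a short values vector is one way to keep the analysis run brief.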

Step 6: Fix data sharing problems.

You fixed the parallel data sharing problems, rebuilt the target in debug mode, and ran another Correctness analysis to ensure you corrected the parallel data sharing problems.

  • Fix parallel data sharing problems only if the predicted maximum speedup benefit outweighs the cost of the fix.

  • Unlike problems reported in serial applications, which often have a single cause, problems in parallel applications usually involve multiple, interrelated code regions.
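
One possible way to model a fix, sketched under assumptions, is to wrap the shared update in lock annotations so the analysis treats it as a protected region. The function count_multiples and the site and task names are hypothetical, the lock address 0 is used here as a simple default lock, and the no-op fallback macros exist only so the sketch compiles without advisor-annotate.h. Check Intel Advisor Help for the exact annotation semantics in your version.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__has_include)
#  if __has_include("advisor-annotate.h")
#    include "advisor-annotate.h"
#  endif
#endif

// Assumption: no-op stand-ins so the sketch builds without Intel Advisor.
#ifndef ANNOTATE_SITE_BEGIN
#  define ANNOTATE_SITE_BEGIN(site)
#endif
#ifndef ANNOTATE_ITERATION_TASK
#  define ANNOTATE_ITERATION_TASK(task)
#endif
#ifndef ANNOTATE_SITE_END
#  define ANNOTATE_SITE_END()
#endif
#ifndef ANNOTATE_LOCK_ACQUIRE
#  define ANNOTATE_LOCK_ACQUIRE(addr)
#endif
#ifndef ANNOTATE_LOCK_RELEASE
#  define ANNOTATE_LOCK_RELEASE(addr)
#endif

// Hypothetical fix: the shared counter is updated only inside a region
// marked with lock annotations.
long count_multiples(const std::vector<int>& values, int divisor) {
    assert(divisor != 0);
    long matches = 0;  // shared by all tasks in the site

    ANNOTATE_SITE_BEGIN(count_site);
    for (std::size_t i = 0; i < values.size(); ++i) {
        ANNOTATE_ITERATION_TASK(count_task);
        if (values[i] % divisor == 0) {
            ANNOTATE_LOCK_ACQUIRE(0);  // start of the protected update
            ++matches;
            ANNOTATE_LOCK_RELEASE(0);  // end of the protected update
        }
    }
    ANNOTATE_SITE_END();
    return matches;
}
```

Keeping the locked region as small as possible matters, because time spent waiting on a lock reduces the speedup the parallel site can deliver. An alternative design is to give each task a private counter and combine the results after the site ends.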

Step 7: Add parallelism.

You explored how the Intel Advisor annotations in the sample were converted to parallel code using three different parallel frameworks: Intel® Cilk™ Plus, OpenMP*, and Intel® Threading Building Blocks (Intel® TBB).

  • A parallel framework is a combination of libraries, language features, or other software techniques that enable code to execute in parallel.

  • Add parallelism only if the predicted maximum speedup benefit outweighs the cost of adding parallel framework code.

  • The steps for replacing annotations with parallel framework code are fully explained in Intel Advisor Help.

  • After you convert Intel Advisor annotations to parallel framework code, test the resulting parallel application for correctness using the Intel® Inspector, and verify its actual performance using the Intel® VTune™ Amplifier.
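
As a rough illustration of what a converted loop can look like, the sketch below uses OpenMP*. It is not the tutorial's converted source, and the function dot is hypothetical. It assumes a loop that was previously annotated as one parallel site with one task per iteration, and it replaces a shared accumulator with an OpenMP reduction.

```cpp
#include <cassert>
#include <vector>

// Hypothetical converted hotspot: dot product of two equally sized arrays.
// The parallel site becomes a parallel loop, each iteration is a unit of
// work, and the shared accumulator is handled by a reduction clause.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const long n = static_cast<long>(a.size());
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

The pragma takes effect only when OpenMP support is enabled in the compiler (for example with -fopenmp in GCC). Otherwise the loop runs serially and returns the same value for exactly representable inputs. With floating-point data in general, a parallel reduction may sum in a different order and differ slightly from the serial result.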

