Tutorial: Where to Add Parallelism with Intel® Advisor 2015 and a C/C++ Sample
Intel Advisor annotations are either subroutine calls or macros, depending on the programming language. Annotations can be processed by your current compiler but do not change the computations of your application.
Use them to mark places in serial parts of your application that are good candidates for later replacement with parallel framework code that enables parallel execution.
The main types of Intel Advisor annotations mark the location of:
- A parallel site. A parallel site is a region of code that contains one or more tasks that may execute in parallel. An effective parallel site typically contains a hotspot that consumes a large share of application execution time. The goal is to distribute these frequently executed instructions across tasks that can run at the same time. For that reason, the best parallel site is usually not located at the hotspot itself, but higher in the call tree.
- One or more parallel tasks within a parallel site. A task is a portion of time-consuming code, together with its data, that can execute on one or more parallel threads to distribute work.
- Locking synchronization, where mutual exclusion of data access must occur in the parallel application.
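The three roles can be seen together in a small C++ sketch. It uses the site and lock macros that appear in the nqueens sample (ANNOTATE_SITE_BEGIN, ANNOTATE_SITE_END, ANNOTATE_LOCK_ACQUIRE, ANNOTATE_LOCK_RELEASE). It also assumes ANNOTATE_TASK_BEGIN and ANNOTATE_TASK_END for explicit task boundaries.

Several names are hypothetical and are not part of Intel Advisor: `dotProduct`, `partialDot`, the `HAVE_ADVISOR_ANNOTATE` guard, and the no-op fallback definitions. The fallbacks exist only so the file compiles where advisor-annotate.h is not installed.

The program still runs serially. The annotations only describe what Advisor should model as parallel. Note that the site is placed in the caller, one level above the hot loop in `partialDot`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Use Advisor's definitions when available; otherwise use hypothetical no-op
// stand-ins (an assumption for portability, not Advisor's own macros).
#if defined(__has_include)
#  if __has_include(<advisor-annotate.h>)
#    include <advisor-annotate.h>
#    define HAVE_ADVISOR_ANNOTATE 1
#  endif
#endif
#ifndef HAVE_ADVISOR_ANNOTATE
#  define ANNOTATE_SITE_BEGIN(name)
#  define ANNOTATE_SITE_END()
#  define ANNOTATE_TASK_BEGIN(name)
#  define ANNOTATE_TASK_END()
#  define ANNOTATE_LOCK_ACQUIRE(addr)
#  define ANNOTATE_LOCK_RELEASE(addr)
#endif

// Hot loop: sums a[i] * b[i] over [begin, end).
double partialDot(const std::vector<double>& a, const std::vector<double>& b,
                  std::size_t begin, std::size_t end) {
    double s = 0.0;
    for (std::size_t i = begin; i < end; ++i) s += a[i] * b[i];
    return s;
}

double dotProduct(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const std::size_t mid = a.size() / 2;
    double total = 0.0;  // shared by both tasks

    ANNOTATE_SITE_BEGIN(dot_site);      // parallel site: encloses both tasks
    ANNOTATE_TASK_BEGIN(first_half);    // task 1
    {
        double partial = partialDot(a, b, 0, mid);  // task-private result
        ANNOTATE_LOCK_ACQUIRE(0);       // mutual exclusion for the shared update
        total += partial;
        ANNOTATE_LOCK_RELEASE(0);
    }
    ANNOTATE_TASK_END();
    ANNOTATE_TASK_BEGIN(second_half);   // task 2
    {
        double partial = partialDot(a, b, mid, a.size());
        ANNOTATE_LOCK_ACQUIRE(0);
        total += partial;
        ANNOTATE_LOCK_RELEASE(0);
    }
    ANNOTATE_TASK_END();
    ANNOTATE_SITE_END();

    return total;
}
```

For example, `dotProduct({1.0, 2.0, 3.0}, {4.0, 5.0, 6.0})` returns 32.0, the same value with or without annotations.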
Intel Advisor provides annotated example code snippets (accessible from the Survey Report and Survey Source windows) that you can copy directly into your editor:
| Annotation Code Snippet | Purpose |
|---|---|
| Iteration Loop, Single Task | Create a simple loop structure, where the task code includes the entire loop body. This common task structure is useful when only a single task is needed within a parallel site. |
| Loop, One or More Tasks | Create loops where the task code does not include all of the loop body, or complex loops or code that requires specific task begin-end boundaries, including multiple task end annotations. This structure is also useful when multiple tasks are needed within a parallel site. |
| Function, One or More Tasks | Create code that calls multiple tasks within a parallel site. |
| Pause/Resume Collection | Temporarily pause data collection and later resume it, so you can skip uninteresting parts of target execution to minimize collected data and speed up analysis of large applications. Add these annotations outside a parallel site. |
| Build Settings | Set build (compiler and linker) settings specific to the language in use. |
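The "Loop, One or More Tasks" and "Pause/Resume Collection" rows can be combined in one C++ sketch. Here the task covers only the costly part of each iteration, and a cheap serial filter stays outside it. Collection is paused around uninteresting setup code, outside the parallel site.

The sketch assumes the macro names ANNOTATE_TASK_BEGIN, ANNOTATE_TASK_END, ANNOTATE_DISABLE_COLLECTION_PUSH and ANNOTATE_DISABLE_COLLECTION_POP. Check the snippets Advisor generates for the exact form. The function `sumOfSquares`, the `HAVE_ADVISOR_ANNOTATE` guard and the no-op fallbacks are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Use Advisor's definitions when available; otherwise hypothetical no-op stand-ins.
#if defined(__has_include)
#  if __has_include(<advisor-annotate.h>)
#    include <advisor-annotate.h>
#    define HAVE_ADVISOR_ANNOTATE 1
#  endif
#endif
#ifndef HAVE_ADVISOR_ANNOTATE
#  define ANNOTATE_SITE_BEGIN(name)
#  define ANNOTATE_SITE_END()
#  define ANNOTATE_TASK_BEGIN(name)
#  define ANNOTATE_TASK_END()
#  define ANNOTATE_DISABLE_COLLECTION_PUSH
#  define ANNOTATE_DISABLE_COLLECTION_POP
#endif

// For each positive limit n, computes 1*1 + 2*2 + ... + n*n; other entries stay 0.
std::vector<long long> sumOfSquares(const std::vector<int>& limits) {
    ANNOTATE_DISABLE_COLLECTION_PUSH;   // pause: setup is not interesting to analyze
    std::vector<long long> results(limits.size(), 0);
    ANNOTATE_DISABLE_COLLECTION_POP;    // resume before the parallel site

    ANNOTATE_SITE_BEGIN(square_sums);
    for (std::size_t i = 0; i < limits.size(); ++i) {
        if (limits[i] <= 0) continue;   // cheap serial filter, not part of the task
        assert(limits[i] <= 2000000);   // keeps the sum within long long range
        ANNOTATE_TASK_BEGIN(square_sum);  // task covers only the costly work
        long long s = 0;
        for (long long k = 1; k <= limits[i]; ++k) s += k * k;
        results[i] = s;                 // each task writes only its own slot
        ANNOTATE_TASK_END();
    }
    ANNOTATE_SITE_END();
    return results;
}
```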
Annotations are fully explained in Intel Advisor Help.
When adding annotations to your own application, remember to include the annotation definitions, such as advisor-annotate.h for C/C++ programs.
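In C/C++ the annotation definitions come in through a single include line. The sketch below guards that include so the same file also builds on machines without Intel Advisor. The `HAVE_ADVISOR_ANNOTATE` guard, the no-op fallback macros and `clampAll` are hypothetical conveniences, not part of Advisor. The Build Settings snippet gives the real compiler and linker settings.

Because annotations do not change the computation, the function returns the same result either way. It follows the "Iteration Loop, Single Task" shape, where the whole loop body is the task.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Prefer Intel Advisor's real annotation definitions when the header is on the
// include path; otherwise define hypothetical no-op stand-ins (an assumption).
#if defined(__has_include)
#  if __has_include(<advisor-annotate.h>)
#    include <advisor-annotate.h>
#    define HAVE_ADVISOR_ANNOTATE 1
#  endif
#endif
#ifndef HAVE_ADVISOR_ANNOTATE
#  define ANNOTATE_SITE_BEGIN(name)
#  define ANNOTATE_SITE_END()
#  define ANNOTATE_ITERATION_TASK(name)
#endif

// Clamps every value into [lo, hi]; each iteration is one candidate task.
std::vector<int> clampAll(std::vector<int> values, int lo, int hi) {
    assert(lo <= hi);
    ANNOTATE_SITE_BEGIN(clamp_site);
    for (std::size_t i = 0; i < values.size(); ++i) {
        ANNOTATE_ITERATION_TASK(clamp_one);
        values[i] = std::min(std::max(values[i], lo), hi);  // writes only its own element
    }
    ANNOTATE_SITE_END();
    return values;
}
```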
In your own application, choosing where to add task annotations may require some experimentation. If your parallel site has nested loops and the computation time used by the innermost loop is small, consider adding task annotations around the next outermost loop.
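Consider a nested loop whose innermost loop does very little work per iteration. The advice above amounts to putting the task annotation on the next loop out, as in this C++ matrix-vector product sketch. Each task is then one whole row rather than a single multiply-add.

The function `matVec`, the `HAVE_ADVISOR_ANNOTATE` guard and the no-op fallbacks are hypothetical. ANNOTATE_SITE_BEGIN, ANNOTATE_SITE_END and ANNOTATE_ITERATION_TASK are the macro names used in the nqueens sample.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Use Advisor's definitions when available; otherwise hypothetical no-op stand-ins.
#if defined(__has_include)
#  if __has_include(<advisor-annotate.h>)
#    include <advisor-annotate.h>
#    define HAVE_ADVISOR_ANNOTATE 1
#  endif
#endif
#ifndef HAVE_ADVISOR_ANNOTATE
#  define ANNOTATE_SITE_BEGIN(name)
#  define ANNOTATE_SITE_END()
#  define ANNOTATE_ITERATION_TASK(name)
#endif

// Computes y = A * x for a rows-by-cols matrix A stored as a vector of rows.
std::vector<double> matVec(const std::vector<std::vector<double>>& a,
                           const std::vector<double>& x) {
    std::vector<double> y(a.size(), 0.0);
    ANNOTATE_SITE_BEGIN(mat_vec);
    for (std::size_t r = 0; r < a.size(); ++r) {   // outer loop: one task per row
        ANNOTATE_ITERATION_TASK(row_dot);
        assert(a[r].size() == x.size());
        double sum = 0.0;
        for (std::size_t c = 0; c < x.size(); ++c) {
            sum += a[r][c] * x[c];                 // innermost loop: too little work per task
        }
        y[r] = sum;                                // each task writes only its own element
    }
    ANNOTATE_SITE_END();
    return y;
}
```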
To keep this tutorial short, we have already added parallel site and task annotations to the sample code for you. All you need to do is uncomment them.
Click Survey Report on the navigation toolbar to re-open the Survey Report.
Right-click the data row with the first hot loop and choose Edit Source to open the nqueens_serial.cpp source file in an editor.
```cpp
// [DESCRIPTION]
// Solve the nqueens problem - how many positions of queens can fit on a chess
// board of a given size without attacking each other.
//
// [RUN]
// To set the board size in Visual Studio, right click on the project,
// select Properties > Configuration Properties > General > Debugging. Set
// Command Arguments to the desired value. 14 has been set as the default.
//
// [EXPECTED OUTPUT]
// Depends upon the board size.
//
// Board Size   Number of Solutions
//     4                 2
//     5                10
//     6                 4
//     7                40
//     8                92
//     9               352
//    10               724
//    11              2680
//    12             14200
//    13             73712
//    14            365596
//    15           2279184

#include <iostream>
#include <cstdlib>

#ifdef _WIN32
#include <windows.h>
#include <mmsystem.h>
#define TimeType DWORD
#define GET_TIME(t) t = timeGetTime()
#define TIME_IN_MS(t) (t)
#else
#include <sys/time.h>
#define TimeType struct timeval
#define GET_TIME(t) gettimeofday((&t), NULL)
#define TIME_IN_MS(t) (((t).tv_sec * 1000000 + (t).tv_usec) / 1000)
#endif

#include <cilk/cilk.h>
#include <cilk/reducer_opadd.h>

//ADVISOR COMMENT: This is a Cilk version of the nqueens application
//ADVISOR SUITABILITY EDIT: Uncomment the #include <advisor-annotate.h> line to
//                          use Advisor annotations.
//#include <advisor-annotate.h>

using namespace std;

cilk::reducer_opadd<int> nrOfSolutions;  // Counts the number of solutions.
int size = 0;  // The board-size; read from command-line

// The number of correct solutions for each board size.
const int correctSolution[16] = { 0, 1, 0, 0,                     // 0 - 3
                                  2, 10, 4, 40,                   // 4 - 7
                                  92, 352, 724, 2680,             // 8 - 11
                                  14200, 73712, 365596, 2279184   // 12 - 15
                                };

/*
 * Recursive function to find all solutions on a board, represented by the
 * argument "queens", when we place the next queen at location (row, col).
 *
 * On Return: nrOfSolutions has been increased by the number of solutions for
 *            this board.
 */
void setQueen(int queens[], int row, int col) {
    //ADVISOR COMMENT: The accesses to the "queens" array in this function
    //                 create an incidental sharing correctness issue.
    //ADVISOR COMMENT: Each task should have its own copy of the queens array.
    //ADVISOR COMMENT: Look at the solve() function to see how to fix this.

    // Check all previously placed rows for attacks.
    for (int i=0; i < row; i++) {
        // Check vertical attacks.
        if (queens[i] == col) {
            return;
        }
        // Check diagonal attacks.
        if (abs(queens[i] - col) == (row - i) ) {
            return;
        }
    }

    // Column is ok, set the queen.
    //ADVISOR COMMENT: See comment at top of function.
    queens[row]=col;

    if (row == (size - 1)) {
        //ADVISOR CORRECTNESS EDIT: Uncomment the following two LOCK
        //                          annotations to lock the access to nrOfSolutions and
        //                          eliminate the race condition.
        //ANNOTATE_LOCK_ACQUIRE(0);
        //ADVISOR COMMENT: This is a race condition because multiple tasks may
        //                 try and increment nrOfSolutions at the same time.
        nrOfSolutions++;  // Placed final queen, found a solution!
        //ANNOTATE_LOCK_RELEASE(0);
    }
    else {
        // Try to fill next row.
        for (int i=0; i < size; i++) {
            setQueen(queens, row+1, i);
        }
    }
}

/*
 * Find all solutions for nQueens problem on size x size chessboard.
 *
 * On Return: nrOfSolutions = number of solutions for size x size chessboard.
 */
void solve() {
    //ADVISOR COMMENT: When surveying, this is the top function below main.
    //                 This for() loop is a candidate for parallelization.

    //ADVISOR CORRECTNESS EDIT: Comment out the following declaration of the
    //                          queens array.
    //int *queens = new int[size];  // Array of queens on the board.

    //ADVISOR SUITABILITY EDIT: Uncomment the three annotations below to model
    //                          parallelizing the body of this for() loop.
    //ANNOTATE_SITE_BEGIN(solve);
    cilk_for (int i=0; i < size; i++) {
        //ANNOTATE_ITERATION_TASK(setQueen);
        //ADVISOR CORRECTNESS EDIT: Uncomment the declaration of queens. This
        //                          creates a separate array for each recursion
        //                          eliminating the incidental sharing.
        int * queens = new int[size];  // Array of queens on the chess board.
        //ADVISOR COMMENT: The call below exhibits incidental sharing when all
        //                 of the tasks use the same copy of "queens".
        // Try all positions in first row.
        setQueen(queens, 0, i);
        //ADVISOR CORRECTNESS EDIT: Uncomment the deletion of the queens array.
        delete [] queens;
    }
    //ANNOTATE_SITE_END();

    //ADVISOR CORRECTNESS EDIT: Comment out the deletion of the queens array.
    //delete [] queens;
}
```
Search for ADVISOR SUITABILITY EDIT and follow the directions in the sample code. Make four edits in total: uncomment the #include line near the top of the file and the three annotation lines in solve().
This is also a good time to explore the fully commented sample code.
Save your edits.
In the terminal session:
- Change directory to the nqueens_Advisor/ directory (where you extracted the zipped sample files).
- Type make 1_nqueens_serial to rebuild the target.
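After the four suitability edits, the annotated part of a serial nqueens program has roughly the shape of the minimal C++ sketch below. It is modeled on the setQueen() and solve() code shown earlier but is not the sample file itself. The following are hypothetical: the `sketch` namespace, `boardSize` (standing in for the sample's global `size`), the plain `int` counter, the `std::vector` board, the `HAVE_ADVISOR_ANNOTATE` guard and the no-op fallbacks.

The queens board is deliberately still shared by all iterations and the counter is updated without a lock. This is safe only because execution is still serial. These are the kinds of issues the sample's ADVISOR COMMENT lines say later correctness analysis should report.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Suitability edit 1: make the annotation macros visible. The guard and the
// no-op fallbacks are hypothetical, so the sketch builds without Advisor.
#if defined(__has_include)
#  if __has_include(<advisor-annotate.h>)
#    include <advisor-annotate.h>
#    define HAVE_ADVISOR_ANNOTATE 1
#  endif
#endif
#ifndef HAVE_ADVISOR_ANNOTATE
#  define ANNOTATE_SITE_BEGIN(name)
#  define ANNOTATE_SITE_END()
#  define ANNOTATE_ITERATION_TASK(name)
#endif

namespace sketch {

int boardSize = 0;      // stand-in for the sample's global "size"
int nrOfSolutions = 0;  // plain int: the program is still serial

void setQueen(std::vector<int>& queens, int row, int col) {
    for (int i = 0; i < row; i++) {
        if (queens[i] == col) return;                        // vertical attack
        if (std::abs(queens[i] - col) == (row - i)) return;  // diagonal attack
    }
    queens[row] = col;
    if (row == boardSize - 1) {
        nrOfSolutions++;  // shared update: a race once iterations run in parallel
    } else {
        for (int i = 0; i < boardSize; i++) setQueen(queens, row + 1, i);
    }
}

int solve(int n) {
    assert(n >= 1 && n <= 15);   // range covered by the solution table in the sample
    boardSize = n;
    nrOfSolutions = 0;
    std::vector<int> queens(n);  // shared by every iteration, as in the serial code

    ANNOTATE_SITE_BEGIN(solve);              // suitability edit 2
    for (int i = 0; i < n; i++) {
        ANNOTATE_ITERATION_TASK(setQueen);   // suitability edit 3
        setQueen(queens, 0, i);              // try all positions in first row
    }
    ANNOTATE_SITE_END();                     // suitability edit 4
    return nrOfSolutions;
}

}  // namespace sketch
```

For example, `sketch::solve(8)` returns 92, matching the expected-output table in the sample's header comment.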