| Declaration | Name | Description |
|---|---|---|
| type(block_list_t), dimension(:), allocatable, target, save | blk | Array of blocks mapped to multigrid levels. |
| integer, save | nthreads = 1 | Number of OpenMP threads. |
| integer, dimension(3), save, private | min_blk = [32, 3, 3] | Smallest thread block acceptable during domain partitioning. 32 grid points (4 cache lines in double precision) was chosen for the fastest dimension to accommodate the prefetcher; partitioning of this dimension should happen only for pathological grid sizes. 3 grid points were chosen for y and z because of the stencil. |
| integer, save, private | last_active_lvl | |
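The `min_blk` values bound how many threads a dimension can absorb. A dimension of n points can be cut into at most n / min_blk(d) blocks before the average block drops below the minimum. For the x value, 32 double-precision points are 32 × 8 = 256 bytes, i.e. four cache lines assuming the common 64-byte line size. The sketch below illustrates this bound. The module name `min_blk_sketch` and the function `max_split` are hypothetical and not part of the documented module. Unlike the original `save` variable, `min_blk` is declared here as a `parameter`.

```fortran
module min_blk_sketch
  implicit none
  private
  public :: max_split

  ! Smallest acceptable thread block per dimension (x, y, z), as documented.
  ! Declared as a parameter here for simplicity (an assumption of this sketch).
  integer, dimension(3), parameter :: min_blk = [32, 3, 3]

contains

  ! Hypothetical helper: the largest number of thread blocks k for a dimension
  ! of n grid points such that the average block size n/k is still at least
  ! min_blk(d). Returns at least 1, so a too-small dimension is left unsplit.
  pure integer function max_split(n, d)
    integer, intent(in) :: n, d
    max_split = max(1, n / min_blk(d))
  end function max_split

end module min_blk_sketch
```

For example, an x extent of 128 points allows at most 4 blocks, while an x extent of 16 points is never split.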
Description of OpenMP blocking.
Tests done with the JTC code around 2014 showed that, for Jacobi iteration on a regular grid, blocking in the y (second) dimension is the most efficient choice (for the 6-point stencil). So if the local domain is large enough in the y direction, this dimension is split equally among the threads:
- the blocking for cache efficiency is left to the compiler.

If mod(ny, nthreads) /= 0, the thread block is increased by one. In this case the last thread has less work, or none:
- think of 9 points split among 4 threads: the block size becomes 3, giving blocks of 3, 3, 3 and 0 points;
- but for grids that small, this imbalance is not important.
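The y-partitioning rule above can be sketched as a small helper that returns the y range owned by one thread. It is a minimal illustration under assumptions: the module name `y_split_sketch`, the subroutine `y_range` and its argument order are hypothetical, and thread ids are taken as 0-based, as returned by `omp_get_thread_num`. The test deciding whether y is "large enough" is not shown.

```fortran
module y_split_sketch
  implicit none
  private
  public :: y_range

contains

  ! Hypothetical helper: 1-based y-index range [jstart, jend] owned by thread
  ! tid (0-based) when ny points are split among nthreads threads.
  ! The block size is ny / nthreads, increased by one when
  ! mod(ny, nthreads) /= 0, so the trailing thread(s) may get fewer points
  ! or none; jend < jstart denotes an empty range.
  pure subroutine y_range(ny, nthreads, tid, jstart, jend)
    integer, intent(in)  :: ny, nthreads, tid
    integer, intent(out) :: jstart, jend
    integer :: blk_y

    blk_y = ny / nthreads
    if (mod(ny, nthreads) /= 0) blk_y = blk_y + 1

    jstart = tid * blk_y + 1
    jend   = min(ny, jstart + blk_y - 1)
  end subroutine y_range

end module y_split_sketch
```

Inside an OpenMP parallel region, each thread would call `y_range(ny, nthreads, omp_get_thread_num(), js, je)` (with `omp_get_thread_num` from `omp_lib`) and loop over `j = js, je`. For 9 points and 4 threads this yields the ranges 1 to 3, 4 to 6 and 7 to 9, plus an empty range for the last thread.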
If the grid is narrow in y, we try to split it in the yz plane. Blocking in the x direction is not desirable because it interferes with the prefetcher; nevertheless, a last-resort partitioning in this dimension is provided later.
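The documentation does not say how the yz split is chosen. The sketch below shows one plausible strategy, stated as an assumption rather than the module's actual method. It factors nthreads = py × pz so that the average y and z blocks keep at least min_blk(2) and min_blk(3) points, and it prefers the largest py, since y is the favoured blocking direction. When no factorization fits, it signals that the last-resort x partitioning is needed. The module name `yz_split_sketch`, the subroutine `choose_yz_grid` and the zero return convention are all hypothetical.

```fortran
module yz_split_sketch
  implicit none
  private
  public :: choose_yz_grid

  ! Smallest acceptable block in y and z, mirroring min_blk(2:3) = [3, 3].
  integer, parameter :: min_y = 3, min_z = 3

contains

  ! Hypothetical strategy: factor nthreads = py * pz so that the average
  ! y block keeps at least min_y points and the average z block at least
  ! min_z points, preferring the largest py. If no factorization fits,
  ! py = pz = 0 is returned: the caller must fall back to splitting in x.
  pure subroutine choose_yz_grid(ny, nz, nthreads, py, pz)
    integer, intent(in)  :: ny, nz, nthreads
    integer, intent(out) :: py, pz
    integer :: p

    py = 0
    pz = 0
    do p = nthreads, 1, -1
      if (mod(nthreads, p) /= 0) cycle        ! p must divide nthreads
      if (ny / p < min_y) cycle               ! y blocks would be too thin
      if (nz / (nthreads / p) < min_z) cycle  ! z blocks would be too thin
      py = p
      pz = nthreads / p
      return
    end do
  end subroutine choose_yz_grid

end module yz_split_sketch
```

With ny = 6, nz = 40 and 8 threads, this picks a 2 × 4 thread grid in the yz plane. A wide y extent (ny = nz = 30, 4 threads) degenerates to the pure y split (4 × 1). A 4 × 4 yz cross-section cannot host 8 threads, so it returns 0 × 0 and defers to the x fallback.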