Description of OpenMP blocking.

Data Types
type	block_list_t

Functions/Subroutines
subroutine	set_omp_thread_blocks (min_thread_blk_size)
	set the grid block size for OpenMP threads

subroutine	write_openmp_block_info

Variables
type(block_list_t), dimension(:), allocatable, target, save	blk
	array of blocks mapped to multigrid levels

integer, save	nthreads = 1
	number of OpenMP threads

integer, dimension(3), save, private	min_blk =[32, 3, 3]
	Smallest thread block acceptable during domain partitioning. 32 grid points (4 cache lines in double precision) were chosen for the fastest dimension to accommodate the prefetcher; partitioning of this dimension should happen only for pathological grid sizes. 3 grid points were chosen for y and z because of the stencil.

integer, save, private	last_active_lvl

Detailed Description

Description of OpenMP blocking.

Tests done with the JTC code around 2014 have shown that blocking a regular grid for Jacobi iteration in the y (second) dimension is the most efficient (for the 6-point stencil). So if the local domain is large enough in the y direction, this dimension is split equally among the threads.

The blocking for cache efficiency is left to the compiler. If ny % nthreads /= 0, the thread block size is increased by one. In this case the last thread will have less work or none (think of 9 points split among 4 threads), but for grids that small this imbalance is not important.
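
The y split described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the module's Fortran code; the function name `split_y` and the 0-based half-open index ranges are assumptions made for this illustration only.

```python
def split_y(ny, nthreads):
    """Split ny grid points in y among nthreads threads.

    Returns one (start, end) half-open range per thread.
    Hypothetical helper: the name and index convention are assumptions.
    """
    blk = ny // nthreads
    if ny % nthreads != 0:
        blk += 1  # block size is increased by one when ny is not divisible
    ranges = []
    for t in range(nthreads):
        start = min(t * blk, ny)    # clamp so trailing threads may get nothing
        end = min(start + blk, ny)
        ranges.append((start, end))
    return ranges


# The example from the text: 9 points split among 4 threads.
sizes = [end - start for start, end in split_y(9, 4)]
print(sizes)  # [3, 3, 3, 0]
```

With 9 points and 4 threads the block size becomes 3, so the first three threads take 3 points each and the last thread gets none.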

If the grid is narrow in y, we try to split it in the yz plane. Blocking in the x direction is not desirable because it interferes with the prefetcher. Nevertheless, a last-resort partitioning in this dimension is provided (later).

Function/Subroutine Documentation

◆ set_omp_thread_blocks()

subroutine dl_mg_omp::set_omp_thread_blocks ( integer, dimension(:), intent(in), optional min_thread_blk_size )

set the grid block size for OpenMP threads

◆ write_openmp_block_info()

subroutine dl_mg_omp::write_openmp_block_info

Variable Documentation

◆ blk

type(block_list_t), dimension(:), allocatable, target, save dl_mg_omp::blk

array of blocks mapped to multigrid levels

◆ last_active_lvl

integer, save, private dl_mg_omp::last_active_lvl

private

◆ min_blk

integer, dimension(3), save, private dl_mg_omp::min_blk =[32, 3, 3]

private

Smallest thread block acceptable during domain partitioning. 32 grid points (4 cache lines in double precision) were chosen for the fastest dimension to accommodate the prefetcher; partitioning of this dimension should happen only for pathological grid sizes. 3 grid points were chosen for y and z because of the stencil.
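
The arithmetic behind the default can be checked with a short sketch. It is in Python for brevity and is not part of the module; it assumes 64-byte cache lines and 8-byte double-precision values, and the helper names `cache_lines` and `max_blocks` are hypothetical.

```python
CACHE_LINE_BYTES = 64   # assumption: typical cache line size
DP_BYTES = 8            # size of one double-precision value
MIN_BLK = (32, 3, 3)    # default minimum block size in x, y, z


def cache_lines(npoints):
    """Number of cache lines covered by npoints contiguous DP values."""
    return npoints * DP_BYTES / CACHE_LINE_BYTES


def max_blocks(n, min_size):
    """Largest number of blocks of at least min_size points that fit in n.

    Hypothetical helper: it only illustrates what a minimum block size implies.
    """
    return n // min_size


print(cache_lines(MIN_BLK[0]))     # 4.0
print(max_blocks(9, MIN_BLK[1]))   # 3
```

Under these assumptions a 32-point block in x spans exactly 4 cache lines, and a y extent of 9 points can hold at most 3 blocks of the minimum size.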

◆ nthreads

integer, save dl_mg_omp::nthreads = 1

number of OpenMP threads
