mirror of
https://github.com/llvm/llvm-project.git
synced 2025-04-26 03:16:07 +00:00
[OpenMP][libomp][doc] Add environment variables documentation
Add documentation for the environment variables for libomp.

Differential Revision: https://reviews.llvm.org/D114269
parent
77ff6f7df8
commit
618f8dc5e5
@ -12,6 +12,633 @@ An `early (2015) design document <https://openmp.llvm.org/Reference.pdf>`_ for
the LLVM/OpenMP host runtime, aka. `libomp.so`, is available as a `pdf
<https://openmp.llvm.org/Reference.pdf>`_.

.. _libomp_environment_vars:

Environment Variables
^^^^^^^^^^^^^^^^^^^^^

OMP_CANCELLATION
""""""""""""""""

Enables cancellation of the innermost enclosing region of the type specified.
If set to ``true``, the effects of the cancel construct and of cancellation
points are enabled and cancellation is activated. If set to ``false``,
cancellation is disabled and the cancel construct and cancellation points are
effectively ignored.

.. note::
    Internal barrier code will work differently depending on whether cancellation
    is enabled. Barrier code should repeatedly check the global flag to figure
    out if cancellation has been triggered. If a thread observes cancellation, it
    should leave the barrier prematurely with the return value 1 (and may wake up
    other threads). Otherwise, it should leave the barrier with the return value 0.

Enables (``true``) or disables (``false``) cancellation of the innermost
enclosing region of the type specified.

**Default:** ``false``

OMP_DISPLAY_ENV
"""""""""""""""

Enables (``true``) or disables (``false``) the printing to ``stderr`` of
the OpenMP version number and the values associated with the OpenMP
environment variables.

Possible values are: ``true``, ``false``, or ``verbose``.

**Default:** ``false``

OMP_DEFAULT_DEVICE
""""""""""""""""""

Sets the device that will be used in a target region. The OpenMP routine
``omp_set_default_device`` or a ``device`` clause on a target construct can override
this variable. If no device with the specified device number exists, the code is
executed on the host. If this environment variable is not set, device number 0
is used.

OMP_DYNAMIC
"""""""""""

Enables (``true``) or disables (``false``) the dynamic adjustment of the
number of threads.

| **Default:** ``false``

OMP_MAX_ACTIVE_LEVELS
"""""""""""""""""""""

The maximum number of levels of parallel nesting for the program.

| **Default:** ``1``

OMP_NESTED
""""""""""

.. warning::
    Deprecated. Please use ``OMP_MAX_ACTIVE_LEVELS`` to control nested parallelism.

Enables (``true``) or disables (``false``) nested parallelism.

| **Default:** ``false``

OMP_NUM_THREADS
"""""""""""""""

Sets the maximum number of threads to use for OpenMP parallel regions if no
other value is specified in the application.

The value can be a single integer, in which case it specifies the number of threads
for all parallel regions. The value can also be a comma-separated list of integers,
in which case each integer specifies the number of threads for a parallel
region at that particular nesting level.

The first position in the list represents the outer-most parallel nesting level,
the second position represents the next-inner parallel nesting level, and so on.
At any level, the integer can be left out of the list. If the first integer in a
list is left out, it implies the normal default value for threads is used at the
outer-most level. If the integer is left out of any other level, the number of
threads for that level is inherited from the previous level.

| **Default:** The number of processors visible to the operating system on which the program is executed.
| **Syntax:** ``OMP_NUM_THREADS=value[,value]*``
| **Example:** ``OMP_NUM_THREADS=4,3``

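
The rules for omitted list entries in ``OMP_NUM_THREADS`` can be illustrated
with a small Python sketch. It is not the libomp parser; the function name
``threads_per_level`` and the ``default_threads`` argument (standing in for the
runtime's normal default) are hypothetical and exist only for this illustration.

```python
def threads_per_level(value, default_threads):
    """Expand an OMP_NUM_THREADS-style list into one thread count per level.

    Illustration only: `default_threads` stands in for the runtime default.
    """
    levels = []
    for position, field in enumerate(value.split(",")):
        field = field.strip()
        if field:
            # An explicit integer sets the count for this nesting level.
            levels.append(int(field))
        elif position == 0:
            # Omitted first entry: the normal default is used.
            levels.append(default_threads)
        else:
            # Omitted later entry: inherit from the previous level.
            levels.append(levels[-1])
    return levels


print(threads_per_level("4,3", default_threads=8))
print(threads_per_level(",2", default_threads=8))
```

With a default of 8, ``4,3`` yields 4 threads at the outer level and 3 at the
next level, while ``,2`` yields 8 and 2.
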
OMP_PLACES
""""""""""

Specifies an explicit ordered list of places, either as an abstract name
describing a set of places or as an explicit list of places described by
non-negative numbers. An exclusion operator, ``!``, can also be used to exclude
the number or place immediately following the operator.

For **explicit lists**, an ordered list of places is specified with each place
represented as a set of non-negative numbers. The non-negative numbers represent
operating system logical processor numbers and can be thought of as an OS affinity mask.

Individual places can be specified through two methods.
Both **examples** below represent the same place.

* An explicit list of comma-separated non-negative numbers. **Example:** ``{0,2,4,6}``
* An interval with notation ``<lower-bound>:<length>[:<stride>]``. **Example:** ``{0:4:2}``. When ``<stride>`` is omitted, a unit stride is assumed.

The interval notation represents this set of numbers:

::

    <lower-bound>, <lower-bound> + <stride>, ..., <lower-bound> + (<length> - 1) * <stride>


A place list can also be specified using the same interval
notation: ``{place}:<length>[:<stride>]``.
This represents a list of ``<length>`` places determined by the following:

.. code-block:: text

    {place}, {place} + <stride>, ..., {place} + (<length>-1)*<stride>
    Where given {place} and integer N, {place} + N = {place with every number offset by N}
    Example: {0,3,6}:4:1 represents {0,3,6}, {1,4,7}, {2,5,8}, {3,6,9}

**Examples of explicit lists:**
These all represent the same set of places:

::

    OMP_PLACES="{0,1,2,3},{4,5,6,7},{8,9,10,11},{12,13,14,15}"
    OMP_PLACES="{0:4},{4:4},{8:4},{12:4}"
    OMP_PLACES="{0:4}:4:4"

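
The interval arithmetic for explicit ``OMP_PLACES`` lists can be checked with
a short Python sketch. The helper names ``expand_interval`` and
``expand_place_list`` are hypothetical; they only evaluate the two formulas
given for this variable and do not parse the full syntax (for example, the
``!`` exclusion operator is not handled).

```python
def expand_interval(lower, length, stride=1):
    """Numbers denoted by <lower-bound>:<length>[:<stride>]."""
    return [lower + i * stride for i in range(length)]


def expand_place_list(place, length, stride=1):
    """Places denoted by {place}:<length>[:<stride>].

    Each successive place is the first place with every number
    offset by a further multiple of the stride.
    """
    return [[n + i * stride for n in place] for i in range(length)]


# {0:4:2} is the same place as {0,2,4,6}.
print(expand_interval(0, 4, 2))
# {0,3,6}:4:1 is {0,3,6}, {1,4,7}, {2,5,8}, {3,6,9}.
print(expand_place_list([0, 3, 6], 4, 1))
# {0:4}:4:4 is four places of four consecutive processors each.
print(expand_place_list(expand_interval(0, 4), 4, 4))
```
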
.. note::
    When specifying a place using a set of numbers, if any number cannot be
    mapped to a processor on the target platform, then that number is
    ignored within the place, but the rest of the place is kept intact.
    If all numbers within a place are invalid, then the entire place is removed
    from the place list, but the rest of the place list is kept intact.

The **abstract names** listed below are understood by the run-time environment:

* ``threads``: Each place corresponds to a single hardware thread.
* ``cores``: Each place corresponds to a single core (having one or more hardware threads).
* ``sockets``: Each place corresponds to a single socket (consisting of one or more cores).
* ``numa_domains``: Each place corresponds to a single NUMA domain (consisting of one or more cores).
* ``ll_caches``: Each place corresponds to a last-level cache (consisting of one or more cores).

The abstract name may be followed by a positive number in parentheses to
denote the length of the place list to be created, that is ``abstract_name(num-places)``.
If the optional number isn't specified, then the runtime will use all available
resources of type ``abstract_name``. When requesting fewer places than available
on the system, the first available resources as determined by ``abstract_name``
are used. When requesting more places than available on the system, only the
available resources are used.

**Examples of abstract names:**

::

    OMP_PLACES=threads
    OMP_PLACES=threads(4)

OMP_PROC_BIND (Windows, Linux)
""""""""""""""""""""""""""""""

Sets the thread affinity policy to be used for parallel regions at the
corresponding nesting level. Enables (``true``) or disables (``false``)
the binding of threads to processor contexts. If enabled, this is the
same as specifying ``KMP_AFFINITY=scatter``. If disabled, this is the
same as specifying ``KMP_AFFINITY=none``.

**Acceptable values:** ``true``, ``false``, or a comma-separated list, each
element of which is one of the following values: ``master``, ``close``, ``spread``, or ``primary``.

**Default:** ``false``

.. warning::
    ``master`` is deprecated. The semantics of ``master`` are the same as ``primary``.

If set to ``false``, the execution environment may move OpenMP threads between
OpenMP places, thread affinity is disabled, and ``proc_bind`` clauses on
parallel constructs are ignored. Otherwise, the execution environment should
not move OpenMP threads between OpenMP places, thread affinity is enabled, and
the initial thread is bound to the first place in the OpenMP place list.

If set to ``primary``, all threads are bound to the same place as the primary
thread.

If set to ``close``, threads are bound to successive places, near where the
primary thread is bound.

If set to ``spread``, the primary thread's partition is subdivided and threads
are bound to successive single-place sub-partitions.

| **Related environment variables:** ``KMP_AFFINITY`` (overrides ``OMP_PROC_BIND``).

OMP_SCHEDULE
""""""""""""

Sets the run-time schedule type and an optional chunk size.

| **Default:** ``static``, no chunk size specified
| **Syntax:** ``OMP_SCHEDULE="kind[,chunk_size]"``

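
As an illustration of the ``kind[,chunk_size]`` syntax, the following Python
sketch splits a value into its two parts. The function name ``parse_schedule``
is hypothetical, and the sketch neither validates the schedule kind nor
handles any schedule modifiers; it only shows how the optional chunk size is
separated from the kind.

```python
def parse_schedule(value):
    """Split an OMP_SCHEDULE-style value into (kind, chunk_size).

    chunk_size is None when no chunk size is given.
    """
    kind, _, chunk = value.partition(",")
    kind = kind.strip().lower()
    chunk_size = int(chunk) if chunk.strip() else None
    return kind, chunk_size


print(parse_schedule("dynamic,4"))
print(parse_schedule("static"))
```
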
OMP_STACKSIZE
"""""""""""""

Sets the number of bytes to allocate for each OpenMP thread to use as the
private stack for the thread. Recommended size is 16M.

Use the optional suffixes ``B`` (bytes), ``K`` (Kilobytes), ``M`` (Megabytes),
``G`` (Gigabytes), or ``T`` (Terabytes) to specify the units.
If you specify a value without a suffix, the byte unit
is assumed to be ``K`` (Kilobytes).

This variable does not affect the native operating system threads created by the
user program, or the thread executing the sequential part of an OpenMP program.

The ``kmp_{set,get}_stacksize_s()`` routines set/retrieve the value.
The ``kmp_set_stacksize_s()`` routine must be called from the sequential part, before
the first parallel region is created. Otherwise, calling ``kmp_set_stacksize_s()``
has no effect.

| **Default:**

* 32-bit architecture: ``2M``
* 64-bit architecture: ``4M``

| **Related environment variables:** ``KMP_STACKSIZE`` (overrides ``OMP_STACKSIZE``).
| **Example:** ``OMP_STACKSIZE=8M``

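
The suffix rules for ``OMP_STACKSIZE`` can be sketched in Python as follows.
This is an illustration, not the runtime's parser: the name
``parse_stacksize`` is hypothetical, and two details are assumptions of the
sketch rather than statements from this document, namely that each unit is a
power of 1024 and that suffixes are accepted in either case.

```python
# Assumption: units are binary multiples (1K = 1024 bytes).
UNIT_BYTES = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_stacksize(value):
    """Convert a stack size such as '8M' or '512' to a number of bytes."""
    value = value.strip()
    suffix = value[-1].upper()
    if suffix in UNIT_BYTES:
        return int(value[:-1]) * UNIT_BYTES[suffix]
    # No suffix: the value is interpreted as Kilobytes.
    return int(value) * UNIT_BYTES["K"]


print(parse_stacksize("8M"))
print(parse_stacksize("512"))
```

Under these assumptions, ``8M`` corresponds to 8388608 bytes and a bare
``512`` to 524288 bytes.
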
OMP_THREAD_LIMIT
""""""""""""""""

Limits the number of simultaneously-executing threads in an OpenMP program.

If this limit is reached and another native operating system thread encounters
OpenMP API calls or constructs, the program can abort with an error message.
If this limit is reached when an OpenMP parallel region begins, a one-time
warning message might be generated indicating that the number of threads in
the team was reduced, but the program will continue.

The ``omp_get_thread_limit()`` routine returns the value of the limit.

| **Default:** No enforced limit
| **Related environment variable:** ``KMP_ALL_THREADS`` (overrides ``OMP_THREAD_LIMIT``).

OMP_WAIT_POLICY
"""""""""""""""

Decides whether threads spin (active) or yield (passive) while they are waiting.
``OMP_WAIT_POLICY=active`` is an alias for ``KMP_LIBRARY=turnaround``, and
``OMP_WAIT_POLICY=passive`` is an alias for ``KMP_LIBRARY=throughput``.

| **Default:** ``passive``

.. note::
    Although the default is ``passive``, unless the user has explicitly set
    ``OMP_WAIT_POLICY``, there is a small period of active spinning determined
    by ``KMP_BLOCKTIME``.

KMP_AFFINITY (Windows, Linux)
"""""""""""""""""""""""""""""

Enables the run-time library to bind threads to physical processing units.

You must set this environment variable before the first parallel region, or
certain API calls including ``omp_get_max_threads()``, ``omp_get_num_procs()``
and any affinity API calls.

**Syntax:** ``KMP_AFFINITY=[<modifier>,...]<type>[,<permute>][,<offset>]``

``modifiers`` are optional strings consisting of a keyword and possibly a specifier.

* ``respect`` (default) and ``norespect`` - determine whether to respect the original process affinity mask.
* ``verbose`` and ``noverbose`` (default) - determine whether to display affinity information.
* ``warnings`` (default) and ``nowarnings`` - determine whether to display warnings during affinity detection.
* ``granularity=<specifier>`` - takes the following specifiers: ``thread``, ``core`` (default), ``tile``,
  ``socket``, ``die``, ``group`` (Windows only).
  The granularity describes the lowest topology levels that OpenMP threads are allowed to float within a topology map.
  For example, if ``granularity=core``, then the OpenMP threads will be allowed to move between logical processors within
  a single core. If ``granularity=thread``, then the OpenMP threads will be restricted to a single logical processor.
* ``proclist=[<proc_list>]`` - The ``proc_list`` is specified by

  +--------------------+----------------------------------------+
  | Value              | Description                            |
  +====================+========================================+
  | <proc_list> :=     | <proc_id> | { <id_list> }              |
  +--------------------+----------------------------------------+
  | <id_list> :=       | <proc_id> | <proc_id>,<id_list>        |
  +--------------------+----------------------------------------+

  Where each ``proc_id`` represents an operating system logical processor ID.
  For example, ``proclist=[3,0,{1,2},{0,3}]`` with ``OMP_NUM_THREADS=4`` would place thread 0 on
  OS logical processor 3, thread 1 on OS logical processor 0, thread 2 on both OS logical
  processors 1 & 2, and thread 3 on OS logical processors 0 & 3.

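
The mapping described for the ``proclist`` modifier can be reproduced with a
short Python sketch. The function name ``parse_proclist`` is hypothetical and
the sketch covers only the grammar shown in the table for ``proclist`` (single
processor IDs and brace-enclosed ID lists); it is not the parser used by the
runtime.

```python
import re


def parse_proclist(proclist):
    """Return, for each OpenMP thread in order, its list of OS processor IDs."""
    inner = proclist.strip()
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1]
    # Each entry is either a brace-enclosed ID list or a single processor ID.
    entries = re.findall(r"\{[^}]*\}|\d+", inner)
    return [[int(tok) for tok in re.findall(r"\d+", entry)] for entry in entries]


for thread, procs in enumerate(parse_proclist("[3,0,{1,2},{0,3}]")):
    print(f"thread {thread} -> OS logical processors {procs}")
```
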
``type`` is the thread affinity policy to choose.
Valid choices are ``none``, ``balanced``, ``compact``, ``scatter``, ``explicit``, ``disabled``.

* type ``none`` (default) - Does not bind OpenMP threads to particular thread contexts;
  however, if the operating system supports affinity, the run-time library still uses the
  OpenMP thread affinity interface to determine machine topology.
  Specify ``KMP_AFFINITY=verbose,none`` to list a machine topology map.
* type ``compact`` - Specifying compact assigns the OpenMP thread <n>+1 to a free thread
  context as close as possible to the thread context where the <n>-th OpenMP thread was
  placed. For example, in a topology map, the nearer a node is to the root, the more
  significance the node has when sorting the threads.
* type ``scatter`` - Specifying scatter distributes the threads as evenly as
  possible across the entire system. ``scatter`` is the opposite of ``compact``; so the
  leaves of the node are most significant when sorting through the machine topology map.
* type ``balanced`` - Places threads on separate cores until all cores have at least one thread,
  similar to the ``scatter`` type. However, when the runtime must use multiple hardware thread
  contexts on the same core, the balanced type ensures that the OpenMP thread numbers are close
  to each other, which scatter does not do. This affinity type is supported on the CPU only for
  single socket systems.
* type ``explicit`` - Specifying explicit assigns OpenMP threads to a list of OS proc IDs that
  have been explicitly specified by using the ``proclist`` modifier, which is required
  for this affinity type.
* type ``disabled`` - Specifying disabled completely disables the thread affinity interfaces.
  This forces the OpenMP run-time library to behave as if the affinity interface was not
  supported by the operating system. This includes the low-level API interfaces such
  as ``kmp_set_affinity`` and ``kmp_get_affinity``, which have no effect and will return
  a nonzero error code.

For both ``compact`` and ``scatter``, ``permute`` and ``offset`` are allowed;
however, if you specify only one integer, the runtime interprets the value as
a permute specifier. **Both permute and offset default to 0.**

The ``permute`` specifier controls which levels are most significant when sorting
the machine topology map. A value for ``permute`` forces the mappings to make the
specified number of most significant levels of the sort the least significant,
and it inverts the order of significance. The root node of the tree is not
considered a separate level for the sort operations.

The ``offset`` specifier indicates the starting position for thread assignment.

| **Default:** ``noverbose,warnings,respect,granularity=core,none``
| **Related environment variable:** ``OMP_PROC_BIND`` (``KMP_AFFINITY`` takes precedence)

.. note::
    On Windows with multiple processor groups, the ``norespect`` affinity modifier
    is assumed when the process affinity mask equals a single processor group
    (which is the default on Windows). Otherwise, the ``respect`` affinity modifier is used.

.. note::
    On Windows with multiple processor groups, if the granularity is too coarse, it
    will be set to ``granularity=group``. For example, if two processor groups exist
    across one socket, and ``granularity=socket`` is specified, the runtime will shift the
    granularity down to group since that is the largest granularity allowed by the OS.

KMP_ALL_THREADS
"""""""""""""""

Limits the number of simultaneously-executing threads in an OpenMP program.
If this limit is reached and another native operating system thread encounters
OpenMP API calls or constructs, then the program may abort with an error
message. If this limit is reached at the time an OpenMP parallel region begins,
a one-time warning message may be generated indicating that the number of
threads in the team was reduced, but the program will continue execution.

| **Default:** No enforced limit.
| **Related environment variable:** ``OMP_THREAD_LIMIT`` (``KMP_ALL_THREADS`` takes precedence)

KMP_BLOCKTIME
"""""""""""""

Sets the time, in milliseconds, that a thread should wait, after completing
the execution of a parallel region, before sleeping.

Use the optional character suffixes: ``s`` (seconds), ``m`` (minutes),
``h`` (hours), or ``d`` (days) to specify the units.

Specify ``infinite`` for an unlimited wait time.

| **Default:** 200 milliseconds
| **Related Environment Variable:** ``KMP_LIBRARY``
| **Example:** ``KMP_BLOCKTIME=1s``

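
The unit handling for ``KMP_BLOCKTIME`` can be sketched in Python as follows.
The function name ``parse_blocktime`` is hypothetical, the sketch accepts only
the suffixes listed for this variable, and treating the suffixes and
``infinite`` case-insensitively is an assumption made for the illustration.

```python
import math

# Milliseconds per unit suffix.
UNIT_MS = {"s": 1000, "m": 60 * 1000, "h": 3600 * 1000, "d": 86400 * 1000}


def parse_blocktime(value):
    """Convert a KMP_BLOCKTIME-style value to milliseconds."""
    value = value.strip().lower()
    if value == "infinite":
        return math.inf
    if value[-1] in UNIT_MS:
        return int(value[:-1]) * UNIT_MS[value[-1]]
    # No suffix: the value is already in milliseconds.
    return int(value)


print(parse_blocktime("1s"))
print(parse_blocktime("200"))
```
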
KMP_CPUINFO_FILE
""""""""""""""""

Specifies an alternate file name for a file containing the machine topology
description. The file must be in the same format as :file:`/proc/cpuinfo`.

**Default:** None

KMP_DETERMINISTIC_REDUCTION
"""""""""""""""""""""""""""

Enables (``true``) or disables (``false``) the use of a specific ordering of
the reduction operations for implementing the reduction clause for an OpenMP
parallel region. This has the effect that, for a given number of threads, in
a given parallel region, for a given data set and reduction operation, a
floating point reduction done for an OpenMP reduction clause has a consistent
floating point result from run to run, since round-off errors are identical.

| **Default:** ``false``
| **Example:** ``KMP_DETERMINISTIC_REDUCTION=true``

KMP_DYNAMIC_MODE
""""""""""""""""

Selects the method used to determine the number of threads to use for a parallel
region when ``OMP_DYNAMIC=true``. Possible values are ``load_balance`` and ``thread_limit``, where:

* ``load_balance``: tries to avoid using more threads than available execution units on the machine;
* ``thread_limit``: tries to avoid using more threads than total execution units on the machine.

**Default:** ``load_balance`` (on all supported platforms)

KMP_HOT_TEAMS_MAX_LEVEL
"""""""""""""""""""""""

Sets the maximum nested level to which teams of threads will be hot.

.. note::
    A hot team is a team of threads optimized for faster reuse by subsequent
    parallel regions. In a hot team, threads are kept ready for execution of
    the next parallel region, in contrast to the cold team, which is freed
    after each parallel region, with its threads going into a common pool
    of threads.

For values of 2 and above, nested parallelism should be enabled.

**Default:** 1

KMP_HOT_TEAMS_MODE
""""""""""""""""""

Specifies the run-time behavior when the number of threads in a hot team is reduced.
Possible values:

* ``0`` - Extra threads are freed and put into a common pool of threads.
* ``1`` - Extra threads are kept in the team in reserve, for faster reuse
  in subsequent parallel regions.

**Default:** 0

KMP_HW_SUBSET
"""""""""""""

Specifies the subset of available hardware resources for the hardware topology
hierarchy. The subset is specified in terms of number of units per upper layer
unit starting from top layer downwards, e.g. the number of sockets (top layer
units), cores per socket, and the threads per core, to use with an OpenMP
application, as an alternative to writing complicated explicit affinity settings
or a limiting process affinity mask. You can also specify an offset value to set
which resources to use.

An extended syntax is available when ``KMP_TOPOLOGY_METHOD=hwloc``. Depending on what
resources are detected, you may be able to specify additional resources, such as
NUMA domains and groups of hardware resources that share certain cache levels.

**Basic syntax:** ``num_unitsID[@offset] [,num_unitsID[@offset]...]``

Supported unit IDs are case-insensitive.

| ``S`` - socket
| ``num_units`` specifies the requested number of sockets.

| ``D`` - die
| ``num_units`` specifies the requested number of dies per socket.

| ``C`` - core
| ``num_units`` specifies the requested number of cores per die - if any - otherwise, per socket.

| ``T`` - thread
| ``num_units`` specifies the requested number of HW threads per core.

``offset`` - (Optional) The number of units to skip.

.. note::
    The hardware cache can be specified as a unit, e.g. L2 for L2 cache,
    or LL for last level cache.

**Extended syntax when KMP_TOPOLOGY_METHOD=hwloc:**

Additional IDs can be specified if detected. For example:

``N`` - numa
  ``num_units`` specifies the requested number of NUMA nodes per upper layer
  unit, e.g. per socket.

``TI`` - tile
  ``num_units`` specifies the requested number of tiles to use per upper layer
  unit, e.g. per NUMA node.

When any numa or tile units are specified in ``KMP_HW_SUBSET`` and the hwloc
topology method is available, the ``KMP_TOPOLOGY_METHOD`` will be automatically
set to ``hwloc``, so there is no need to set it explicitly.

If you don't specify one or more types of resource, such as socket or thread,
all available resources of that type are used.

The run-time library prints a warning, and the setting of
``KMP_HW_SUBSET`` is ignored if:

* a resource is specified, but detection of that resource is not supported
  by the chosen topology detection method and/or
* a resource is specified twice.

This variable does not work if ``KMP_AFFINITY=disabled``.

**Default:** If omitted, the default value is to use all the
available hardware resources.

**Examples:**

* ``2s,4c,2t``: Use the first 2 sockets (s0 and s1), the first 4 cores on each
  socket (c0 - c3), and 2 threads per core.
* ``2s@2,4c@8,2t``: Skip the first 2 sockets (s0 and s1) and use 2 sockets
  (s2-s3), skip the first 8 cores (c0-c7) and use 4 cores on each socket
  (c8-c11), and use 2 threads per core.
* ``5C@1,3T``: Use all available sockets, skip the first core and use 5 cores,
  and use 3 threads per core.
* ``1T``: Use all cores on all sockets, 1 thread per core.
* ``1s, 1d, 1n, 1c, 1t``: Use 1 socket, 1 die, 1 NUMA node, 1 core, 1 thread,
  which results in using a single HW thread.
* ``1s, 1c, 1t``: Use 1 socket, 1 core, 1 thread. This may result in using
  a single thread on a 3-layer topology architecture, or multiple threads on
  a 4-layer or 5-layer architecture. The result may even be different on the same
  architecture, depending on the ``KMP_TOPOLOGY_METHOD`` specified, as hwloc can
  often detect more topology layers than the default method used by the OpenMP
  run-time library.

To see the result of the setting, you can specify the ``verbose`` modifier in
the ``KMP_AFFINITY`` environment variable. The OpenMP run-time library will output
to ``stderr`` the information about the discovered hardware topology before and
after the ``KMP_HW_SUBSET`` setting was applied.

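
To make the ``num_unitsID[@offset]`` syntax concrete, the following Python
sketch splits a ``KMP_HW_SUBSET`` value into ``(num_units, ID, offset)``
tuples. The function name ``parse_hw_subset`` is hypothetical and this is not
the runtime's parser: it does not check the unit IDs against the detected
topology, and it reports a missing offset as 0 (nothing skipped).

```python
import re

# num_units, a unit ID such as S, C, T or L2, and an optional @offset.
_ITEM = re.compile(r"\s*(\d+)\s*([A-Za-z][A-Za-z0-9]*)\s*(?:@\s*(\d+))?\s*$")


def parse_hw_subset(value):
    """Return a list of (num_units, unit_id, offset) tuples."""
    items = []
    for field in value.split(","):
        match = _ITEM.match(field)
        if match is None:
            raise ValueError(f"cannot parse {field!r}")
        count, unit, offset = match.groups()
        # Unit IDs are upper-cased so that 2s and 2S compare equal.
        items.append((int(count), unit.upper(), int(offset or 0)))
    return items


print(parse_hw_subset("2s@2,4c@8,2t"))
print(parse_hw_subset("5C@1,3T"))
```
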
KMP_INHERIT_FP_CONTROL
""""""""""""""""""""""

Enables (``true``) or disables (``false``) the copying of the floating-point
control settings of the primary thread to the floating-point control settings
of the OpenMP worker threads at the start of each parallel region.

**Default:** ``true``

KMP_LIBRARY
"""""""""""

Selects the OpenMP run-time library execution mode. The values for this variable
are ``serial``, ``turnaround``, or ``throughput``.

| **Default:** ``throughput``
| **Related environment variables:** ``KMP_BLOCKTIME`` and ``OMP_WAIT_POLICY``

KMP_SETTINGS
""""""""""""

Enables (``true``) or disables (``false``) the printing of OpenMP run-time library
environment variables during program execution. Two lists of variables are printed:
user-defined environment variable settings and effective values of variables used
by the OpenMP run-time library.

**Default:** ``false``

KMP_STACKSIZE
"""""""""""""

Sets the number of bytes to allocate for each OpenMP thread to use as its private stack.

Recommended size is ``16M``.

Use the optional suffixes ``B`` (bytes), ``K`` (Kilobytes), ``M`` (Megabytes),
``G`` (Gigabytes), or ``T`` (Terabytes) to specify the units.
If you specify a value without a suffix, the byte unit is assumed to be ``K`` (Kilobytes).

**Related environment variables:** ``KMP_STACKSIZE`` overrides ``GOMP_STACKSIZE``, which
overrides ``OMP_STACKSIZE``.

**Default:**

* 32-bit architectures: ``2M``
* 64-bit architectures: ``4M``

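
The precedence among the three stack size variables can be written down as a
short Python sketch. The function name ``effective_stacksize`` and the
dictionary argument standing in for the process environment are hypothetical;
the sketch only encodes the stated order and does not interpret the values.

```python
# Highest precedence first.
STACKSIZE_PRECEDENCE = ("KMP_STACKSIZE", "GOMP_STACKSIZE", "OMP_STACKSIZE")


def effective_stacksize(env):
    """Return (variable name, raw value) of the setting that takes effect.

    Returns None when none of the variables is set, in which case the
    architecture-dependent default applies.
    """
    for name in STACKSIZE_PRECEDENCE:
        if name in env:
            return name, env[name]
    return None


print(effective_stacksize({"OMP_STACKSIZE": "8M", "KMP_STACKSIZE": "16M"}))
```

In real use, ``os.environ`` could be passed as ``env``.
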
KMP_TOPOLOGY_METHOD
"""""""""""""""""""

Forces OpenMP to use a particular machine topology modeling method.

Possible values are:

* ``all`` - Let OpenMP choose which topology method is most appropriate
  based on the platform and possibly other environment variable settings.
* ``cpuid_leaf31`` (x86 only) - Decodes the APIC identifiers as specified by leaf 31 of the
  cpuid instruction. The runtime will produce an error if the machine does not support leaf 31.
* ``cpuid_leaf11`` (x86 only) - Decodes the APIC identifiers as specified by leaf 11 of the
  cpuid instruction. The runtime will produce an error if the machine does not support leaf 11.
* ``cpuid_leaf4`` (x86 only) - Decodes the APIC identifiers as specified in leaf 4
  of the cpuid instruction. The runtime will produce an error if the machine does not support leaf 4.
* ``cpuinfo`` - If ``KMP_CPUINFO_FILE`` is not specified, forces OpenMP to
  parse :file:`/proc/cpuinfo` to determine the topology (Linux only).
  If ``KMP_CPUINFO_FILE`` is specified as described above, uses it (Windows or Linux).
* ``group`` - Models the machine as a 2-level map, with level 0 specifying the
  different processors in a group, and level 1 specifying the different
  groups (Windows 64-bit only).

  .. note::
      Support for ``group`` is now deprecated and will be removed in a future release. Use ``all`` instead.

* ``flat`` - Models the machine as a flat (linear) list of processors.
* ``hwloc`` - Models the machine as the Portable Hardware Locality (hwloc) library does.
  This model is the most detailed and includes, but is not limited to: NUMA domains,
  packages, cores, hardware threads, caches, and Windows processor groups. This method is
  only available if you have configured libomp to use hwloc during CMake configuration.

**Default:** ``all``

KMP_VERSION
"""""""""""

Enables (``true``) or disables (``false``) the printing of OpenMP run-time
library version information during program execution.

**Default:** ``false``

KMP_WARNINGS
""""""""""""

Enables (``true``) or disables (``false``) displaying warnings from the
OpenMP run-time library during program execution.

**Default:** ``true``

LLVM/OpenMP Target Host Runtime (``libomptarget``)
--------------------------------------------------
