=====================================
Syntax of AMDGPU Instruction Operands
=====================================

.. contents::
   :local:

Conventions
===========

The following notation is used throughout this document:

=================== =============================================================================
Notation            Description
=================== =============================================================================
{0..N}              Any integer value in the range from 0 to N (inclusive).
<x>                 Syntax and meaning of *x* are explained elsewhere.
=================== =============================================================================

.. _amdgpu_syn_operands:

Operands
========

.. _amdgpu_synid_v:

v (32-bit)
----------

Vector registers. There are 256 32-bit vector registers.

A sequence of *vector* registers may be used to operate on more than 32 bits of data.

The assembler currently supports tuples with 1 to 12, 16 and 32 *vector* registers.

=================================================== ====================================================================
Syntax                                              Description
=================================================== ====================================================================
**v**\ <N>                                          A single 32-bit *vector* register.

                                                    *N* must be a decimal
                                                    :ref:`integer number<amdgpu_synid_integer_number>`.
**v[**\ <N>\ **]**                                  A single 32-bit *vector* register.

                                                    *N* may be specified as an
                                                    :ref:`integer number<amdgpu_synid_integer_number>`
                                                    or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
**v[**\ <N>:<K>\ **]**                              A sequence of (\ *K-N+1*\ ) *vector* registers.

                                                    *N* and *K* may be specified as
                                                    :ref:`integer numbers<amdgpu_synid_integer_number>`
                                                    or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
**[v**\ <N>, \ **v**\ <N+1>, ... **v**\ <K>\ **]**  A sequence of (\ *K-N+1*\ ) *vector* registers.

                                                    Register indices must be specified as decimal
                                                    :ref:`integer numbers<amdgpu_synid_integer_number>`.
=================================================== ====================================================================

Note: *N* and *K* must satisfy the following conditions:

* *N* <= *K*.
* 0 <= *N* <= 255.
* 0 <= *K* <= 255.
* *K-N+1* must be in the range from 1 to 12 or equal to 16 or 32.

GFX90A and GFX940 have an additional alignment requirement:
pairs of *vector* registers must be even-aligned
(the first register must be even).

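
The conditions above can be summarized as a validity check. The following
Python sketch is illustrative only and does not reproduce the assembler's
implementation; the function ``is_valid_vgpr_tuple`` and its ``even_aligned``
flag (which models the GFX90A/GFX940 requirement) are hypothetical. The sketch
assumes that the even-alignment requirement applies to every tuple of two or
more registers, which is one reading of "pairs" above.

.. code-block:: python

    # Tuple sizes accepted for vector registers, per the conditions above.
    VGPR_TUPLE_SIZES = set(range(1, 13)) | {16, 32}


    def is_valid_vgpr_tuple(n: int, k: int, even_aligned: bool = False) -> bool:
        """Check v[n:k] against the documented constraints (hypothetical helper).

        even_aligned models the GFX90A/GFX940 rule; this sketch assumes it
        applies to any tuple of two or more registers.
        """
        if not (0 <= n <= k <= 255):
            return False
        size = k - n + 1
        if size not in VGPR_TUPLE_SIZES:
            return False
        if even_aligned and size >= 2 and n % 2 != 0:
            return False
        return True


    print(is_valid_vgpr_tuple(252, 255))                 # True: 4 registers
    print(is_valid_vgpr_tuple(0, 12))                    # False: 13 registers
    print(is_valid_vgpr_tuple(1, 2, even_aligned=True))  # False: odd first register
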
Examples:

.. parsed-literal::

    v255
    v[0]
    v[0:1]
    v[1:1]
    v[0:3]
    v[2*2]
    v[1-1:2-1]
    [v252]
    [v252,v253,v254,v255]

.. _amdgpu_synid_nsa:

**Non-Sequential Address (NSA) Syntax**

GFX10+ *image* instructions may use special *NSA* (Non-Sequential Address)
syntax for *image addresses*:

===================================== =================================================
Syntax                                Description
===================================== =================================================
**[Vm**, \ **Vn**, ... **Vk**\ **]**  A sequence of 32-bit *vector* registers.

                                      Each register may be specified using the syntax
                                      defined :ref:`above<amdgpu_synid_v>`.

                                      In contrast with the standard syntax, registers
                                      in an *NSA* sequence are not required to have
                                      consecutive indices. Moreover, the same register
                                      may appear in the sequence more than once.

                                      GFX11+ has an additional limitation: if the address
                                      occupies more than 5 dwords, registers
                                      starting from the 5th element must be contiguous.
===================================== =================================================

Examples:

.. parsed-literal::

    [v32,v1,v[2]]
    [v[32],v[1:1],[v2]]
    [v4,v4,v4,v4]

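
The GFX11+ contiguity rule can be checked on the list of register indices
that make up an NSA address. This Python sketch is illustrative and uses a
hypothetical helper, ``nsa_address_ok_gfx11``; it assumes that each element
of the sequence is a single 32-bit register, so the element count equals the
address size in dwords.

.. code-block:: python

    def nsa_address_ok_gfx11(regs: list[int]) -> bool:
        """Return True if an NSA address obeys the GFX11+ contiguity rule.

        regs holds the VGPR index of each element, one dword per element.
        Elements from the 5th onward (regs[4:]) must have consecutive
        indices; with 5 or fewer elements the rule places no restriction.
        """
        tail = regs[4:]
        return all(b == a + 1 for a, b in zip(tail, tail[1:]))


    print(nsa_address_ok_gfx11([32, 1, 2]))                # True: only 3 dwords
    print(nsa_address_ok_gfx11([0, 9, 3, 7, 10, 11, 12]))  # True: v10, v11, v12
    print(nsa_address_ok_gfx11([0, 1, 2, 3, 4, 6]))        # False: v4 then v6
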
.. _amdgpu_synid_v16:

v (16-bit)
----------

16-bit vector registers. Each :ref:`32-bit vector register<amdgpu_synid_v>` is divided into two 16-bit low and high registers, so there are 512 16-bit vector registers.

Only VOP3, VOP3P and VINTERP instructions may access all 512 registers (using the :ref:`op_sel<amdgpu_synid_op_sel>` modifier).

VOP1, VOP2 and VOPC instructions may currently access only 128 low 16-bit registers (the low halves of the first 128 vector registers) using the syntax described below.

.. WARNING:: This section is incomplete. Support for 16-bit registers in the assembler is still a work in progress.

\

=================================================== ====================================================================
Syntax                                              Description
=================================================== ====================================================================
**v**\ <N>                                          A single 16-bit *vector* register (low half).
=================================================== ====================================================================

Note: *N* must satisfy the following conditions:

* 0 <= *N* <= 127.

Examples:

.. parsed-literal::

    v127

.. _amdgpu_synid_a:

a
-

Accumulator registers. There are 256 32-bit accumulator registers.

A sequence of *accumulator* registers may be used to operate on more than 32 bits of data.

The assembler currently supports tuples with 1 to 12, 16 and 32 *accumulator* registers.

=================================================== ========================================================= ====================================================================
Syntax                                              Alternative Syntax (SP3)                                  Description
=================================================== ========================================================= ====================================================================
**a**\ <N>                                          **acc**\ <N>                                              A single 32-bit *accumulator* register.

                                                                                                              *N* must be a decimal
                                                                                                              :ref:`integer number<amdgpu_synid_integer_number>`.
**a[**\ <N>\ **]**                                  **acc[**\ <N>\ **]**                                      A single 32-bit *accumulator* register.

                                                                                                              *N* may be specified as an
                                                                                                              :ref:`integer number<amdgpu_synid_integer_number>`
                                                                                                              or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
**a[**\ <N>:<K>\ **]**                              **acc[**\ <N>:<K>\ **]**                                  A sequence of (\ *K-N+1*\ ) *accumulator* registers.

                                                                                                              *N* and *K* may be specified as
                                                                                                              :ref:`integer numbers<amdgpu_synid_integer_number>`
                                                                                                              or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
**[a**\ <N>, \ **a**\ <N+1>, ... **a**\ <K>\ **]**  **[acc**\ <N>, \ **acc**\ <N+1>, ... **acc**\ <K>\ **]**  A sequence of (\ *K-N+1*\ ) *accumulator* registers.

                                                                                                              Register indices must be specified as decimal
                                                                                                              :ref:`integer numbers<amdgpu_synid_integer_number>`.
=================================================== ========================================================= ====================================================================

Note: *N* and *K* must satisfy the following conditions:

* *N* <= *K*.
* 0 <= *N* <= 255.
* 0 <= *K* <= 255.
* *K-N+1* must be in the range from 1 to 12 or equal to 16 or 32.

GFX90A and GFX940 have an additional alignment requirement:
pairs of *accumulator* registers must be even-aligned
(the first register must be even).

Examples:

.. parsed-literal::

    a255
    a[0]
    a[0:1]
    a[1:1]
    a[0:3]
    a[2*2]
    a[1-1:2-1]
    [a252]
    [a252,a253,a254,a255]
    acc0
    acc[1]
    [acc250]
    [acc2,acc3]

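
Since the SP3 spellings differ from the standard ones only in the register
prefix, converting between the two forms is a textual substitution. The
Python sketch below uses a hypothetical ``acc_to_a`` helper to rewrite the
SP3 examples above into the standard syntax; it is an illustration, not part
of the assembler.

.. code-block:: python

    import re

    # Match "acc" when it begins a register name: preceded by a word boundary
    # and followed by a decimal index or an opening bracket.
    _ACC_PREFIX = re.compile(r"\bacc(?=\d|\[)")


    def acc_to_a(operand: str) -> str:
        """Rewrite SP3 accumulator syntax (acc...) to the standard a... form."""
        return _ACC_PREFIX.sub("a", operand)


    for text in ["acc0", "acc[1]", "[acc250]", "[acc2,acc3]", "a[0:3]"]:
        print(text, "->", acc_to_a(text))

Operands that already use the standard ``a`` prefix, such as ``a[0:3]``, pass
through unchanged.
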
.. _amdgpu_synid_s:

s
-

Scalar 32-bit registers. The number of available *scalar* registers depends on the GPU:

======= ============================
GPU     Number of *scalar* registers
======= ============================
GFX7    104
GFX8    102
GFX9    102
GFX10+  106
======= ============================

A sequence of *scalar* registers may be used to operate on more than 32 bits of data.

The assembler currently supports tuples with 1 to 12, 16 and 32 *scalar* registers.

Pairs of *scalar* registers must be even-aligned (the first register must be even).
Sequences of 4 or more *scalar* registers must be quad-aligned.

=================================================== ====================================================================
Syntax                                              Description
=================================================== ====================================================================
**s**\ <N>                                          A single 32-bit *scalar* register.

                                                    *N* must be a decimal
                                                    :ref:`integer number<amdgpu_synid_integer_number>`.
**s[**\ <N>\ **]**                                  A single 32-bit *scalar* register.

                                                    *N* may be specified as an
                                                    :ref:`integer number<amdgpu_synid_integer_number>`
                                                    or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
**s[**\ <N>:<K>\ **]**                              A sequence of (\ *K-N+1*\ ) *scalar* registers.

                                                    *N* and *K* may be specified as
                                                    :ref:`integer numbers<amdgpu_synid_integer_number>`
                                                    or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
**[s**\ <N>, \ **s**\ <N+1>, ... **s**\ <K>\ **]**  A sequence of (\ *K-N+1*\ ) *scalar* registers.

                                                    Register indices must be specified as decimal
                                                    :ref:`integer numbers<amdgpu_synid_integer_number>`.
=================================================== ====================================================================

Note: *N* and *K* must satisfy the following conditions:

* *N* must be properly aligned based on the sequence size.
* *N* <= *K*.
* 0 <= *N* < *SMAX*, where *SMAX* is the number of available *scalar* registers.
* 0 <= *K* < *SMAX*, where *SMAX* is the number of available *scalar* registers.
* *K-N+1* must be in the range from 1 to 12 or equal to 16 or 32.

Examples:

.. parsed-literal::

    s0
    s[0]
    s[0:1]
    s[1:1]
    s[0:3]
    s[2*2]
    s[1-1:2-1]
    [s4]
    [s4,s5,s6,s7]

Examples of *scalar* registers with an invalid alignment:

.. parsed-literal::

    s[1:2]
    s[2:5]

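
The range, size and alignment rules for *scalar* registers can be captured in
a short check. In this Python sketch the helper name ``is_valid_sgpr_tuple``
is hypothetical and the GPU keys mirror the table above. The text does not
state an alignment rule for sequences of exactly 3 registers, so the sketch
leaves them unconstrained.

.. code-block:: python

    # Number of available scalar registers (SMAX) per GPU, from the table above.
    SGPR_COUNT = {"GFX7": 104, "GFX8": 102, "GFX9": 102, "GFX10+": 106}
    SGPR_TUPLE_SIZES = set(range(1, 13)) | {16, 32}


    def is_valid_sgpr_tuple(n: int, k: int, gpu: str) -> bool:
        """Check s[n:k] against the documented range, size and alignment rules."""
        if not (0 <= n <= k < SGPR_COUNT[gpu]):
            return False
        size = k - n + 1
        if size not in SGPR_TUPLE_SIZES:
            return False
        if size == 2:
            return n % 2 == 0  # pairs must be even-aligned
        if size >= 4:
            return n % 4 == 0  # 4 or more registers must be quad-aligned
        return True


    print(is_valid_sgpr_tuple(0, 3, "GFX9"))      # True
    print(is_valid_sgpr_tuple(1, 2, "GFX9"))      # False: pair starts at s1
    print(is_valid_sgpr_tuple(2, 5, "GFX9"))      # False: quad starts at s2
    print(is_valid_sgpr_tuple(100, 103, "GFX7"))  # True: GFX7 has 104 registers
    print(is_valid_sgpr_tuple(100, 103, "GFX9"))  # False: GFX9 has only 102
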
.. _amdgpu_synid_trap:

trap
----

A set of trap handler registers:

* :ref:`ttmp<amdgpu_synid_ttmp>`
* :ref:`tba<amdgpu_synid_tba>`
* :ref:`tma<amdgpu_synid_tma>`

.. _amdgpu_synid_ttmp:

ttmp
----

Trap handler temporary scalar registers, 32 bits wide.
The number of available *ttmp* registers depends on the GPU:

======= ===========================
GPU     Number of *ttmp* registers
======= ===========================
GFX7    12
GFX8    12
GFX9    16
GFX10+  16
======= ===========================

A sequence of *ttmp* registers may be used to operate on more than 32 bits of data.

The assembler currently supports tuples with 1 to 12 and 16 *ttmp* registers.

Pairs of *ttmp* registers must be even-aligned (the first register must be even).
Sequences of 4 or more *ttmp* registers must be quad-aligned.

============================================================= ====================================================================
Syntax                                                        Description
============================================================= ====================================================================
**ttmp**\ <N>                                                 A single 32-bit *ttmp* register.

                                                              *N* must be a decimal
                                                              :ref:`integer number<amdgpu_synid_integer_number>`.
**ttmp[**\ <N>\ **]**                                         A single 32-bit *ttmp* register.

                                                              *N* may be specified as an
                                                              :ref:`integer number<amdgpu_synid_integer_number>`
                                                              or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
**ttmp[**\ <N>:<K>\ **]**                                     A sequence of (\ *K-N+1*\ ) *ttmp* registers.

                                                              *N* and *K* may be specified as
                                                              :ref:`integer numbers<amdgpu_synid_integer_number>`
                                                              or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
**[ttmp**\ <N>, \ **ttmp**\ <N+1>, ... **ttmp**\ <K>\ **]**   A sequence of (\ *K-N+1*\ ) *ttmp* registers.

                                                              Register indices must be specified as decimal
                                                              :ref:`integer numbers<amdgpu_synid_integer_number>`.
============================================================= ====================================================================

Note: *N* and *K* must satisfy the following conditions:

* *N* must be properly aligned based on the sequence size.
* *N* <= *K*.
* 0 <= *N* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
* 0 <= *K* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
* *K-N+1* must be in the range from 1 to 12 or equal to 16.

Examples:

.. parsed-literal::

    ttmp0
    ttmp[0]
    ttmp[0:1]
    ttmp[1:1]
    ttmp[0:3]
    ttmp[2*2]
    ttmp[1-1:2-1]
    [ttmp4]
    [ttmp4,ttmp5,ttmp6,ttmp7]

Examples of *ttmp* registers with an invalid alignment:

.. parsed-literal::

    ttmp[1:2]
    ttmp[2:5]

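
The bracketed list syntax requires consecutive decimal indices, so a list can
be reduced to the equivalent *N* and *K* before the range and alignment rules
are applied. The Python sketch below is illustrative; ``parse_ttmp_list``,
``is_valid_ttmp_tuple`` and the GPU keys are hypothetical names. As with
*scalar* registers, sequences of exactly 3 registers are left unconstrained
because the text does not specify their alignment.

.. code-block:: python

    import re

    # Number of available ttmp registers (TMAX) per GPU, from the table above.
    TTMP_COUNT = {"GFX7": 12, "GFX8": 12, "GFX9": 16, "GFX10+": 16}
    TTMP_TUPLE_SIZES = set(range(1, 13)) | {16}


    def parse_ttmp_list(text: str) -> tuple[int, int]:
        """Reduce "[ttmpN,ttmpN+1,...,ttmpK]" to (N, K)."""
        body = re.fullmatch(r"\[(.*)\]", text.replace(" ", ""))
        if body is None:
            raise ValueError("expected a bracketed register list")
        indices = []
        for item in body.group(1).split(","):
            reg = re.fullmatch(r"ttmp(\d+)", item)  # decimal index only
            if reg is None:
                raise ValueError(f"not a ttmp register: {item!r}")
            indices.append(int(reg.group(1)))
        if any(b != a + 1 for a, b in zip(indices, indices[1:])):
            raise ValueError("register indices must be consecutive")
        return indices[0], indices[-1]


    def is_valid_ttmp_tuple(n: int, k: int, gpu: str) -> bool:
        """Check ttmp[n:k] against the documented range, size and alignment rules."""
        if not (0 <= n <= k < TTMP_COUNT[gpu]):
            return False
        size = k - n + 1
        if size not in TTMP_TUPLE_SIZES:
            return False
        if size == 2:
            return n % 2 == 0  # pairs must be even-aligned
        if size >= 4:
            return n % 4 == 0  # 4 or more registers must be quad-aligned
        return True


    n, k = parse_ttmp_list("[ttmp4,ttmp5,ttmp6,ttmp7]")
    print(n, k, is_valid_ttmp_tuple(n, k, "GFX9"))  # 4 7 True
    print(is_valid_ttmp_tuple(12, 15, "GFX8"))      # False: GFX8 has 12 ttmp registers
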
.. _amdgpu_synid_tba:

tba
---

Trap base address, 64 bits wide. Holds the pointer to the current
trap handler program.

================== ======================================================================= =============
Syntax             Description                                                             Availability
================== ======================================================================= =============
tba                64-bit *trap base address* register.                                    GFX7, GFX8
[tba]              64-bit *trap base address* register (an SP3 syntax).                    GFX7, GFX8
[tba_lo,tba_hi]    64-bit *trap base address* register (an SP3 syntax).                    GFX7, GFX8
================== ======================================================================= =============

The high and low 32 bits of the *trap base address* may be accessed as separate registers:

================== ======================================================================= =============
Syntax             Description                                                             Availability
================== ======================================================================= =============
tba_lo             Low 32 bits of *trap base address* register.                            GFX7, GFX8
tba_hi             High 32 bits of *trap base address* register.                           GFX7, GFX8
[tba_lo]           Low 32 bits of *trap base address* register (an SP3 syntax).            GFX7, GFX8
[tba_hi]           High 32 bits of *trap base address* register (an SP3 syntax).           GFX7, GFX8
================== ======================================================================= =============

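
The low half holds bits 0 to 31 of the 64-bit value and the high half holds
bits 32 to 63. The Python sketch below uses hypothetical helper names to show
how a 64-bit *trap base address* splits into the values seen through
*tba_lo* and *tba_hi*, and how the halves recombine; the example address is
arbitrary.

.. code-block:: python

    MASK32 = 0xFFFF_FFFF


    def split_lo_hi(value: int) -> tuple[int, int]:
        """Split a 64-bit register value into its (lo, hi) 32-bit halves."""
        return value & MASK32, (value >> 32) & MASK32


    def join_lo_hi(lo: int, hi: int) -> int:
        """Rebuild the 64-bit value from its 32-bit halves."""
        return ((hi & MASK32) << 32) | (lo & MASK32)


    tba = 0x0000_7F12_3456_7000  # an arbitrary example address
    lo, hi = split_lo_hi(tba)
    print(hex(lo), hex(hi))  # 0x34567000 0x7f12
    assert join_lo_hi(lo, hi) == tba

The same relationship applies to the other 64-bit registers in this section
that expose ``_lo`` and ``_hi`` halves.
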
.. _amdgpu_synid_tma:

tma
---

Trap memory address, 64 bits wide.

================== ======================================================================= =============
Syntax             Description                                                             Availability
================== ======================================================================= =============
tma                64-bit *trap memory address* register.                                  GFX7, GFX8
[tma]              64-bit *trap memory address* register (an SP3 syntax).                  GFX7, GFX8
[tma_lo,tma_hi]    64-bit *trap memory address* register (an SP3 syntax).                  GFX7, GFX8
================== ======================================================================= =============

The high and low 32 bits of the *trap memory address* may be accessed as separate registers:

================== ======================================================================= =============
Syntax             Description                                                             Availability
================== ======================================================================= =============
tma_lo             Low 32 bits of *trap memory address* register.                          GFX7, GFX8
tma_hi             High 32 bits of *trap memory address* register.                         GFX7, GFX8
[tma_lo]           Low 32 bits of *trap memory address* register (an SP3 syntax).          GFX7, GFX8
[tma_hi]           High 32 bits of *trap memory address* register (an SP3 syntax).         GFX7, GFX8
================== ======================================================================= =============

.. _amdgpu_synid_flat_scratch:

flat_scratch
------------

Flat scratch address, 64 bits wide. Holds the base address of scratch memory.

================================== ================================================================
Syntax                             Description
================================== ================================================================
flat_scratch                       64-bit *flat scratch* address register.
[flat_scratch]                     64-bit *flat scratch* address register (an SP3 syntax).
[flat_scratch_lo,flat_scratch_hi]  64-bit *flat scratch* address register (an SP3 syntax).
================================== ================================================================

The high and low 32 bits of the *flat scratch* address may be accessed as separate registers:

========================= =========================================================================
Syntax                    Description
========================= =========================================================================
flat_scratch_lo           Low 32 bits of *flat scratch* address register.
flat_scratch_hi           High 32 bits of *flat scratch* address register.
[flat_scratch_lo]         Low 32 bits of *flat scratch* address register (an SP3 syntax).
[flat_scratch_hi]         High 32 bits of *flat scratch* address register (an SP3 syntax).
========================= =========================================================================

.. _amdgpu_synid_xnack:

.. _amdgpu_synid_xnack_mask:

xnack_mask
----------

Xnack mask, 64 bits wide. Holds a 64-bit mask of which threads
received an *XNACK* due to a vector memory operation.

For availability of the *xnack* feature, refer to :ref:`this table<amdgpu-processors>`.

============================== =====================================================
Syntax                         Description
============================== =====================================================
xnack_mask                     64-bit *xnack mask* register.
[xnack_mask]                   64-bit *xnack mask* register (an SP3 syntax).
[xnack_mask_lo,xnack_mask_hi]  64-bit *xnack mask* register (an SP3 syntax).
============================== =====================================================

The high and low 32 bits of the *xnack mask* may be accessed as separate registers:

===================== ==============================================================
Syntax                Description
===================== ==============================================================
xnack_mask_lo         Low 32 bits of *xnack mask* register.
xnack_mask_hi         High 32 bits of *xnack mask* register.
[xnack_mask_lo]       Low 32 bits of *xnack mask* register (an SP3 syntax).
[xnack_mask_hi]       High 32 bits of *xnack mask* register (an SP3 syntax).
===================== ==============================================================

.. _amdgpu_synid_vcc:

.. _amdgpu_synid_vcc_lo:

vcc
---

Vector condition code, 64 bits wide. A bit mask with one bit per thread;
it holds the result of a vector compare operation.

Note that GFX10+ H/W does not use the high 32 bits of *vcc* in *wave32* mode.

================ =========================================================================
Syntax           Description
================ =========================================================================
vcc              64-bit *vector condition code* register.
[vcc]            64-bit *vector condition code* register (an SP3 syntax).
[vcc_lo,vcc_hi]  64-bit *vector condition code* register (an SP3 syntax).
================ =========================================================================

The high and low 32 bits of the *vector condition code* may be accessed as separate registers:

================ =========================================================================
Syntax           Description
================ =========================================================================
vcc_lo           Low 32 bits of *vector condition code* register.
vcc_hi           High 32 bits of *vector condition code* register.
[vcc_lo]         Low 32 bits of *vector condition code* register (an SP3 syntax).
[vcc_hi]         High 32 bits of *vector condition code* register (an SP3 syntax).
================ =========================================================================

.. _amdgpu_synid_m0:

m0
--

A 32-bit memory register. It has various uses,
including register indexing and bounds checking.

=========== ===================================================
Syntax      Description
=========== ===================================================
m0          A 32-bit *memory* register.
[m0]        A 32-bit *memory* register (an SP3 syntax).
=========== ===================================================

.. _amdgpu_synid_exec:

exec
----

Execute mask, 64 bits wide. A bit mask with one bit per thread,
which is applied to vector instructions and controls which threads execute
and which ignore the instruction.

Note that GFX10+ H/W does not use the high 32 bits of *exec* in *wave32* mode.

===================== =================================================================
Syntax                Description
===================== =================================================================
exec                  64-bit *execute mask* register.
[exec]                64-bit *execute mask* register (an SP3 syntax).
[exec_lo,exec_hi]     64-bit *execute mask* register (an SP3 syntax).
===================== =================================================================

The high and low 32 bits of the *execute mask* may be accessed as separate registers:

===================== =================================================================
Syntax                Description
===================== =================================================================
exec_lo               Low 32 bits of *execute mask* register.
exec_hi               High 32 bits of *execute mask* register.
[exec_lo]             Low 32 bits of *execute mask* register (an SP3 syntax).
[exec_hi]             High 32 bits of *execute mask* register (an SP3 syntax).
===================== =================================================================

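
To illustrate how the execute mask selects threads, the Python sketch below
models a per-lane move under *exec*: lanes whose bit is set take the new
value, and the others keep their old value. The function ``masked_move`` and
its list-of-lanes representation are hypothetical and only model the
behavior described above. In *wave32* mode there are only 32 lanes, so only
the low 32 bits of the mask are ever consulted, consistent with the note on
GFX10+.

.. code-block:: python

    def masked_move(exec_mask: int, dst: list[int], src: list[int],
                    wave32: bool = False) -> list[int]:
        """Model a per-lane move executed under an execute mask.

        Bit i of exec_mask controls lane i: if the bit is set, the lane
        receives src[i]; otherwise it keeps dst[i]. In wave32 mode there are
        only 32 lanes, so bits 32-63 of the mask are never examined.
        """
        wave_size = 32 if wave32 else 64
        if len(dst) != wave_size or len(src) != wave_size:
            raise ValueError(f"expected {wave_size} values per operand")
        return [s if (exec_mask >> lane) & 1 else d
                for lane, (d, s) in enumerate(zip(dst, src))]


    result = masked_move(0b0101, [0] * 32, [7] * 32, wave32=True)
    print(result[:4])  # [7, 0, 7, 0]
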
.. _amdgpu_synid_vccz:

vccz
----

A single-bit flag indicating that the :ref:`vcc<amdgpu_synid_vcc>`
is all zeros.

Note: when GFX10+ operates in *wave32* mode, this register reflects
the state of :ref:`vcc_lo<amdgpu_synid_vcc_lo>`.

.. _amdgpu_synid_execz:

execz
-----

A single-bit flag indicating that the :ref:`exec<amdgpu_synid_exec>`
is all zeros.

Note: when GFX10+ operates in *wave32* mode, this register reflects
the state of :ref:`exec_lo<amdgpu_synid_exec>`.

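
Both flags follow the same rule, sketched here in Python with a hypothetical
``zero_flag`` helper: in *wave64* mode the flag is set when all 64 mask bits
are zero, and in *wave32* mode (GFX10+) only the low 32 bits are considered,
as the notes above describe.

.. code-block:: python

    def zero_flag(mask: int, wave32: bool) -> bool:
        """Model vccz/execz: True when the relevant bits of the mask are all zero."""
        relevant = mask & (0xFFFF_FFFF if wave32 else 0xFFFF_FFFF_FFFF_FFFF)
        return relevant == 0


    vcc = 1 << 40  # only a bit in the high half is set
    print(zero_flag(vcc, wave32=True))   # True: vcc_lo is all zeros
    print(zero_flag(vcc, wave32=False))  # False
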
.. _amdgpu_synid_scc:

scc
---

A single-bit flag indicating the result of a scalar compare operation.

.. _amdgpu_synid_lds_direct:

lds_direct
----------

A special operand which supplies a 32-bit value
fetched from *LDS* memory using :ref:`m0<amdgpu_synid_m0>` as an address.

.. _amdgpu_synid_null:

null
----

This is a special operand that may be used as a source or a destination.

When used as a destination, the result of the operation is discarded.

When used as a source, it supplies a zero value.

.. _amdgpu_synid_constant:

inline constant
---------------

An *inline constant* is an integer or a floating-point value
encoded as a part of an instruction. Compare *inline constants*
with :ref:`literals<amdgpu_synid_literal>`.

Inline constants include:

* :ref:`Integer inline constants<amdgpu_synid_iconst>`;
* :ref:`Floating-point inline constants<amdgpu_synid_fconst>`;
* :ref:`Inline values<amdgpu_synid_ival>`.

If a number may be encoded as either
a :ref:`literal<amdgpu_synid_literal>` or
a :ref:`constant<amdgpu_synid_constant>`,
the assembler selects the latter encoding, which is more efficient.

.. _amdgpu_synid_iconst:

iconst
~~~~~~

An :ref:`integer number<amdgpu_synid_integer_number>` or
an :ref:`absolute expression<amdgpu_synid_absolute_expression>`
encoded as an *inline constant*.

Only a small fraction of integer numbers may be encoded as *inline constants*.
They are enumerated in the table below.
Other integer numbers are encoded as :ref:`literals<amdgpu_synid_literal>`.

================================== ====================================
Value                              Note
================================== ====================================
{0..64}                            Non-negative integer inline constants.
{-16..-1}                          Negative integer inline constants.
================================== ====================================

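
The integer rule above reduces to a range check. The following Python sketch,
with hypothetical helper names, shows which encoding an integer operand would
receive under that rule, following the preference for inline constants
described earlier; it models only the rule stated here.

.. code-block:: python

    def is_integer_inline_constant(value: int) -> bool:
        """True for the values in the table above: {0..64} and {-16..-1}."""
        return -16 <= value <= 64


    def integer_encoding(value: int) -> str:
        """Name the encoding an integer operand gets, preferring inline constants."""
        return "inline constant" if is_integer_inline_constant(value) else "literal"


    for v in (64, 65, -16, -17):
        print(v, integer_encoding(v))

With these values, 64 and -16 come out as inline constants, while 65 and -17
fall back to literals.
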
.. _amdgpu_synid_fconst:

fconst
~~~~~~

A :ref:`floating-point number<amdgpu_synid_floating-point_number>`
encoded as an *inline constant*.

Only a small fraction of floating-point numbers may be encoded
as *inline constants*. They are enumerated in the table below.
Other floating-point numbers are encoded as
:ref:`literals<amdgpu_synid_literal>`.

===================== ===================================================== ==================
Value                 Note                                                  Availability
===================== ===================================================== ==================
0.0                   The same as integer constant 0.                       All GPUs
0.5                   Floating-point constant 0.5                           All GPUs
1.0                   Floating-point constant 1.0                           All GPUs
2.0                   Floating-point constant 2.0                           All GPUs
4.0                   Floating-point constant 4.0                           All GPUs
-0.5                  Floating-point constant -0.5                          All GPUs
-1.0                  Floating-point constant -1.0                          All GPUs
-2.0                  Floating-point constant -2.0                          All GPUs
-4.0                  Floating-point constant -4.0                          All GPUs
0.1592                1.0/(2.0*pi). Use only for 16-bit operands.           GFX8+
0.15915494            1.0/(2.0*pi). Use only for 16- and 32-bit operands.   GFX8+
0.15915494309189532   1.0/(2.0*pi).                                         GFX8+
===================== ===================================================== ==================

.. WARNING:: Floating-point inline constants cannot be used with *16-bit integer* operands.
   The assembler encodes these values as literals.

2018-03-12 15:55:08 +00:00
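
One way to read the table above is that a value qualifies when, after rounding to
the operand width, it has the same bits as one of the listed constants; this reading
is consistent with the 16-bit and 32-bit restrictions on the shorter spellings of
1.0/(2.0*pi). The Python sketch below models that reading under stated assumptions
and is not the assembler's algorithm: the helper names are hypothetical,
``1.0 / (2.0 * math.pi)`` is assumed to match the hardware 1.0/(2.0*pi) constant,
and *16-bit integer* operands are not modeled.

.. code-block:: python

    import math
    import struct

    # struct formats used to round a Python float (an IEEE double) to each width.
    _FORMATS = {16: ("<e", "<H"), 32: ("<f", "<I"), 64: ("<d", "<Q")}

    _BASIC = (0.0, 0.5, 1.0, 2.0, 4.0, -0.5, -1.0, -2.0, -4.0)
    _INV_2PI = 1.0 / (2.0 * math.pi)   # assumed hardware constant, GFX8+ only


    def _bits(value: float, width: int) -> int:
        """Bit pattern of value after rounding it to a float of the given width."""
        float_fmt, int_fmt = _FORMATS[width]
        return struct.unpack(int_fmt, struct.pack(float_fmt, value))[0]


    def is_fp_inline_constant(value: float, width: int, gfx8_plus: bool = True) -> bool:
        """Approximate check for a floating-point operand of the given width."""
        candidates = _BASIC + ((_INV_2PI,) if gfx8_plus else ())
        try:
            bits = _bits(value, width)
        except OverflowError:          # too large for the operand type
            return False
        return bits in {_bits(c, width) for c in candidates}


    print(is_fp_inline_constant(0.1592, 16))   # 16-bit operand
    print(is_fp_inline_constant(0.1592, 32))   # 32-bit operand
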
.. _amdgpu_synid_ival:

ival
~~~~

A symbolic operand encoded as an *inline constant*.
These operands provide read-only access to H/W registers.

===================== ========================= ================================================ =============
Syntax                Alternative Syntax (SP3)  Note                                             Availability
===================== ========================= ================================================ =============
shared_base           src_shared_base           Base address of shared memory region.            GFX9+
shared_limit          src_shared_limit          Address of the end of shared memory region.      GFX9+
private_base          src_private_base          Base address of private memory region.           GFX9+
private_limit         src_private_limit         Address of the end of private memory region.     GFX9+
pops_exiting_wave_id  src_pops_exiting_wave_id  A dedicated counter for POPS.                    GFX9, GFX10
===================== ========================= ================================================ =============

.. _amdgpu_synid_literal:

literal
-------

A *literal* is a 64-bit value encoded as a separate
32-bit dword in the instruction stream. Compare *literals*
with :ref:`inline constants<amdgpu_synid_constant>`.

If a number may be encoded as either
a :ref:`literal<amdgpu_synid_literal>` or
an :ref:`inline constant<amdgpu_synid_constant>`,
the assembler selects the latter encoding as more efficient.

Literals may be specified as
:ref:`integer numbers<amdgpu_synid_integer_number>`,
:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`,
:ref:`absolute expressions<amdgpu_synid_absolute_expression>` or
:ref:`relocatable expressions<amdgpu_synid_relocatable_expression>`.

An instruction may use only one literal,
but several operands may refer to the same literal.

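
The one-literal rule can be illustrated with a small Python sketch. It is
hypothetical: ``encode_operands`` is not an assembler function, and the sketch
assumes every operand is a 32-bit integer operand that accepts either an inline
constant or a literal (whether a given encoding accepts a literal at all depends on
the instruction and the GPU).

.. code-block:: python

    def encode_operands(values):
        """Hypothetical sketch: encode 32-bit integer operands, sharing one literal.

        Returns (encodings, literal), where literal is the single literal dword
        or None. Raises ValueError if two different literals would be needed.
        """
        literal = None
        encodings = []
        for value in values:
            if -16 <= value <= 64:                 # integer inline constant
                encodings.append(("inline", value))
                continue
            dword = value & 0xFFFFFFFF             # low 32 bits go into the literal
            if literal is not None and literal != dword:
                raise ValueError("an instruction may use only one literal")
            literal = dword
            encodings.append(("literal", dword))
        return encodings, literal


    print(encode_operands([0x1234, 0x1234]))   # both operands share one literal
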
.. _amdgpu_synid_uimm8:

uimm8
-----

An 8-bit :ref:`integer number<amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
The value must be in the range 0..0xFF.

.. _amdgpu_synid_uimm32:

uimm32
------

A 32-bit :ref:`integer number<amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
The value must be in the range 0..0xFFFFFFFF.

.. _amdgpu_synid_uimm20:

uimm20
------

A 20-bit :ref:`integer number<amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
The value must be in the range 0..0xFFFFF.

.. _amdgpu_synid_simm21:

simm21
------

A 21-bit :ref:`integer number<amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
The value must be in the range -0x100000..0x0FFFFF.

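
The four ranges above can be checked with a table-driven Python sketch. The names
are hypothetical, and returning the value as an unsigned bit field (two's complement
for ``simm21``) is an assumption made for illustration, not a description of a
particular instruction encoding.

.. code-block:: python

    # kind -> (minimum, maximum, field width in bits), from the descriptions above
    IMM_RANGES = {
        "uimm8": (0, 0xFF, 8),
        "uimm20": (0, 0xFFFFF, 20),
        "uimm32": (0, 0xFFFFFFFF, 32),
        "simm21": (-0x100000, 0x0FFFFF, 21),
    }


    def encode_imm(kind: str, value: int) -> int:
        """Check the range, then return the value as an unsigned bit field."""
        lo, hi, bits = IMM_RANGES[kind]
        if not lo <= value <= hi:
            raise ValueError(f"{kind} value {value:#x} is out of range {lo:#x}..{hi:#x}")
        return value & ((1 << bits) - 1)   # two's complement for negative simm21


    print(hex(encode_imm("simm21", -1)))
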
.. _amdgpu_synid_off:

off
---

A special entity which indicates that the value of this operand is not used.

================================== ===================================================
Syntax                             Description
================================== ===================================================
off                                Indicates an unused operand.
================================== ===================================================

.. _amdgpu_synid_number:

Numbers
=======

.. _amdgpu_synid_integer_number:

Integer Numbers
---------------

Integer numbers are 64 bits wide.
They are converted to :ref:`expected operand type<amdgpu_syn_instruction_type>`
as described :ref:`here<amdgpu_synid_int_conv>`.

Integer numbers may be specified in binary, octal,
hexadecimal and decimal formats:

============ =============================== ========
Format       Syntax                          Example
============ =============================== ========
Decimal      [-]?[1-9][0-9]*                 -1234
Binary       [-]?0b[01]+                     0b1010
Octal        [-]?0[0-7]+                     010
Hexadecimal  [-]?0x[0-9a-fA-F]+              0xff
\            [-]?[0x]?[0-9][0-9a-fA-F]*[hH]  0ffh
============ =============================== ========

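
The formats in the table can be recognized with one regular expression per row. The
Python sketch below is a simplification, not the assembler's lexer: ``parse_integer``
is hypothetical, the ``h``-suffixed row is reduced to ``[0-9][0-9a-fA-F]*[hH]``, a lone
``0`` (not covered by the table) is accepted as decimal zero, and results are not
checked against the 64-bit range.

.. code-block:: python

    import re

    # One pattern per table row; group 1 is the sign, group 2 the digits.
    # Binary and hexadecimal are tried first so that the "0b" and "0x"
    # prefixes are not mistaken for octal or decimal digits.
    _INT_FORMATS = (
        (re.compile(r"(-?)0b([01]+)"), 2),
        (re.compile(r"(-?)0x([0-9a-fA-F]+)"), 16),
        (re.compile(r"(-?)([0-9][0-9a-fA-F]*)[hH]"), 16),
        (re.compile(r"(-?)0([0-7]+)"), 8),
        (re.compile(r"(-?)(0|[1-9][0-9]*)"), 10),
    )


    def parse_integer(text: str) -> int:
        for pattern, base in _INT_FORMATS:
            m = pattern.fullmatch(text)
            if m:
                value = int(m.group(2), base)
                return -value if m.group(1) else value
        raise ValueError(f"not an integer number: {text!r}")


    for text in ("-1234", "0b1010", "010", "0xff", "0ffh"):
        print(text, parse_integer(text))
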
.. _amdgpu_synid_floating-point_number:

Floating-Point Numbers
----------------------

All floating-point numbers are handled as doubles (64 bits wide).
They are converted to
:ref:`expected operand type<amdgpu_syn_instruction_type>`
as described :ref:`here<amdgpu_synid_fp_conv>`.

Floating-point numbers may be specified in hexadecimal and decimal formats:

============ ========================================================== ====================== ====================
Format       Syntax                                                     Examples               Note
============ ========================================================== ====================== ====================
Decimal      [-]?[0-9]*([.][0-9]*)?([eE][+-]?[0-9]*)?                   -1.234, 234e2          Must include either
                                                                                               a decimal separator
                                                                                               or an exponent.
Hexadecimal  [-]?0x[0-9a-fA-F]*([.][0-9a-fA-F]*)?[pP][+-]?[0-9a-fA-F]+  -0x1afp-10, 0x.1afp10
============ ========================================================== ====================== ====================

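
A Python sketch of the two formats follows. It is an approximation, not the
assembler's lexer: ``parse_float`` is hypothetical, it relies on Python's
``float.fromhex``, which expects a decimal exponent after ``p``, so exponents are
restricted to decimal digits here, and the regular expressions are tightened
versions of the ones in the table.

.. code-block:: python

    import re

    _HEX_FLOAT = re.compile(r"-?0x[0-9a-fA-F]*(\.[0-9a-fA-F]*)?[pP][+-]?[0-9]+")
    _DEC_FLOAT = re.compile(r"-?[0-9]*(\.[0-9]*)?([eE][+-]?[0-9]+)?")


    def parse_float(text: str) -> float:
        """Parse a floating-point number into a Python float (an IEEE double)."""
        if _HEX_FLOAT.fullmatch(text):
            return float.fromhex(text)
        # A decimal number needs a decimal separator or an exponent;
        # otherwise it is an integer number.
        if _DEC_FLOAT.fullmatch(text) and ("." in text or "e" in text.lower()):
            return float(text)
        raise ValueError(f"not a floating-point number: {text!r}")


    for text in ("-1.234", "234e2", "-0x1afp-10", "0x.1afp10"):
        print(text, parse_float(text))
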
.. _amdgpu_synid_expression:

Expressions
===========

An expression is evaluated to a 64-bit integer.
Note that floating-point expressions are not supported.

There are two kinds of expressions:

* :ref:`Absolute<amdgpu_synid_absolute_expression>`.
* :ref:`Relocatable<amdgpu_synid_relocatable_expression>`.

.. _amdgpu_synid_absolute_expression:

Absolute Expressions
--------------------

The value of an absolute expression does not change after program relocation.
Absolute expressions must not include unassigned and relocatable values
such as labels.

Absolute expressions are evaluated to 64-bit integer values and converted to
:ref:`expected operand type<amdgpu_syn_instruction_type>`
as described :ref:`here<amdgpu_synid_int_conv>`.

Examples:

.. parsed-literal::

    x = -1
    y = x + 10

.. _amdgpu_synid_relocatable_expression:

Relocatable Expressions
-----------------------

The value of a relocatable expression depends on program relocation.

Note that use of relocatable expressions is limited to branch targets
and 32-bit integer operands.

A relocatable expression is evaluated to a 64-bit integer value,
which depends on operand kind and
:ref:`relocation type<amdgpu-relocation-records>` of symbol(s)
used in the expression. For example, if an instruction refers to a label,
this reference is evaluated to an offset from the address after
the instruction to the label address:

.. parsed-literal::

    label:
    v_add_co_u32_e32 v0, vcc, label, v1  // 'label' operand is evaluated to -4

Note that values of relocatable expressions are usually unknown
at assembly time; they are resolved later by a linker and converted to
:ref:`expected operand type<amdgpu_syn_instruction_type>`
as described :ref:`here<amdgpu_synid_rl_conv>`.

Operands and Operations
-----------------------

Expressions are composed of 64-bit integer operands and operations.
Operands include :ref:`integer numbers<amdgpu_synid_integer_number>`
and :ref:`symbols<amdgpu_synid_symbol>`.

Expressions may also use "." which is a reference
to the current PC (program counter).

:ref:`Unary<amdgpu_synid_expression_un_op>` and
:ref:`binary<amdgpu_synid_expression_bin_op>`
operations produce 64-bit integer results.

Syntax of Expressions
---------------------

The syntax of expressions is shown below::

    expr ::= expr binop expr | primaryexpr ;

    primaryexpr ::= '(' expr ')' | symbol | number | '.' | unop primaryexpr ;

    binop ::= '&&'
            | '||'
            | '|'
            | '^'
            | '&'
            | '!'
            | '=='
            | '!='
            | '<>'
            | '<'
            | '<='
            | '>'
            | '>='
            | '<<'
            | '>>'
            | '+'
            | '-'
            | '*'
            | '/'
            | '%' ;

    unop ::= '~'
           | '+'
           | '-'
           | '!' ;

.. _amdgpu_synid_expression_bin_op:

Binary Operators
----------------

Binary operators are described in the following table.
They operate on and produce 64-bit integers.
Operators with higher priority are performed first.

========== ========= ===============================================
Operator   Priority  Meaning
========== ========= ===============================================
\*         5         Integer multiplication.
/          5         Integer division.
%          5         Integer signed remainder.
\+         4         Integer addition.
\-         4         Integer subtraction.
<<         3         Integer shift left.
>>         3         Logical shift right.
==         2         Equality comparison.
!=         2         Inequality comparison.
<>         2         Inequality comparison.
<          2         Signed less than comparison.
<=         2         Signed less than or equal comparison.
>          2         Signed greater than comparison.
>=         2         Signed greater than or equal comparison.
\|         1         Bitwise or.
^          1         Bitwise xor.
&          1         Bitwise and.
&&         0         Logical and.
||         0         Logical or.
========== ========= ===============================================

.. _amdgpu_synid_expression_un_op:

Unary Operators
---------------

Unary operators are described in the following table.
They operate on and produce 64-bit integers.

========== ===============================================
Operator   Meaning
========== ===============================================
!          Logical negation.
~          Bitwise negation.
\+         Integer unary plus.
\-         Integer unary minus.
========== ===============================================

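
The priorities in the two tables above can be illustrated with a small
precedence-climbing evaluator. This Python sketch is not the assembler's parser, and
all names in it are hypothetical. It supports only the arithmetic, shift and bitwise
binary operators (comparisons and the logical ``&&`` and ``||`` are omitted), decimal,
octal (leading ``0``) and ``0x`` numbers, and symbols passed in a dictionary. It
assumes that operators of equal priority are left-associative, that ``/`` and ``%``
truncate toward zero as in C, and that ``!`` yields 1 or 0.

.. code-block:: python

    import re

    MASK64 = (1 << 64) - 1


    def wrap64(value: int) -> int:
        """Reinterpret an unbounded Python int as a signed 64-bit integer."""
        value &= MASK64
        return value - (1 << 64) if value >> 63 else value


    def _div(a: int, b: int) -> int:
        """Signed division truncating toward zero (assumed C-like semantics)."""
        q = abs(a) // abs(b)
        return q if (a < 0) == (b < 0) else -q


    # Subset of the binary operator table: operator -> (priority, function).
    BINOPS = {
        "*": (5, lambda a, b: a * b),
        "/": (5, _div),
        "%": (5, lambda a, b: a - b * _div(a, b)),
        "+": (4, lambda a, b: a + b),
        "-": (4, lambda a, b: a - b),
        "<<": (3, lambda a, b: a << b),
        ">>": (3, lambda a, b: (a & MASK64) >> b),   # logical shift right
        "|": (1, lambda a, b: a | b),
        "^": (1, lambda a, b: a ^ b),
        "&": (1, lambda a, b: a & b),
    }

    _TOKEN = re.compile(
        r"\s*(0x[0-9a-fA-F]+|[0-9]+|[a-zA-Z_.][a-zA-Z0-9_$.@]*|<<|>>|[-+*/%|^&~!()])")


    def _tokenize(text: str) -> list:
        tokens, pos, text = [], 0, text.rstrip()
        while pos < len(text):
            m = _TOKEN.match(text, pos)
            if not m:
                raise SyntaxError(f"unexpected character at offset {pos}")
            tokens.append(m.group(1))
            pos = m.end()
        return tokens


    def evaluate(text: str, symbols: dict = None) -> int:
        tokens, symbols, pos = _tokenize(text), symbols or {}, 0

        def next_token() -> str:
            nonlocal pos
            if pos >= len(tokens):
                raise SyntaxError("unexpected end of expression")
            pos += 1
            return tokens[pos - 1]

        def primary() -> int:
            tok = next_token()
            if tok == "(":
                value = binary(0)
                if next_token() != ")":
                    raise SyntaxError("expected ')'")
                return value
            if tok in ("~", "+", "-", "!"):           # unop primaryexpr
                v = primary()
                return wrap64({"~": ~v, "+": v, "-": -v, "!": int(v == 0)}[tok])
            if tok[0].isdigit():
                base = 16 if tok.startswith("0x") else 8 if len(tok) > 1 and tok[0] == "0" else 10
                return wrap64(int(tok, base))
            return wrap64(symbols[tok])               # KeyError for unknown symbols

        def binary(min_priority: int) -> int:
            nonlocal pos
            lhs = primary()
            while pos < len(tokens) and tokens[pos] in BINOPS:
                priority, fn = BINOPS[tokens[pos]]
                if priority < min_priority:
                    break
                pos += 1
                rhs = binary(priority + 1)            # left-associative
                lhs = wrap64(fn(lhs, rhs))
            return lhs

        result = binary(0)
        if pos != len(tokens):
            raise SyntaxError(f"unexpected token {tokens[pos]!r}")
        return result


    print(evaluate("1 + 2 << 3"), evaluate("x + 10", {"x": -1}))

Note that ``4 ^ 1 & 0`` evaluates to 0 in this sketch because ``^`` and ``&`` share
priority 1 and are applied left to right; in C, where ``&`` binds tighter, the same
text gives 4.
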
.. _amdgpu_synid_symbol:

Symbols
-------

A symbol is a named 64-bit integer value, representing a relocatable
address or an absolute (non-relocatable) number.

Symbol names have the following syntax:
``[a-zA-Z_.][a-zA-Z0-9_$.@]*``

The table below provides several examples of syntax used for symbol definition.

================ ==========================================================
Syntax           Meaning
================ ==========================================================
.globl <S>       Declares a global symbol S without assigning it a value.
.set <S>, <E>    Assigns the value of an expression E to a symbol S.
<S> = <E>        Assigns the value of an expression E to a symbol S.
<S>:             Declares a label S and assigns it the current PC value.
================ ==========================================================

A symbol may be used before it is declared or assigned;
unassigned symbols are assumed to be PC-relative.

Additional information about symbols may be found :ref:`here<amdgpu-symbols>`.

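
The symbol-name syntax can be checked directly with Python's ``re`` module;
``is_valid_symbol_name`` is a hypothetical helper used only for illustration.

.. code-block:: python

    import re

    # The symbol-name syntax quoted above.
    SYMBOL_NAME = re.compile(r"[a-zA-Z_.][a-zA-Z0-9_$.@]*")


    def is_valid_symbol_name(name: str) -> bool:
        return SYMBOL_NAME.fullmatch(name) is not None


    for name in ("loop_start", ".L1$tmp@x", "1label", "a-b"):
        print(name, is_valid_symbol_name(name))
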
.. _amdgpu_synid_conv:

Type and Size Conversion
========================

This section describes what happens when a 64-bit
:ref:`integer number<amdgpu_synid_integer_number>`, a
:ref:`floating-point number<amdgpu_synid_floating-point_number>` or an
:ref:`expression<amdgpu_synid_expression>`
is used for an operand which has a different type or size.

.. _amdgpu_synid_int_conv:

Conversion of Integer Values
----------------------------

Instruction operands may be specified as 64-bit
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
These values are converted to the
:ref:`expected operand type<amdgpu_syn_instruction_type>`
using the following steps:

1. *Validation*. The assembler checks if the input value may be truncated
   without loss to the required *truncation width* (see the table below).
   Truncation is allowed in two cases:

   * The truncated bits are all 0.
   * The truncated bits are all 1 and the value after truncation has its MSB set.

   In all other cases, the assembler triggers an error.

2. *Conversion*. The input value is converted to the expected type
   as described in the table below. Depending on operand kind, this conversion
   is performed by either the assembler or AMDGPU H/W (or both).

============== ================= =============== ====================================================================
Expected type  Truncation Width  Conversion      Description
============== ================= =============== ====================================================================
i16, u16, b16  16                num.u16         Truncate to 16 bits.
i32, u32, b32  32                num.u32         Truncate to 32 bits.
i64            32                {-1,num.i32}    Truncate to 32 bits and then sign-extend the result to 64 bits.
u64, b64       32                {0,num.u32}     Truncate to 32 bits and then zero-extend the result to 64 bits.
f16            16                num.u16         Use low 16 bits as an f16 value.
f32            32                num.u32         Use low 32 bits as an f32 value.
f64            32                {num.u32,0}     Use low 32 bits of the number as high 32 bits
                                                 of the result; low 32 bits of the result are zeroed.
============== ================= =============== ====================================================================

Examples of enabled conversions:

.. parsed-literal::

    // GFX9

    v_add_u16 v0, -1, 0                     // src0 = 0xFFFF
    v_add_f16 v0, -1, 0                     // src0 = 0xFFFF (NaN)
    //
    v_add_u32 v0, -1, 0                     // src0 = 0xFFFFFFFF
    v_add_f32 v0, -1, 0                     // src0 = 0xFFFFFFFF (NaN)
    //
    v_add_u16 v0, 0xff00, v0                // src0 = 0xff00
    v_add_u16 v0, 0xffffffffffffff00, v0    // src0 = 0xff00
    v_add_u16 v0, -256, v0                  // src0 = 0xff00
    //
    s_bfe_i64 s[0:1], 0xffefffff, s3        // src0 = 0xffffffffffefffff
    s_bfe_u64 s[0:1], 0xffefffff, s3        // src0 = 0x00000000ffefffff
    v_ceil_f64_e32 v[0:1], 0xffefffff       // src0 = 0xffefffff00000000 (-1.7976922776554302e308)
    //
    x = 0xffefffff                          //
    s_bfe_i64 s[0:1], x, s3                 // src0 = 0xffffffffffefffff
    s_bfe_u64 s[0:1], x, s3                 // src0 = 0x00000000ffefffff
    v_ceil_f64_e32 v[0:1], x                // src0 = 0xffefffff00000000 (-1.7976922776554302e308)

Examples of disabled conversions:

.. parsed-literal::

    // GFX9

    v_add_u16 v0, 0x1ff00, v0               // truncated bits are not all 0 or 1
    v_add_u16 v0, 0xffffffffffff00ff, v0    // truncated bits do not match MSB of the result

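
The validation and conversion steps above can be modeled in a few lines of Python.
This is a minimal sketch, not the assembler's implementation: ``validate``,
``convert_int`` and the type names used as dictionary keys are hypothetical, and the
result is returned as the unsigned bit pattern of the converted operand. The checks
mirror the GFX9 examples above.

.. code-block:: python

    MASK64 = (1 << 64) - 1

    # Expected type -> truncation width (from the table above).
    TRUNCATION_WIDTH = {
        "i16": 16, "u16": 16, "b16": 16, "f16": 16,
        "i32": 32, "u32": 32, "b32": 32, "f32": 32,
        "i64": 32, "u64": 32, "b64": 32, "f64": 32,
    }


    def validate(num: int, width: int) -> int:
        """Step 1: return the low `width` bits, or raise if truncation loses information."""
        num &= MASK64                              # treat num as a 64-bit value
        low = num & ((1 << width) - 1)
        high = num >> width                        # the bits that truncation drops
        if high == 0:
            return low
        if high == (1 << (64 - width)) - 1 and low >> (width - 1):
            return low                             # all ones, and the result's MSB is set
        raise ValueError(f"{num:#x} cannot be truncated to {width} bits")


    def convert_int(num: int, expected: str) -> int:
        """Step 2: return the bit pattern of the converted operand."""
        low = validate(num, TRUNCATION_WIDTH[expected])
        if expected == "i64":                      # sign-extend 32 -> 64 bits
            return low | 0xFFFFFFFF00000000 if low >> 31 else low
        if expected == "f64":                      # low 32 bits become the high half
            return low << 32
        return low                                 # 16/32-bit types, zero-extended u64/b64


    print(hex(convert_int(-256, "u16")), hex(convert_int(0xffefffff, "i64")))
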
.. _amdgpu_synid_fp_conv:

Conversion of Floating-Point Values
-----------------------------------

Instruction operands may be specified as 64-bit
:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`.
These values are converted to the
:ref:`expected operand type<amdgpu_syn_instruction_type>`
using the following steps:

1. *Validation*. The assembler checks if the input f64 number can be converted
   to the *required floating-point type* (see the table below) without overflow
   or underflow. Loss of precision is allowed. If this conversion is not possible,
   the assembler triggers an error.

2. *Conversion*. The input value is converted to the expected type
   as described in the table below. Depending on operand kind, this is
   performed by either the assembler or AMDGPU H/W (or both).

============== ================ ================= =================================================================
Expected type  Required FP Type Conversion        Description
============== ================ ================= =================================================================
i16, u16, b16  f16              f16(num)          Convert to f16 and use bits of the result as an integer value.
                                                  The value has to be encoded as a literal, or an error occurs.
                                                  Note that the value cannot be encoded as an inline constant.
i32, u32, b32  f32              f32(num)          Convert to f32 and use bits of the result as an integer value.
i64, u64, b64  \-               \-                Conversion disabled.
f16            f16              f16(num)          Convert to f16.
f32            f32              f32(num)          Convert to f32.
f64            f64              {num.u32.hi,0}    Use high 32 bits of the number as high 32 bits of the result;
                                                  zero-fill low 32 bits of the result.

                                                  Note that the result may differ from the original number.
============== ================ ================= =================================================================

Examples of enabled conversions:

.. parsed-literal::

    // GFX9

    v_add_f16 v0, 1.0, 0       // src0 = 0x3C00 (1.0)
    v_add_u16 v0, 1.0, 0       // src0 = 0x3C00
    //
    v_add_f32 v0, 1.0, 0       // src0 = 0x3F800000 (1.0)
    v_add_u32 v0, 1.0, 0       // src0 = 0x3F800000

    // src0 before conversion:
    //   1.7976931348623157e308 = 0x7fefffffffffffff
    // src0 after conversion:
    //   1.7976922776554302e308 = 0x7fefffff00000000
    v_ceil_f64 v[0:1], 1.7976931348623157e308

    v_add_f16 v1, 65500.0, v2  // ok for f16.
    v_add_f32 v1, 65600.0, v2  // ok for f32, but would result in overflow for f16.

Examples of disabled conversions:

.. parsed-literal::

    // GFX9

    v_add_f16 v1, 65600.0, v2  // overflow

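
The floating-point rules can be sketched with Python's ``struct`` module, whose
``e``, ``f`` and ``d`` formats round a double to IEEE half, single and double
precision. This is an approximation rather than the assembler's code:
``convert_fp`` is hypothetical, only a nonzero value that rounds to zero is treated
as underflow (results that become subnormal are accepted), and the requirement that
values for 16-bit integer operands be encoded as literals is not modeled.

.. code-block:: python

    import struct


    def _pack_checked(value: float, fmt: str, ufmt: str) -> int:
        """Round value to a narrower IEEE format; return its bits or raise."""
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            raise ValueError(f"{value!r} overflows the required type") from None
        if value != 0.0 and struct.unpack(fmt, packed)[0] == 0.0:
            # Simplification: only "rounds to zero" is treated as underflow.
            raise ValueError(f"{value!r} underflows the required type")
        return struct.unpack(ufmt, packed)[0]


    def convert_fp(num: float, expected: str) -> int:
        """Return the bit pattern an f64 number is converted to (sketch)."""
        if expected in ("f16", "i16", "u16", "b16"):
            return _pack_checked(num, "<e", "<H")
        if expected in ("f32", "i32", "u32", "b32"):
            return _pack_checked(num, "<f", "<I")
        if expected == "f64":
            bits = struct.unpack("<Q", struct.pack("<d", num))[0]
            return bits & 0xFFFFFFFF00000000     # keep the high 32 bits only
        raise ValueError(f"conversion of floating-point values to {expected} is disabled")


    print(hex(convert_fp(1.0, "f16")), hex(convert_fp(1.7976931348623157e308, "f64")))
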
.. _amdgpu_synid_rl_conv:

Conversion of Relocatable Values
--------------------------------

:ref:`Relocatable expressions<amdgpu_synid_relocatable_expression>`
may be used with 32-bit integer operands and jump targets.

When the value of a relocatable expression is resolved by a linker, it is
converted as needed and truncated to the operand size. The conversion depends
on :ref:`relocation type<amdgpu-relocation-records>` and operand kind.

For example, when a 32-bit operand of an instruction refers
to a relocatable expression *expr*, this reference is evaluated
to a 64-bit offset from the address after the
instruction to the address being referenced, *counted in bytes*.
Then the value is truncated to 32 bits and encoded as a literal:

.. parsed-literal::

    expr = .
    v_add_co_u32_e32 v0, vcc, expr, v1  // 'expr' operand is evaluated to -4
                                        // and then truncated to 0xFFFFFFFC

As another example, when a branch instruction refers to a label,
this reference is evaluated to an offset from the address after the
instruction to the label address, *counted in dwords*.
Then the value is truncated to 16 bits:

.. parsed-literal::

    label:
    s_branch label  // 'label' operand is evaluated to -1 and truncated to 0xFFFF
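
The two offset computations above can be sketched in Python. The helpers are
hypothetical and take the address after the instruction as a parameter, because
that address depends on the instruction's size and encoding; the sketch only shows
the byte versus dword arithmetic and the truncation.

.. code-block:: python

    def truncate(value: int, bits: int) -> int:
        """Keep the low `bits` bits of a (possibly negative) offset."""
        return value & ((1 << bits) - 1)


    def literal_offset(instr_end: int, target: int) -> int:
        """32-bit operand: byte offset from the address after the instruction."""
        return truncate(target - instr_end, 32)


    def branch_offset(instr_end: int, target: int) -> int:
        """Branch target: dword offset from the address after the instruction."""
        delta = target - instr_end
        if delta % 4:
            raise ValueError("branch target is not dword-aligned")
        return truncate(delta // 4, 16)


    # A 4-byte s_branch at 0x100 that jumps to itself ends at 0x104.
    print(hex(branch_offset(instr_end=0x104, target=0x100)))
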