495 Commits

Author SHA1 Message Date
Wenju He
552902455c
[libclc] Add ctz built-in implementation to clc and generic (#135309) 2025-04-15 15:23:25 +01:00
Wenju He
cbda72a547
[NFC][libclc] Merge atomic extension built-ins with identical name into a single file (#134489)
llvm-diff shows there is no change to amdgcn--amdhsa.bc.

Similar to how cl_khr_fp64 and cl_khr_fp16 implementations are put in a
same file for math built-ins, this PR do the same to atom_* built-ins.

The main motivation is to prevent that two files with same base name
implementats different built-ins. In a follow-up PR, I'd like to relax
libclc_configure_lib_source to only compare filename instead of path for
overriding, since in our downstream the same category of built-ins, e.g.
math, are organized in several different folders.
2025-04-14 10:27:48 +01:00
Fraser Cormack
b0338c3d6c
[libclc] Move shuffle/shuffle2 to the CLC library (#135000)
This commit moves the shuffle and shuffle2 builtins to the CLC library.
In so doing it makes the headers simpler and re-usable for other builtin
layers to hook into the CLC functions, if they wish.

An additional gentype utility has been made available, which provides a
consistent vector-size-or-1 macro for use.

The existing __CLC_VECSIZE is defined but empty which is useful in
certain applications, such as in concatenation with a type to make a
correctly sized scalar or vector type. However, this isn't usable in the
same preprocessor lines when wanting to check for specific vector sizes,
as e.g., '__CLC_VECSIZE == 2' resolves to '== 2' which is invalid. In
local testing this is also useful for the geometric builtins which are
only available for scalar types and vector types of 2, 3, or 4 elements.

No codegen changes are observed, except the internal shuffle/shuffle2
utility functions are no longer made publicly available.
2025-04-09 15:52:25 +01:00
Fraser Cormack
949bf518fc [libclc][NFC] Fix up inconsistent copyright headers
Some files were accidentally given two copyright headers. Another was
missing one. This commit also converts that file's dos line endings to
unix ones and reformats a comment.
2025-04-09 12:00:08 +01:00
Fraser Cormack
ddc48fefe3
[libclc] Move native_(exp10|powr|tan) to CLC library (#134080)
These are the three remaining native builtins not yet ported.

There are elementwise versions of exp10 and tan which correspond to the
intrinsics, which may be preferable to the current versions which route
through other native builtins. Those could be changed in a follow-up if
desired.
2025-04-02 17:37:17 +01:00
Fraser Cormack
f186041553
[libclc] Move sinh, cosh & tanh to the CLC library (#134063)
This commit also vectorizes the builtins.
2025-04-02 15:22:42 +01:00
Fraser Cormack
d51525ba36
[libclc] Move lgamma, lgamma_r & tgamma to CLC library (#134053)
Also enable half-precision variants of tgamma, which were previously
missing.

Note that unlike recent work, these builtins are not vectorized as part
of this commit. Ultimately all three call into lgamma_r, which has heavy
control flow (including switch statements) that would be difficult to
vectorize. Additionally the lgamma_r algorithm is copyrighted to SunPro
so may need a rewrite in the future anyway.

There are no codegen changes (to non-SPIR-V targets) with this commit,
aside from the new half builtins.
2025-04-02 15:20:32 +01:00
Fraser Cormack
dd19e7eaaa
[libclc] Move cbrt to the CLC library; vectorize (#133940) 2025-04-02 10:18:24 +01:00
Fraser Cormack
f14ff59da7
[libclc] Move exp, exp2 and expm1 to the CLC library (#133932)
These all share the use of a common helper function so are handled in
one go. These builtins are also now vectorized.
2025-04-01 18:15:37 +01:00
Fraser Cormack
00e6d4fe06 [libclc][NFC] Delete three unused .inc files 2025-04-01 17:36:01 +01:00
Fraser Cormack
c1efd8b663 [libclc][NFC] Delete two unused headers
These should have been deleted when the respective builtins were moved
to the CLC library.
2025-04-01 14:54:50 +01:00
Fraser Cormack
bcf0f8d8aa
[libclc] Move exp10 to the CLC library (#133899)
The builtin was already nominally in the CLC library; this commit just
moves it over. It also vectorizes the builtin on its way.
2025-04-01 14:39:17 +01:00
Fraser Cormack
13a313fe58
[libclc] Move sinpi/cospi/tanpi to the CLC library (#133889)
Additionally, these builtins are now vectorized.

This also moves the native_recip and native_divide builtins as they are
used by the tanpi builtin.
2025-04-01 12:03:21 +01:00
Fraser Cormack
ad48fffb53
[libclc] Move several 'native' builtins to CLC library (#129679)
This commit moves the 'native' builtins that use asm statements to
generate LLVM intrinsics to the CLC library. In doing so it converts
them to use the appropriate elementwise builtin to generate the same
intrinsic; there are no codegen changes to any target except to AMDGPU
targets where `native_log` is no longer custom implemented and instead
used the clang elementwise builtin.

This work forms part of #127196 and indeed with this commit there are no
'generic' builtins using/abusing asm statements - the remaining builtins
are specific to the amdgpu and r600 targets.
2025-04-01 09:20:54 +01:00
Fraser Cormack
7a2b160e76
[libclc] Move rootn to the CLC library; optimize (#133735)
The function was already nominally in the CLC namespace; this commit
just moves it over.

This commit also vectorizes the builtin to avoid scalarization.
2025-04-01 09:19:50 +01:00
Fraser Cormack
b52977b868
[libclc] Move pow, powr & pown to the CLC library (#133294)
These functions were already nominally in the CLC library.

Similar to others, these builtins are now vectorized and are not broken
down into scalar types.
2025-03-28 08:23:24 +00:00
Fraser Cormack
d32e71d7c7
[libclc] Move fmod, remainder & remquo to the CLC library (#132054)
These functions were already nominally in the CLC namespace; this commit
just formally moves them over.

Note that 'half' versions of these CLC functions are now provided.
Previously the corresponding OpenCL builtins would forward directly to
the 'float' versions of the CLC builtins. Now the OpenCL builtins call
the 'half' CLC builtins, which themselves call the 'float' CLC versions.
This keeps the interface between the OpenCL and CLC libraries neater and
keeps the CLC library self-contained.

No changes to the generated code for non-SPIR-V targets is observed.
2025-03-27 14:53:19 +00:00
Fraser Cormack
3284559cca
[libclc] Move atan2/atan2pi to the CLC library (#133226)
As with other work in this area, these builtins are now vectorized.

A further table has been split into two. There was discrepancy between
comments above the table describing the values as "lead" and "tail" and
variables taken from the table called "head" and "tail", so these have
been unified as head/tail.
2025-03-27 10:59:09 +00:00
Fraser Cormack
db98e2922f
[libclc] Move log1p/asinh/acosh/atanh to the CLC library (#132956)
These four functions all related in that they share tables and helper
functions. Furthermore, the acosh and atanh builtins call log1p.

As with other work in this area, these builtins are now vectorized. To
enable this, there are new table accessor functions which return a
vector of table values using a vector of indices. These are internally
scalarized, in the absence of gather operations. Some tables which were
tables of multiple entries (e.g., double2) are split into two separate
"low" and "high" tables. This might affect the performance of memory
operations but are hopefully mitigated by better codegen overall.
2025-03-27 09:19:07 +00:00
Fraser Cormack
3013458a79
[libclc] Move asinpi/acospi/atanpi to the CLC library (#132918)
Similar to d46a6999, this commit simultaneously moves these three
functions to the CLC library and optimizes them for vector types by
avoiding scalarization.
2025-03-25 13:31:53 +00:00
Fraser Cormack
d46a699953
[libclc] Move asin/acos/atan to the CLC library (#132788)
This commit simultaneously moves these three functions to the CLC
library and optimizing them for vector types by avoiding scalarization.
2025-03-25 09:11:32 +00:00
Fraser Cormack
7e22b09031
[libclc] Add missing license headers to source IR files (#132758) 2025-03-24 16:21:59 +00:00
Fraser Cormack
70c325bf6a
[libclc] Move fp32 sincos helpers to CLC library (#132753)
This commit moves most of the sincos helper functions to the CLC
library. It simultaneously vectorizes them with the aim to increase
performance for vector types by avoiding scalarization.

Some helpers for double types remain as they use various features not
yet ready, like 'fract' which in turn relies on 'fmin'; neither of these
are in the CLC library. They also use table lookups and type punning
which don't translate well to vector versions.

As a proof of concept, float and half versions of the sin and cos
builtins are now vectorized and use the CLC helpers to do so. They
remain in the OpenCL layer but will be simpler to move to the CLC
library when the double versions are ready.
2025-03-24 16:09:31 +00:00
Romaric Jodin
a6a56a326a
[libclc] erfc: fix fp32 implementation in FTZ mode (#132390)
On some implementations, the current implementation leads to slight
accuracy issues.

While the maths behind this implementation is correct, it does not take
into account the accumulation of errors coming from other operators that
do not provide correct rounding (like the exp function).

To avoid it, compute statically exp(-0.5625).

Fixes #124939
2025-03-24 16:08:54 +00:00
Fraser Cormack
63b5692bac
[libclc] Relicense gen_convert.py (#132213)
Similar to work done in 82912fd6, this commit re-licenses both the
gen_convert.py script and the file it generates.

It previously possessed an MIT license, with three additional individual
copyrights. The file it generated was similar, but to only two of the
three individuals. LLVM's policy is not to accept contributions that
include in-source copyright notices [1]. I'm not aware whether the
individuals concerned signed the re-licensing agreement or not.

It takes the opportunity to update the description(s) in the header
files, since the previous comments were out of date.

[1]
https://llvm.org/docs/DeveloperPolicy.html#embedded-copyright-or-contributed-by-statements
2025-03-24 11:10:07 +00:00
Fraser Cormack
7d048674a4
[libclc] Add license headers to files missing them (#132239)
This commit bulk updates all '.h', '.cl', '.inc', and '.cpp' files to
add any missing license headers.

The remaining files are generally CMake, SOURCES, scripts, markdown,
etc.

There are still some '.ll' files which may benefit from a license
header. I can't find an example of an LLVM IR file with a license header
in the rest of LLVM, but unlike most other (sub)projects, libclc has
examples of LLVM IR as source files, compiled and built into the
library.
2025-03-24 10:10:38 +00:00
Fraser Cormack
82912fd620
[libclc] Update license headers (#132070)
This commit bulk-updates the libclc license headers to the current
Apache-2.0 WITH LLVM-exception license in situations where they were
previously attributed to AMD - and occasionally under an additional
single individual contributor - under an MIT license.

AMD signed the LLVM relicensing agreement and so agreed for their past
contributions under the new LLVM license.

The LLVM project also has had a long-standing, unwritten, policy of not
adding copyright notices to source code. This policy was recently
written up [1]. This commit therefore also removes these copyright
notices at the same time.

Note that there are outstanding copyright notices attributed to others -
and many files missing copyright headers - which will be dealt with in
future work.

[1]
https://llvm.org/docs/DeveloperPolicy.html#embedded-copyright-or-contributed-by-statements
2025-03-20 11:40:09 +00:00
Fraser Cormack
e5d5503e4e
[libclc] Move hypot to CLC library; optimize (#129551)
This was already nominally in the CLC library; this commit just formally
moves it over. It simultaneously optimizes it for vector types by
avoiding scalarization.
2025-03-04 14:16:16 +00:00
Fraser Cormack
1357279df9
[libclc] Move rsqrt to the CLC library (#129045)
This also adds missing half variants to certain targets.

It also optimizes some targets' implementations to perform the operation
directly in vector types, as opposed to scalarizing.
2025-02-27 15:46:58 +00:00
Fraser Cormack
285b411e46
[libclc] Move sqrt to CLC library (#128748)
This is fairly straightforward for most targets.

We use the element-wise sqrt builtin by default. We also remove a legacy
pre-filtering of the input argument, which the intrinsic now officially
handles.

AMDGPU provides its own implementation of sqrt for double types. This
commit moves this into the implementation of CLC sqrt. It uses weak
linkage on the 'default' CLC sqrt to allow AMDGPU to only override the
builtin for the types it cares about.
2025-02-27 12:30:24 +00:00
Fraser Cormack
d5038b3774
[libclc] Move __clc_ldexp to CLC library (#126078)
This function was already conceptually in the CLC namespace - this just
formally moves it over.

Note however that this commit marks a change in how libclc functions may
be overridden by targets.

Until now we have been using a purely build-system-based approach where
targets could register identically-named files which took responsibility
for the implementation of the builtin in its entirety.

This system wasn't well equipped to deal with AMD's overriding of
__clc_ldexp for only a subset of types, and furthermore conditionally on
a pre-defined macro.

One option for handling this would be to require AMD to duplicate code
for the versions of __clc_ldexp it's *not* interested in overriding. We
could also make it easier for targets to re-define CLC functions through
macros or .inc files. Both of these have obvious downsides. We could
also keep AMD's overriding in the OpenCL layer and bypass CLC
altogether, but this has limited use.

We could use weak linkage on the "base" implementations of CLC
functions, and allow targets to opt-in to providing their own
implementations on a much finer granularity. This commit supports this
as a proof of concept; we could expand it to all CLC builtins if
accepted.

Note that the existing filename-based "claiming" approach is still in
effect, so targets have to name their overrides differently to have both
files compiled. This could also be refined.
2025-02-26 11:20:25 +00:00
Fraser Cormack
a821ae2847
[libclc] Move round to CLC library (#128721) 2025-02-25 16:24:57 +00:00
Fraser Cormack
1e0e4169dd
[libclc][NFC] Remove unused intrinsics helpers (#128708)
We want to move away from using asm declarations to define builtins.
2025-02-25 14:29:35 +00:00
Fraser Cormack
f8948d3c47
[libclc] Move log/log2/log10 to CLC library (#128540)
This commit also enables fp16 log, which was previously missing.

Other than that, no changes to codegen for AMDGPU/Nvidia targets.

Note that for simplicity this commit doesn't try to refactor or optimize
the implementations. Notably, each log is only implementated for scalar
types; vector types are scalarized. It doesn't look too difficult to
make the implementations suitable for vector codegen, so I'll try that
in a future commit.

There's also an unused implementation of log in clc_log_base.h, whereas
the implementation currently used by libclc targets re-uses log2 with an
additional multiplication. That should also be cleaned up as on first
inspection it looks a more optimal implementation, though it would have
to be checked against the OpenCL CTS for good measure.
2025-02-25 11:44:59 +00:00
Fraser Cormack
2dfb29a9b2
[libclc] Move nan to the CLC library (#128521) 2025-02-24 15:41:31 +00:00
Fraser Cormack
aca8a5cb23
[libclc] Remove clspv-specific clc conversions (#128500)
The clc and clc+clspv modes produced the same conversions code, so this
patch simplifies the process. It further simplifies the internal checks
the script makes by assuming the mutual exclusivity.
2025-02-24 12:31:55 +00:00
Fraser Cormack
e7ad07ffb8
[libclc] Move fma to the CLC library (#126052)
This builtin is a little more involved than others as targets deal with
fma in various different ways.

Fundamentally, the CLC __clc_fma builtin compiles to
__builtin_elementwise_fma, which compiles to the @llvm.fma intrinsic.
However, in the case of fp32 fma some targets call the __clc_sw_fma
function, which provides a software implementation of the builtin. This
in principle is controlled by the __CLC_HAVE_HW_FMA32 macro and may be a
runtime decision, depending on how the target defines that macro.

All targets build the CLC fma functions for all types. This is to the
CLC library can have a reliable internal implementation for its own
purposes.

For AMD/NVPTX targets there are no meaningful changes to the generated
LLVM bytecode. Some blocks of code have moved around, which confounds
llvm-diff.

For the clspv and SPIR-V/Mesa targets, only fp32 fma is of interest. Its
use in libclc is tightly controlled by checking __CLC_HAVE_HW_FMA32
first. This can either be a compile-time constant (1, for clspv) or a
runtime function for SPIR-V/Mesa.

The SPIR-V/Mesa target only provided fp32 fma in the OpenCL layer. It
unconditionally mapped that to the __clc_sw_fma builtin, even though the
generic version in theory had a runtime toggle through
__CLC_HAVE_HW_FMA32 specifically for that target. Callers of fma,
though, would end up using the ExtInst fma, *not* calling the _Z3fmafff
function provided by libclc.

This commit keeps this system in place in the OpenCL layer, by mapping
fma to __clc_sw_fma. Where other builtins would previously call fma
(i.e., result in the ExtInst), they now call __clc_fma. This function
checks the __CLC_HAVE_HW_FMA32 runtime toggle, which selects between the
slow version or the quick version. The quick version is the LLVM fma
intrinsic which llvm-spirv translates to the ExtInst.

The clspv target had its own software implementation of fp32 fma, which
it called unconditionally - even though __CLC_HAVE_HW_FMA32 is 1 for
that target. This is potentially just so its library ships a software
version which it can fall back on. In the OpenCL layer, the target
doesn't provide fp64 fma, and maps fp16 fma to fp32 mad.

This commit keeps this system roughly in place: in the OpenCL layer it
maps fp32 fma to __clc_sw_fma, and fp16 fma to mad. Where builtins would
previously call into fma, they now call __clc_fma, which compiles to the
LLVM intrinsic. If this goes through a translation to SPIR-V it will
become the fma ExtInst, or the intrinsic could be replaced by the
_Z3fmafff software implementation.

The clspv and SPIR-V/Mesa targets could potentially be cleaned up later,
depending on their needs.
2025-02-24 10:10:51 +00:00
Fraser Cormack
6c2d418027
[libclc] Fix int<->float conversion builtins (#126905)
While working on moving the conversion builtins to the CLC library in
25c05541 it was discovered that many weren't passing the OpenCL-CTS
tests.

As it happens, the clspv-specific code for conversion implementations
between integer and floating-point types was more correct. However:

* The clspv code was generating 'sat' conversions to floating-point
types, which are not legal
* The clspv code around rtn/rtz conversions needed tweaking as it wasn't
validating when sizeof(dst) > sizeof(src), e.g., int -> double.

With this commit, the CTS failures seen before have been resolved.

This also assumes that the new implementations are correct also for
clspv. If this is the case, then 'clc' and 'clspv' modes are mutually
exclusive and we can simplify the build process for conversions by not
building clc-clspv-convert.cl.
2025-02-24 09:28:51 +00:00
Fraser Cormack
ae5785460d
[libclc] Define macros for users of gentype.inc (#128012)
Several users of (mostly math/) gentype.inc rely on types other than the
'gentype'. This is commonly intN as several maths builtins expose this
as a return or paramter type. We were previously explicitly defining
this type for every gentype.

Other implementations rely on integer types of the same size and element
width as the gentype, such as short/ushort for half, long/ulong for
double, etc.

Users might also rely on as_type or convert_type builtins to/from these
types.

The previous method we used to define intN was unscalable if we wanted
to expose more types and helpers.

This commit introduces a simpler system whereby several macros are
defined at the beginning of gentype.inc. These rely on concatenating
with the vector size. To facilitate this system, scalar gentypes now
define an empty vector size. It was previously undefined, which was
dangerous. An added benefit is that it matches how the integer
gentype.inc vector size has been working.

These macros will be especially helpful for the definitions of
logb/ilogb in an upcoming patch.
2025-02-20 15:24:04 +00:00
Fraser Cormack
684ad25dfc
[libclc] Move frexp to CLC library; optimize half vecs (#127836)
This commit moves the frexp builtin to the CLC library.

It simultaneously optimizes the code generated for half vectors, which
was previously scalarizing and casting up to float. With this commit it
still casts up to float, but keeps it in the vector form.
2025-02-20 08:41:45 +00:00
Fraser Cormack
079115e6ea
[libclc] Move modf to the CLC library (#127828)
The "generic" unary_(def|decl)_with_ptr files are intended to be re-used
by the sincos and fract builtins in the future as they share an
identical type signature.
2025-02-20 08:36:46 +00:00
Fraser Cormack
9743b99cd1
[libclc] Explicitly qualify private address spaces (#127823)
Doing so provides stability when compiling the builtins in a mode in
which unqualified pointers may be interpreted as being in the generic
address space, such as in OpenCL 3.0.

We eventually want to provide 'generic' overloads of the builtins in
libclc so this prepares the ground a little better.

It could be argued that having the internal CLC helper functions be
unqualified is more flexible, in case it's better for a target to have
the pointers in the generic address space. This commits to the private
address space for more stability across different OpenCL environments.
2025-02-19 16:26:24 +00:00
Fraser Cormack
fb5a87e1a6 [libclc][NFC] Reformat ep_log.cl 2025-02-19 15:18:05 +00:00
Fraser Cormack
df12bad075
[libclc] Use CLC conversion builtins in CLC functions (#127628)
This commit is a broad update across libclc to use the CLC conversion
builtins in CLC functions, even those with a '__clc' prefix in the
generic folder. This better prepares them for an official move to the
CLC library in time.

The CLC conversion builtins have an additional benefit in that they
support scalars, unlike the __builtin_convertvector builtin which we
were using previously. This allows us to simplify some shared
definitions.

There is one change to the IR, in the scalar upsample(char, uchar)
builtin. It now sign-extends the first argument to i16, where before it
zero-extended it. This appears to be correct, and matches the vector
behaviour.
2025-02-18 14:52:41 +00:00
Fraser Cormack
25c0554166
[libclc] Move conversion builtins to the CLC library (#124727)
This commit moves the implementations of conversion builtins to the CLC
library. It keeps the dichotomy of regular vs. clspv implementations of
the conversions. However, for the sake of a consistent interface all CLC
conversion routines are built, even the ones that clspv opts out of in
the user-facing OpenCL layer.

It simultaneously updates the python script to use f-strings for
formatting.
2025-02-12 08:55:02 +00:00
Fraser Cormack
64735ad639
[libclc] Move sign to the CLC builtins library (#115699)
This commit moves the sign builtin's implementation to the CLC library.
It simultaneously optimizes it (for vector types) by removing
control-flow from the implementation.

The __CLC_INTERNAL preprocessor definition has been repurposed (without
the leading underscores) to be passed when building the internal CLC
library. It was only used in one other place to guard an extra maths
preprocessor definition, which we can do unconditionally.
2025-02-11 11:14:49 +00:00
Fraser Cormack
d4144ca27d [libclc][NFC] Clang-format two files
Pre-commit changes to avoid noise in an upcoming PR.
2025-02-06 09:04:27 +00:00
Fraser Cormack
76d1cb22c1
[libclc] Move rotate to CLC library; optimize (#125713)
This commit moves the rotate builtin to the CLC library.

It also optimizes rotate(x, n) to generate the @llvm.fshl(x, x, n)
intrinsic, for both scalar and vector types. The previous implementation
was too cautious in its handling of the shift amount; the OpenCL rules
state that the shift amount is always treated as an unsigned value
modulo the bitwidth.
2025-02-05 10:38:23 +00:00
Fraser Cormack
fe694b18dc
[libclc] Move mad_sat to CLC; optimize for vector types (#125517)
This commit moves the mad_sat builtin to the CLC library.

It also optimizes it for vector types by avoiding scalarization. To help
do this it transforms the previous control-flow code into vector select
code. This has also been done for the scalar versions for simplicity.
2025-02-03 17:50:42 +00:00
Fraser Cormack
7441e87fe0
[libclc] Move several integer functions to CLC library (#116786)
This commit moves over the OpenCL clz, hadd, mad24, mad_hi, mul24,
mul_hi, popcount, rhadd, and upsample builtins to the CLC library.

This commit also optimizes the vector forms of the mul_hi and upsample
builtins to consistently remain in vector types, instead of recursively
splitting vectors down to the scalar form.

The OpenCL mad_hi builtin wasn't previously publicly available from the
CLC libraries, as it was hash-defined to mul_hi in the header files.
That issue has been fixed, and mad_hi is now exposed.

The custom AMD implementation/workaround for popcount has been removed
as it was only required for clang < 7.

There are still two integer functions which haven't been moved over. The
OpenCL mad_sat builtin uses many of the other integer builtins, and
would benefit from optimization for vector types. That can take place in
a follow-up commit. The rotate builtin could similarly use some more
dedicated focus, potentially using clang builtins.
2025-01-29 13:45:33 +00:00