49 Commits

Author SHA1 Message Date
Fraser Cormack
82912fd620
[libclc] Update license headers (#132070)
This commit bulk-updates the libclc license headers to the current
Apache-2.0 WITH LLVM-exception license in situations where they were
previously attributed to AMD - and occasionally under an additional
single individual contributor - under an MIT license.

AMD signed the LLVM relicensing agreement and so agreed for their past
contributions under the new LLVM license.

The LLVM project also has had a long-standing, unwritten, policy of not
adding copyright notices to source code. This policy was recently
written up [1]. This commit therefore also removes these copyright
notices at the same time.

Note that there are outstanding copyright notices attributed to others -
and many files missing copyright headers - which will be dealt with in
future work.

[1]
https://llvm.org/docs/DeveloperPolicy.html#embedded-copyright-or-contributed-by-statements
2025-03-20 11:40:09 +00:00
Fraser Cormack
760eeac6a2
[libclc] Reduce bithacking in CLC frexp (#129871)
Also replace some magic constants with named ones.

Checking against FP zero and using isnan and isinf functions allows the
optimizer to create one unified @llvm.is.fpclass intrinsic. This results
in fewer more canonical IR instructions.
2025-03-05 14:18:51 +00:00
Fraser Cormack
e5d5503e4e
[libclc] Move hypot to CLC library; optimize (#129551)
This was already nominally in the CLC library; this commit just formally
moves it over. It simultaneously optimizes it for vector types by
avoiding scalarization.
2025-03-04 14:16:16 +00:00
Fraser Cormack
1357279df9
[libclc] Move rsqrt to the CLC library (#129045)
This also adds missing half variants to certain targets.

It also optimizes some targets' implementations to perform the operation
directly in vector types, as opposed to scalarizing.
2025-02-27 15:46:58 +00:00
Fraser Cormack
285b411e46
[libclc] Move sqrt to CLC library (#128748)
This is fairly straightforward for most targets.

We use the element-wise sqrt builtin by default. We also remove a legacy
pre-filtering of the input argument, which the intrinsic now officially
handles.

AMDGPU provides its own implementation of sqrt for double types. This
commit moves this into the implementation of CLC sqrt. It uses weak
linkage on the 'default' CLC sqrt to allow AMDGPU to only override the
builtin for the types it cares about.
2025-02-27 12:30:24 +00:00
Fraser Cormack
5f4d1f7400
[libclc] Make CLC library warning-free (#128864)
There is a long-standing workaround in the libclc build system that
silences a warning about the use of parentheses in bitwise conditional
operations.

In an effort to remove this workaround, this commit re-enables the
warning on the internal CLC library, where most of the bodies of the
builtins will eventually be defined. Thus as we move builtin
implementations into this library, the warnings will trigger and we can
clean up the codebase as we go.

As it happens the only instance in the CLC library which triggered the
warning was in __clc_ldexp.
2025-02-26 12:11:26 +00:00
Fraser Cormack
d5038b3774
[libclc] Move __clc_ldexp to CLC library (#126078)
This function was already conceptually in the CLC namespace - this just
formally moves it over.

Note however that this commit marks a change in how libclc functions may
be overridden by targets.

Until now we have been using a purely build-system-based approach where
targets could register identically-named files which took responsibility
for the implementation of the builtin in its entirety.

This system wasn't well equipped to deal with AMD's overriding of
__clc_ldexp for only a subset of types, and furthermore conditionally on
a pre-defined macro.

One option for handling this would be to require AMD to duplicate code
for the versions of __clc_ldexp it's *not* interested in overriding. We
could also make it easier for targets to re-define CLC functions through
macros or .inc files. Both of these have obvious downsides. We could
also keep AMD's overriding in the OpenCL layer and bypass CLC
altogether, but this has limited use.

We could use weak linkage on the "base" implementations of CLC
functions, and allow targets to opt-in to providing their own
implementations on a much finer granularity. This commit supports this
as a proof of concept; we could expand it to all CLC builtins if
accepted.

Note that the existing filename-based "claiming" approach is still in
effect, so targets have to name their overrides differently to have both
files compiled. This could also be refined.
2025-02-26 11:20:25 +00:00
Fraser Cormack
a821ae2847
[libclc] Move round to CLC library (#128721) 2025-02-25 16:24:57 +00:00
Fraser Cormack
f8948d3c47
[libclc] Move log/log2/log10 to CLC library (#128540)
This commit also enables fp16 log, which was previously missing.

Other than that, no changes to codegen for AMDGPU/Nvidia targets.

Note that for simplicity this commit doesn't try to refactor or optimize
the implementations. Notably, each log is only implementated for scalar
types; vector types are scalarized. It doesn't look too difficult to
make the implementations suitable for vector codegen, so I'll try that
in a future commit.

There's also an unused implementation of log in clc_log_base.h, whereas
the implementation currently used by libclc targets re-uses log2 with an
additional multiplication. That should also be cleaned up as on first
inspection it looks a more optimal implementation, though it would have
to be checked against the OpenCL CTS for good measure.
2025-02-25 11:44:59 +00:00
Fraser Cormack
2dfb29a9b2
[libclc] Move nan to the CLC library (#128521) 2025-02-24 15:41:31 +00:00
Fraser Cormack
e7ad07ffb8
[libclc] Move fma to the CLC library (#126052)
This builtin is a little more involved than others as targets deal with
fma in various different ways.

Fundamentally, the CLC __clc_fma builtin compiles to
__builtin_elementwise_fma, which compiles to the @llvm.fma intrinsic.
However, in the case of fp32 fma some targets call the __clc_sw_fma
function, which provides a software implementation of the builtin. This
in principle is controlled by the __CLC_HAVE_HW_FMA32 macro and may be a
runtime decision, depending on how the target defines that macro.

All targets build the CLC fma functions for all types. This is to the
CLC library can have a reliable internal implementation for its own
purposes.

For AMD/NVPTX targets there are no meaningful changes to the generated
LLVM bytecode. Some blocks of code have moved around, which confounds
llvm-diff.

For the clspv and SPIR-V/Mesa targets, only fp32 fma is of interest. Its
use in libclc is tightly controlled by checking __CLC_HAVE_HW_FMA32
first. This can either be a compile-time constant (1, for clspv) or a
runtime function for SPIR-V/Mesa.

The SPIR-V/Mesa target only provided fp32 fma in the OpenCL layer. It
unconditionally mapped that to the __clc_sw_fma builtin, even though the
generic version in theory had a runtime toggle through
__CLC_HAVE_HW_FMA32 specifically for that target. Callers of fma,
though, would end up using the ExtInst fma, *not* calling the _Z3fmafff
function provided by libclc.

This commit keeps this system in place in the OpenCL layer, by mapping
fma to __clc_sw_fma. Where other builtins would previously call fma
(i.e., result in the ExtInst), they now call __clc_fma. This function
checks the __CLC_HAVE_HW_FMA32 runtime toggle, which selects between the
slow version or the quick version. The quick version is the LLVM fma
intrinsic which llvm-spirv translates to the ExtInst.

The clspv target had its own software implementation of fp32 fma, which
it called unconditionally - even though __CLC_HAVE_HW_FMA32 is 1 for
that target. This is potentially just so its library ships a software
version which it can fall back on. In the OpenCL layer, the target
doesn't provide fp64 fma, and maps fp16 fma to fp32 mad.

This commit keeps this system roughly in place: in the OpenCL layer it
maps fp32 fma to __clc_sw_fma, and fp16 fma to mad. Where builtins would
previously call into fma, they now call __clc_fma, which compiles to the
LLVM intrinsic. If this goes through a translation to SPIR-V it will
become the fma ExtInst, or the intrinsic could be replaced by the
_Z3fmafff software implementation.

The clspv and SPIR-V/Mesa targets could potentially be cleaned up later,
depending on their needs.
2025-02-24 10:10:51 +00:00
Fraser Cormack
ae5785460d
[libclc] Define macros for users of gentype.inc (#128012)
Several users of (mostly math/) gentype.inc rely on types other than the
'gentype'. This is commonly intN as several maths builtins expose this
as a return or paramter type. We were previously explicitly defining
this type for every gentype.

Other implementations rely on integer types of the same size and element
width as the gentype, such as short/ushort for half, long/ulong for
double, etc.

Users might also rely on as_type or convert_type builtins to/from these
types.

The previous method we used to define intN was unscalable if we wanted
to expose more types and helpers.

This commit introduces a simpler system whereby several macros are
defined at the beginning of gentype.inc. These rely on concatenating
with the vector size. To facilitate this system, scalar gentypes now
define an empty vector size. It was previously undefined, which was
dangerous. An added benefit is that it matches how the integer
gentype.inc vector size has been working.

These macros will be especially helpful for the definitions of
logb/ilogb in an upcoming patch.
2025-02-20 15:24:04 +00:00
Fraser Cormack
684ad25dfc
[libclc] Move frexp to CLC library; optimize half vecs (#127836)
This commit moves the frexp builtin to the CLC library.

It simultaneously optimizes the code generated for half vectors, which
was previously scalarizing and casting up to float. With this commit it
still casts up to float, but keeps it in the vector form.
2025-02-20 08:41:45 +00:00
Fraser Cormack
079115e6ea
[libclc] Move modf to the CLC library (#127828)
The "generic" unary_(def|decl)_with_ptr files are intended to be re-used
by the sincos and fract builtins in the future as they share an
identical type signature.
2025-02-20 08:36:46 +00:00
Fraser Cormack
1509b46ea5
[libclc] Improve nextafter behaviour around zero (#127469)
This commit improves the behaviour of (__clc_)nextafter around zero.
Specifically, the nextafter value of very small negative numbers in the
positive direction is now negative zero. Previously we'd return positive
zero.

This behaviour is not required as far as OpenCL is concerned: at least,
the CTS isn't testing for it. However, this change does bring our
implementation into bit-equivalence with (libstdc++'s implementation of)
std::nextafter, tested on all possible values of 32-bit float towards
both positive and negative INFINITY.

Furthermore, since the implementation of libclc's floating-point 'rtp'
and 'rtz' conversions use __clc_nextafter, the previous behaviour was
resulting in CTS validation issues. For example, when converting float
-0x1.000002p-25 to half, rounding towards zero or positive infinity,
nextafter was returning +0.0, whereas the correct conversion requires us
to return -0.0.

We could work around this issue in the conversion functions, but since
the change to nextafter is small enough and the behaviour around zero
matches libstdc++, the fix feels at home there.

This commit also converts several variables to unsigned types to avoid
undefined behaviour surrounding signed underflow on the subtractions.
It also converts some variables to be kept in floating-point types, using
fabs to get the absolute value rather than by bit-hacking.
2025-02-19 10:24:12 +00:00
Fraser Cormack
378c6fbe33 [libclc][NFC] Rename macro; undef at end of file 2025-02-18 14:56:25 +00:00
Fraser Cormack
df12bad075
[libclc] Use CLC conversion builtins in CLC functions (#127628)
This commit is a broad update across libclc to use the CLC conversion
builtins in CLC functions, even those with a '__clc' prefix in the
generic folder. This better prepares them for an official move to the
CLC library in time.

The CLC conversion builtins have an additional benefit in that they
support scalars, unlike the __builtin_convertvector builtin which we
were using previously. This allows us to simplify some shared
definitions.

There is one change to the IR, in the scalar upsample(char, uchar)
builtin. It now sign-extends the first argument to i16, where before it
zero-extended it. This appears to be correct, and matches the vector
behaviour.
2025-02-18 14:52:41 +00:00
Fraser Cormack
25c0554166
[libclc] Move conversion builtins to the CLC library (#124727)
This commit moves the implementations of conversion builtins to the CLC
library. It keeps the dichotomy of regular vs. clspv implementations of
the conversions. However, for the sake of a consistent interface all CLC
conversion routines are built, even the ones that clspv opts out of in
the user-facing OpenCL layer.

It simultaneously updates the python script to use f-strings for
formatting.
2025-02-12 08:55:02 +00:00
Fraser Cormack
64735ad639
[libclc] Move sign to the CLC builtins library (#115699)
This commit moves the sign builtin's implementation to the CLC library.
It simultaneously optimizes it (for vector types) by removing
control-flow from the implementation.

The __CLC_INTERNAL preprocessor definition has been repurposed (without
the leading underscores) to be passed when building the internal CLC
library. It was only used in one other place to guard an extra maths
preprocessor definition, which we can do unconditionally.
2025-02-11 11:14:49 +00:00
Fraser Cormack
4dec3909e9
[libclc] Have all targets build all CLC functions (#124779)
This removes all remaining SPIR-V workarounds for CLC functions, in an
effort to streamline the CLC implementation and prevent further issues
that #124614 had to fix. This commit fixes the same issue for the SPIR-V
targets.

Target-specific CLC implementations can and will exist, but for now
they're all identical and so the target-specific SOURCES files have been
removed. Target implementations now always include the 'generic' CLC
directory, meaning we can avoid unnecessary duplication of SOURCES
listings.
2025-02-10 10:19:22 +00:00
Fraser Cormack
76d1cb22c1
[libclc] Move rotate to CLC library; optimize (#125713)
This commit moves the rotate builtin to the CLC library.

It also optimizes rotate(x, n) to generate the @llvm.fshl(x, x, n)
intrinsic, for both scalar and vector types. The previous implementation
was too cautious in its handling of the shift amount; the OpenCL rules
state that the shift amount is always treated as an unsigned value
modulo the bitwidth.
2025-02-05 10:38:23 +00:00
Fraser Cormack
fe694b18dc
[libclc] Move mad_sat to CLC; optimize for vector types (#125517)
This commit moves the mad_sat builtin to the CLC library.

It also optimizes it for vector types by avoiding scalarization. To help
do this it transforms the previous control-flow code into vector select
code. This has also been done for the scalar versions for simplicity.
2025-02-03 17:50:42 +00:00
Fraser Cormack
0e2abe7de3
[libclc] Remove use of symlinks (#125069)
Symlinks are problematic on some systems. They aren't strictly necessary
as we already have build infrastructure to 'alias' multiple targets'
source directories together, as nvptx/nvptx64 has been doing.

This commit takes the opportunity to merge together the spirv and
spirv64 directories through the same system as they were identical.

Fixes #114413
2025-01-30 17:44:07 +00:00
Fraser Cormack
7441e87fe0
[libclc] Move several integer functions to CLC library (#116786)
This commit moves over the OpenCL clz, hadd, mad24, mad_hi, mul24,
mul_hi, popcount, rhadd, and upsample builtins to the CLC library.

This commit also optimizes the vector forms of the mul_hi and upsample
builtins to consistently remain in vector types, instead of recursively
splitting vectors down to the scalar form.

The OpenCL mad_hi builtin wasn't previously publicly available from the
CLC libraries, as it was hash-defined to mul_hi in the header files.
That issue has been fixed, and mad_hi is now exposed.

The custom AMD implementation/workaround for popcount has been removed
as it was only required for clang < 7.

There are still two integer functions which haven't been moved over. The
OpenCL mad_sat builtin uses many of the other integer builtins, and
would benefit from optimization for vector types. That can take place in
a follow-up commit. The rotate builtin could similarly use some more
dedicated focus, potentially using clang builtins.
2025-01-29 13:45:33 +00:00
Fraser Cormack
12cdf4330d
[libclc] Move (add|sub)_sat to CLC; optimize (#124903)
Using the `__builtin_elementwise_(add|sub)_sat` functions allows us to
directly optimize to the desired intrinsic, and avoid scalarization for
vector types.
2025-01-29 11:12:40 +00:00
Fraser Cormack
a8c82d5fde
[libclc] Optimize isfpclass-like CLC builtins (#124145)
The builtins we were using to implement __clc_is(finite|inf|nan|normal)
-- __builtin_isfinite, etc. -- don't take vector types so we were
previously scalarizing. The __builtin_isfpclass builtin does take vector
types and thus allows us to keep things in vectors.

There is no change in codegen to the scalar versions of any of these
builtins.
2025-01-28 16:23:52 +00:00
Romaric Jodin
9d8d538e40
libclc: clspv: add missing clc_isnan.cl dependency (#124614)
clc_isnan.cl is needed since
https://github.com/llvm/llvm-project/pull/124097
2025-01-28 14:47:08 +00:00
Fraser Cormack
78b5bb702f
[libclc][NFC] Move key math headers to CLC (#124739) 2025-01-28 14:17:23 +00:00
Fraser Cormack
cfc8ef0ad8
[libclc] Move copysign to CLC library; fix & optimize (#124598)
This commit moves the implementation of the copysign builtin to the CLC
library.

It simultaneously optimizes it for vector types by avoiding
scalarization. It does so by using the __builtin_elementwise_copysign
clang builtins, which can handle vector types.

It also fixes a bug in the half/fp16 implementation of the builtin. This
version was using an incorrect mask (0x7FFFF instead of 0x7FFF) and was
thus preserving the original sign bit, rather than masking it out.
2025-01-28 09:18:34 +00:00
Fraser Cormack
c3a0fcc982
[libclc] Optimize CLC vector any/all builtins (#124568)
By using the vector reduction buitins we can avoid scalarization.
Targets that don't support vector reductions will scalarize later on
anyway. The vector reduction builtins should be well-enough supported by
the middle-end to be a generic solution.

This produces conceptually equivalent code: all vector elements are
OR'd/AND'd together and the final scalar is bit-shifted and masked to
produce the final result.

The 'normalize' builtin uses 'all' so its code has similarly improved in
places.
2025-01-27 16:37:21 +00:00
Fraser Cormack
eaa5897534
[libclc] Optimize CLC vector is(un)ordered builtins (#124546)
These are similar to 347fb208, but these builtins are expressed in terms
of other builtins. The LLVM IR generated features the same fcmp ord/uno
comparisons as before, but consistently in vector form.
2025-01-27 14:41:40 +00:00
Fraser Cormack
347fb208c1
[libclc] Optimize CLC vector relational builtins (#124537)
Clang knows how to perform relational operations on OpenCL vectors, so
we don't need to use the Clang builtins. The builtins we were using
didn't support vector types, so we were previously scalarizing.

This commit generates the same LLVM fcmp operations as before, just
without the scalarization.
2025-01-27 13:25:37 +00:00
Fraser Cormack
9705500582
[libclc] Move nextafter to the CLC library (#124097)
There were two implementations of this - one that implemented nextafter
in software, and another that called a clang builtin. No in-tree targets
called the builtin, so all targets build the software version. The
builtin version has been removed, and the software version has been
renamed to be the "default".

This commit also optimizes nextafter, to avoid scalarization as much as
possible. Note however that the (CLC) relational builtins still
scalarize; those will be optimized in a separate commit.

Since nextafter is used by some convert_type builtins, the diff to IR
codegen is not limited to the builtin itself.
2025-01-23 12:24:16 +00:00
Fraser Cormack
9e0b2b68c2
[libclc] Don't rely on fp16 pragma guards in headers (#122751)
Having the fp16 pragmas enabled in the header file is risky. The macros
defined by that header don't (and can't) include the pragmas that make
fp16 types themselves legal, and another header may disable the fp16
pragma before the macro's use.

The safest thing to do is the use of pragmas surrounding each use of the
macro in the implementation files. This pattern is also far more common
across the codebase.
2025-01-22 09:32:20 +00:00
Fraser Cormack
eaf3e1b0d1
[libclc] Route int bitselect through CLC; add half (#123653)
The half variants were missing. The integer bitselect builtins weren't
going through __clc_bitselect due to an oversight when the CLC version
was introduced.
2025-01-21 10:09:25 +00:00
Fraser Cormack
d96ec48068
[libclc] Route select through __clc_select (#123647)
This was missed during the introduction of select. This also unifies the
various .inc files used for each, as they were essentially identical.

The __clc_select function is now also built for SPIR-V targets.
2025-01-21 10:05:39 +00:00
Fraser Cormack
c8eb865747
[libclc] Move mad to the CLC library (#123607)
All targets build `__clc_mad` -- even SPIR-V targets -- since it
compiles to the optimal `llvm.fmuladd` intrinsic. There is no change to
the bytecode generated for non-SPIR-V targets.

The `mix` builtin, which is implemented as a wrapper around `mad`, is
left as an OpenCL-layer wrapper of `__clc_mad`. I don't know if it's
worth having a specific CLC version of `mix`.

The changes to the other CLC files/functions are moving uses of `mad` to
`__clc_mad`, and reformatting. There is an additional instance of
`trunc` becoming `__clc_trunc`, which was missed before.
2025-01-20 16:27:51 +00:00
Fraser Cormack
8b7bfb417a [libclc] Rename include guards. NFC. 2025-01-20 11:26:02 +00:00
Fraser Cormack
a90b5b1885
[libclc] Move degrees/radians to CLC library & optimize (#123222)
Missing half variants were also added.

The builtins are now consistently emitted in vector form (i.e., with a
splat of the literal to the appropriate vector size).
2025-01-17 12:11:53 +00:00
Fraser Cormack
b7e20147ad
[libclc] Move smoothstep to CLC and optimize its codegen (#123183)
This commit moves the implementation of the smoothstep function to the
CLC library, whilst optimizing the codegen.

This commit also adds support for 'half' versions of smoothstep, which
were previously missing.

The CLC smoothstep implementation now keeps everything in vectors,
rather than recursively splitting vectors by half down to the scalar
base form. This should result in more optimal codegen across the board.

This commit also removes some non-standard overloads of smoothstep with
mixed types, such as 'double smoothstep(float, float, float)'. There
aren't any mixed-(element )type versions of smoothstep as far as I can
see:

    gentype smoothstep(gentype edge0, gentype edge1, gentype x)
    gentypef smoothstep(float edge0, float edge1, gentypef x)
    gentyped smoothstep(double edge0, double edge1, gentyped x)
    gentypeh smoothstep(half edge0, half edge1, gentypeh x)

The CLC library only defines the first type, for simplicity; the OpenCL
layer is responsible for handling the scalar/scalar/vector forms. Note
that the scalar/scalar/vector forms now splat the scalars to the vector
type, rather than recursively split vectors as before. The macro that
used to 'vectorize' smoothstep in this way has been moved out of the
shared clcmacro.h header as it was only used for the smoothstep builtin.

Note that the CLC clamp function is now built for both SPIR-V targets.
This is to help build the CLC smoothstep function for the Mesa SPIR-V
target.
2025-01-16 11:44:09 +00:00
Fraser Cormack
a5b88cb815
[libclc] Add missing includes to CLC headers (#118654)
There's no automatic way of checking these headers are self-contained.

Instead of including these common files many times across the whole
codebase, we can include them in the generic `gentype.inc` and
`floatn.inc` files which are included by most CLC headers.
2025-01-15 10:14:51 +00:00
Fraser Cormack
06789ccb16
[libclc] Optimize ceil/fabs/floor/rint/trunc (#119596)
These functions all map to the corresponding LLVM intrinsics, but the
vector intrinsics weren't being generated. The intrinsic mapping from
CLC vector function to vector intrinsic was working correctly, but the
mapping from OpenCL builtin to CLC function was suboptimally recursively
splitting vectors in halves.

For example, with this change, `ceil(float16)` calls `llvm.ceil.v16f32`
directly once optimizations are applied.

Now also, instead of generating LLVM intrinsics through `__asm` we now
call clang elementwise builtins for each CLC builtin. This should be a
more standard way of achieving the same result

The CLC versions of each of these builtins are also now built and
enabled for SPIR-V targets. The LLVM -> SPIR-V translator maps the
intrinsics to the appropriate OpExtInst, so there should be no
difference in semantics, despite the newly introduced indirection from
OpenCL builtin through the CLC builtin to the intrinsic.

The AMDGPU targets make use of the same `_CLC_DEFINE_UNARY_BUILTIN`
macro to override `sqrt`, so those functions also appear more optimal
with this change, calling the vector `llvm.sqrt.vXf32` intrinsics
directly.
2024-12-13 08:47:13 +00:00
Fraser Cormack
7387338007 [libclc] Add some include guards to CLC declarations. NFC 2024-11-12 17:25:40 +00:00
Fraser Cormack
b231647475
[libclc] Move relational functions to the CLC library (#115171)
The OpenCL relational functions now call their CLC counterparts, and the
CLC relational functions are defined identically to how the OpenCL
functions were defined.

As usual, clspv and spir-v targets bypass these.

No observable changes to any libclc target (measured with llvm-diff).
2024-11-06 19:28:44 +00:00
Fraser Cormack
7be30fd533 [libclc] Move abs/abs_diff to CLC library 2024-11-06 09:16:35 +00:00
Fraser Cormack
d2d1b5897e
[libclc] Move clcmacro.h to CLC library. NFC (#114845) 2024-11-04 22:00:01 +00:00
Fraser Cormack
293c78ba0a
[libclc] Move ceil/fabs/floor/rint/trunc to CLC library (#114774)
These functions are all mapped to LLVM intrinsics.

The clspv and spirv targets don't declare or define any of these CLC
functions, and instead map these to their corresponding OpenCL symbols.
2024-11-04 16:35:14 +00:00
Fraser Cormack
d12a8da1de
[libclc] Move min/max/clamp into the CLC builtins library (#114386)
These functions are "shared" between integer and floating-point types,
hence the directory name. They are used in several CLC internal
functions such as __clc_ldexp.

Note that clspv and spirv targets don't want to define these functions,
so pre-processor macros replace calls to __clc_min with regular min, for
example. This means they can use as much of the generic CLC source files
as possible, but where CLC functions would usually call out to an
external __clc_min symbol, they call out to an external min symbol. Then
they opt out of defining __clc_min itself in their CLC builtins library.

Preprocessor definitions for these targets have also been changed
somewhat: what used to be CLC_SPIRV (the 32-bit target) is now
CLC_SPIRV32, and CLC_SPIRV now represents either CLC_SPIRV32 or
CLC_SPIRV64. Same goes for CLC_CLSPV.

There are no differences (measured with llvm-diff) in any of the final
builtins libraries for nvptx, amdgpu, or clspv. Neither are there
differences in the SPIR-V targets' LLVM IR before it's actually lowered
to SPIR-V.
2024-10-31 16:45:37 +00:00
Fraser Cormack
b2bdd8bd39 [libclc] Create an internal 'clc' builtins library
Some libclc builtins currently use internal builtins prefixed with
'__clc_' for various reasons, e.g., to avoid naming clashes.

This commit formalizes this concept by starting to isolate the
definitions of these internal clc builtins into a separate
self-contained bytecode library, which is linked into each target's
libclc OpenCL builtins before optimization takes place.

The goal of this step is to allow additional libraries of builtins
that provide entry points (or bindings) that are not written in OpenCL C
but still wish to expose OpenCL-compatible builtins. By moving the
implementations into a separate self-contained library, entry points can
share as much code as possible without going through OpenCL C.

The overall structure of the internal clc library is similar to the
current OpenCL structure, with SOURCES files and targets being able to
override the definitions of builtins as needed. The idea is that the
OpenCL builtins will begin to need fewer target-specific overrides, as
those will slowly move over to the clc builtins instead.

Another advantage of having a separate bytecode library with the CLC
implementations is that we can internalize the symbols when linking it
(separately), whereas currently the CLC symbols make it into the final
builtins library (and perhaps even the final compiled binary).

This patch starts of with 'dot' as it's relatively self-contained, as
opposed to most of the maths builtins which tend to pull in other
builtins.

We can also start to clang-format the builtins as we go, which should
help to modernize the codebase.
2024-10-29 13:09:56 +00:00