These functions all map to the corresponding LLVM intrinsics, but the
vector intrinsics weren't being generated. The intrinsic mapping from
CLC vector function to vector intrinsic was working correctly, but the
mapping from OpenCL builtin to CLC function was suboptimally recursively
splitting vectors in halves.
For example, with this change, `ceil(float16)` calls `llvm.ceil.v16f32`
directly once optimizations are applied.
Now also, instead of generating LLVM intrinsics through `__asm` we now
call clang elementwise builtins for each CLC builtin. This should be a
more standard way of achieving the same result
The CLC versions of each of these builtins are also now built and
enabled for SPIR-V targets. The LLVM -> SPIR-V translator maps the
intrinsics to the appropriate OpExtInst, so there should be no
difference in semantics, despite the newly introduced indirection from
OpenCL builtin through the CLC builtin to the intrinsic.
The AMDGPU targets make use of the same `_CLC_DEFINE_UNARY_BUILTIN`
macro to override `sqrt`, so those functions also appear more optimal
with this change, calling the vector `llvm.sqrt.vXf32` intrinsics
directly.
The OpenCL relational functions now call their CLC counterparts, and the
CLC relational functions are defined identically to how the OpenCL
functions were defined.
As usual, clspv and spir-v targets bypass these.
No observable changes to any libclc target (measured with llvm-diff).
These functions are all mapped to LLVM intrinsics.
The clspv and spirv targets don't declare or define any of these CLC
functions, and instead map these to their corresponding OpenCL symbols.
These functions are "shared" between integer and floating-point types,
hence the directory name. They are used in several CLC internal
functions such as __clc_ldexp.
Note that clspv and spirv targets don't want to define these functions,
so pre-processor macros replace calls to __clc_min with regular min, for
example. This means they can use as much of the generic CLC source files
as possible, but where CLC functions would usually call out to an
external __clc_min symbol, they call out to an external min symbol. Then
they opt out of defining __clc_min itself in their CLC builtins library.
Preprocessor definitions for these targets have also been changed
somewhat: what used to be CLC_SPIRV (the 32-bit target) is now
CLC_SPIRV32, and CLC_SPIRV now represents either CLC_SPIRV32 or
CLC_SPIRV64. Same goes for CLC_CLSPV.
There are no differences (measured with llvm-diff) in any of the final
builtins libraries for nvptx, amdgpu, or clspv. Neither are there
differences in the SPIR-V targets' LLVM IR before it's actually lowered
to SPIR-V.
Some libclc builtins currently use internal builtins prefixed with
'__clc_' for various reasons, e.g., to avoid naming clashes.
This commit formalizes this concept by starting to isolate the
definitions of these internal clc builtins into a separate
self-contained bytecode library, which is linked into each target's
libclc OpenCL builtins before optimization takes place.
The goal of this step is to allow additional libraries of builtins
that provide entry points (or bindings) that are not written in OpenCL C
but still wish to expose OpenCL-compatible builtins. By moving the
implementations into a separate self-contained library, entry points can
share as much code as possible without going through OpenCL C.
The overall structure of the internal clc library is similar to the
current OpenCL structure, with SOURCES files and targets being able to
override the definitions of builtins as needed. The idea is that the
OpenCL builtins will begin to need fewer target-specific overrides, as
those will slowly move over to the clc builtins instead.
Another advantage of having a separate bytecode library with the CLC
implementations is that we can internalize the symbols when linking it
(separately), whereas currently the CLC symbols make it into the final
builtins library (and perhaps even the final compiled binary).
This patch starts of with 'dot' as it's relatively self-contained, as
opposed to most of the maths builtins which tend to pull in other
builtins.
We can also start to clang-format the builtins as we go, which should
help to modernize the codebase.