llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-24 14:36:07 +00:00

Author	SHA1	Message	Date
Fraser Cormack	a8c82d5fde	[libclc] Optimize isfpclass-like CLC builtins (#124145) The builtins we were using to implement __clc_is(finite|inf|nan|normal) -- __builtin_isfinite, etc. -- don't take vector types, so we were previously scalarizing. The __builtin_isfpclass builtin does take vector types and thus allows us to keep things in vectors. There is no change in codegen to the scalar versions of any of these builtins.	2025-01-28 16:23:52 +00:00
Romaric Jodin	9d8d538e40	libclc: clspv: add missing clc_isnan.cl dependency (#124614) clc_isnan.cl is needed since https://github.com/llvm/llvm-project/pull/124097	2025-01-28 14:47:08 +00:00
Fraser Cormack	78b5bb702f	[libclc][NFC] Move key math headers to CLC (#124739)	2025-01-28 14:17:23 +00:00
Fraser Cormack	cfc8ef0ad8	[libclc] Move copysign to CLC library; fix & optimize (#124598) This commit moves the implementation of the copysign builtin to the CLC library. It simultaneously optimizes it for vector types by avoiding scalarization. It does so by using the __builtin_elementwise_copysign Clang builtin, which can handle vector types. It also fixes a bug in the half/fp16 implementation of the builtin. This version was using an incorrect mask (0x7FFFF instead of 0x7FFF) and was thus preserving the original sign bit, rather than masking it out.	2025-01-28 09:18:34 +00:00
Fraser Cormack	c3a0fcc982	[libclc] Optimize CLC vector any/all builtins (#124568) By using the vector reduction builtins, we can avoid scalarization. Targets that don't support vector reductions will scalarize later on anyway. The vector reduction builtins should be well-enough supported by the middle-end to be a generic solution. This produces conceptually equivalent code: all vector elements are OR'd/AND'd together and the final scalar is bit-shifted and masked to produce the final result. The 'normalize' builtin uses 'all' so its code has similarly improved in places.	2025-01-27 16:37:21 +00:00
Fraser Cormack	eaa5897534	[libclc] Optimize CLC vector is(un)ordered builtins (#124546) These are similar to 347fb208, but these builtins are expressed in terms of other builtins. The LLVM IR generated features the same fcmp ord/uno comparisons as before, but consistently in vector form.	2025-01-27 14:41:40 +00:00
Fraser Cormack	347fb208c1	[libclc] Optimize CLC vector relational builtins (#124537) Clang knows how to perform relational operations on OpenCL vectors, so we don't need to use the Clang builtins. The builtins we were using didn't support vector types, so we were previously scalarizing. This commit generates the same LLVM fcmp operations as before, just without the scalarization.	2025-01-27 13:25:37 +00:00
Fraser Cormack	9705500582	[libclc] Move nextafter to the CLC library (#124097) There were two implementations of this - one that implemented nextafter in software, and another that called a clang builtin. No in-tree targets called the builtin, so all targets build the software version. The builtin version has been removed, and the software version has been renamed to be the "default". This commit also optimizes nextafter, to avoid scalarization as much as possible. Note however that the (CLC) relational builtins still scalarize; those will be optimized in a separate commit. Since nextafter is used by some convert_type builtins, the diff to IR codegen is not limited to the builtin itself.	2025-01-23 12:24:16 +00:00
Fraser Cormack	9e0b2b68c2	[libclc] Don't rely on fp16 pragma guards in headers (#122751) Having the fp16 pragmas enabled in the header file is risky. The macros defined by that header don't (and can't) include the pragmas that make fp16 types themselves legal, and another header may disable the fp16 pragma before the macro's use. The safest approach is to surround each use of the macro in the implementation files with the pragmas. This pattern is also far more common across the codebase.	2025-01-22 09:32:20 +00:00
Fraser Cormack	eaf3e1b0d1	[libclc] Route int bitselect through CLC; add half (#123653) The half variants were missing. The integer bitselect builtins weren't going through __clc_bitselect due to an oversight when the CLC version was introduced.	2025-01-21 10:09:25 +00:00
Fraser Cormack	d96ec48068	[libclc] Route select through __clc_select (#123647) This was missed during the introduction of select. This also unifies the various .inc files used for each, as they were essentially identical. The __clc_select function is now also built for SPIR-V targets.	2025-01-21 10:05:39 +00:00
Fraser Cormack	c8eb865747	[libclc] Move mad to the CLC library (#123607) All targets build `__clc_mad` -- even SPIR-V targets -- since it compiles to the optimal `llvm.fmuladd` intrinsic. There is no change to the bytecode generated for non-SPIR-V targets. The `mix` builtin, which is implemented as a wrapper around `mad`, is left as an OpenCL-layer wrapper of `__clc_mad`. I don't know if it's worth having a specific CLC version of `mix`. The changes to the other CLC files/functions are moving uses of `mad` to `__clc_mad`, and reformatting. There is an additional instance of `trunc` becoming `__clc_trunc`, which was missed before.	2025-01-20 16:27:51 +00:00
Fraser Cormack	8b7bfb417a	[libclc] Rename include guards. NFC.	2025-01-20 11:26:02 +00:00
Fraser Cormack	a90b5b1885	[libclc] Move degrees/radians to CLC library & optimize (#123222) Missing half variants were also added. The builtins are now consistently emitted in vector form (i.e., with a splat of the literal to the appropriate vector size).	2025-01-17 12:11:53 +00:00
Fraser Cormack	b7e20147ad	[libclc] Move smoothstep to CLC and optimize its codegen (#123183) This commit moves the implementation of the smoothstep function to the CLC library, whilst optimizing the codegen. This commit also adds support for 'half' versions of smoothstep, which were previously missing. The CLC smoothstep implementation now keeps everything in vectors, rather than recursively splitting vectors by half down to the scalar base form. This should result in more optimal codegen across the board. This commit also removes some non-standard overloads of smoothstep with mixed types, such as 'double smoothstep(float, float, float)'. There aren't any mixed-(element)type versions of smoothstep as far as I can see: gentype smoothstep(gentype edge0, gentype edge1, gentype x); gentypef smoothstep(float edge0, float edge1, gentypef x); gentyped smoothstep(double edge0, double edge1, gentyped x); gentypeh smoothstep(half edge0, half edge1, gentypeh x). The CLC library only defines the first form, for simplicity; the OpenCL layer is responsible for handling the scalar/scalar/vector forms. Note that the scalar/scalar/vector forms now splat the scalars to the vector type, rather than recursively splitting vectors as before. The macro that used to 'vectorize' smoothstep in this way has been moved out of the shared clcmacro.h header as it was only used for the smoothstep builtin. Note that the CLC clamp function is now built for both SPIR-V targets. This is to help build the CLC smoothstep function for the Mesa SPIR-V target.	2025-01-16 11:44:09 +00:00
Fraser Cormack	a5b88cb815	[libclc] Add missing includes to CLC headers (#118654) There's no automatic way of checking these headers are self-contained. Instead of including these common files many times across the whole codebase, we can include them in the generic `gentype.inc` and `floatn.inc` files which are included by most CLC headers.	2025-01-15 10:14:51 +00:00
Fraser Cormack	06789ccb16	[libclc] Optimize ceil/fabs/floor/rint/trunc (#119596) These functions all map to the corresponding LLVM intrinsics, but the vector intrinsics weren't being generated. The intrinsic mapping from CLC vector function to vector intrinsic was working correctly, but the mapping from OpenCL builtin to CLC function was suboptimal, recursively splitting vectors in half. For example, with this change, `ceil(float16)` calls `llvm.ceil.v16f32` directly once optimizations are applied. Also, instead of generating LLVM intrinsics through `__asm`, we now call Clang elementwise builtins for each CLC builtin. This should be a more standard way of achieving the same result. The CLC versions of each of these builtins are also now built and enabled for SPIR-V targets. The LLVM -> SPIR-V translator maps the intrinsics to the appropriate OpExtInst, so there should be no difference in semantics, despite the newly introduced indirection from OpenCL builtin through the CLC builtin to the intrinsic. The AMDGPU targets make use of the same `_CLC_DEFINE_UNARY_BUILTIN` macro to override `sqrt`, so those functions also appear more optimal with this change, calling the vector `llvm.sqrt.vXf32` intrinsics directly.	2024-12-13 08:47:13 +00:00
Fraser Cormack	7387338007	[libclc] Add some include guards to CLC declarations. NFC	2024-11-12 17:25:40 +00:00
Fraser Cormack	b231647475	[libclc] Move relational functions to the CLC library (#115171) The OpenCL relational functions now call their CLC counterparts, and the CLC relational functions are defined identically to how the OpenCL functions were defined. As usual, clspv and spir-v targets bypass these. No observable changes to any libclc target (measured with llvm-diff).	2024-11-06 19:28:44 +00:00
Fraser Cormack	7be30fd533	[libclc] Move abs/abs_diff to CLC library	2024-11-06 09:16:35 +00:00
Fraser Cormack	d2d1b5897e	[libclc] Move clcmacro.h to CLC library. NFC (#114845)	2024-11-04 22:00:01 +00:00
Fraser Cormack	293c78ba0a	[libclc] Move ceil/fabs/floor/rint/trunc to CLC library (#114774) These functions are all mapped to LLVM intrinsics. The clspv and spirv targets don't declare or define any of these CLC functions, and instead map these to their corresponding OpenCL symbols.	2024-11-04 16:35:14 +00:00
Fraser Cormack	d12a8da1de	[libclc] Move min/max/clamp into the CLC builtins library (#114386) These functions are "shared" between integer and floating-point types, hence the directory name. They are used in several CLC internal functions such as __clc_ldexp. Note that clspv and spirv targets don't want to define these functions, so pre-processor macros replace calls to __clc_min with regular min, for example. This means they can use as much of the generic CLC source files as possible, but where CLC functions would usually call out to an external __clc_min symbol, they call out to an external min symbol. Then they opt out of defining __clc_min itself in their CLC builtins library. Preprocessor definitions for these targets have also been changed somewhat: what used to be CLC_SPIRV (the 32-bit target) is now CLC_SPIRV32, and CLC_SPIRV now represents either CLC_SPIRV32 or CLC_SPIRV64. Same goes for CLC_CLSPV. There are no differences (measured with llvm-diff) in any of the final builtins libraries for nvptx, amdgpu, or clspv. Neither are there differences in the SPIR-V targets' LLVM IR before it's actually lowered to SPIR-V.	2024-10-31 16:45:37 +00:00
Fraser Cormack	b2bdd8bd39	[libclc] Create an internal 'clc' builtins library Some libclc builtins currently use internal builtins prefixed with '__clc_' for various reasons, e.g., to avoid naming clashes. This commit formalizes this concept by starting to isolate the definitions of these internal clc builtins into a separate self-contained bytecode library, which is linked into each target's libclc OpenCL builtins before optimization takes place. The goal of this step is to allow additional libraries of builtins that provide entry points (or bindings) that are not written in OpenCL C but still wish to expose OpenCL-compatible builtins. By moving the implementations into a separate self-contained library, entry points can share as much code as possible without going through OpenCL C. The overall structure of the internal clc library is similar to the current OpenCL structure, with SOURCES files and targets being able to override the definitions of builtins as needed. The idea is that the OpenCL builtins will begin to need fewer target-specific overrides, as those will slowly move over to the clc builtins instead. Another advantage of having a separate bytecode library with the CLC implementations is that we can internalize the symbols when linking it (separately), whereas currently the CLC symbols make it into the final builtins library (and perhaps even the final compiled binary). This patch starts off with 'dot' as it's relatively self-contained, as opposed to most of the maths builtins which tend to pull in other builtins. We can also start to clang-format the builtins as we go, which should help to modernize the codebase.	2024-10-29 13:09:56 +00:00

24 Commits
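The half-precision copysign bug fixed in cfc8ef0ad8 is easiest to see on raw bit patterns. IEEE 754 binary16 stores the sign in bit 15 above 15 exponent and mantissa bits, so the magnitude mask is 0x7FFF. The mask 0x7FFFF also has bit 15 set and therefore keeps the original sign. The following is a minimal C sketch on `uint16_t` encodings, chosen so it compiles without fp16 support. The function names are hypothetical, not libclc's, and the real CLC version now uses `__builtin_elementwise_copysign` instead.

```c
#include <stdint.h>

/* IEEE 754 binary16: bit 15 = sign, bits 14..0 = exponent + mantissa. */
#define HALF_SIGN_MASK 0x8000u
#define HALF_MAG_MASK  0x7FFFu

/* Hypothetical helper: keep x's magnitude, take y's sign (correct mask). */
static uint16_t copysign_half_bits(uint16_t x, uint16_t y) {
    return (uint16_t)((x & HALF_MAG_MASK) | (y & HALF_SIGN_MASK));
}

/* Hypothetical helper reproducing the old bug: 0x7FFFF includes bit 15,
   so x's own sign bit survives the "magnitude" mask. */
static uint16_t copysign_half_bits_buggy(uint16_t x, uint16_t y) {
    return (uint16_t)((x & 0x7FFFFu) | (y & HALF_SIGN_MASK));
}
```

For example, applying the sign of +2.0 (0x4000) to -1.0 (0xBC00) gives 0x3C00 (+1.0) with the correct mask. The buggy mask leaves 0xBC00 (-1.0) unchanged.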
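Commit c3a0fcc982 describes `any`/`all` as an OR/AND reduction across lanes followed by a shift and mask. OpenCL defines both functions on the most significant bit of each integer component. The following is a scalar C model of that shape, with an n-lane `int` vector represented as an array. The commit does not name the builtins it uses; Clang's `__builtin_reduce_or` and `__builtin_reduce_and` are the likely vector counterparts of these loops, but that is an assumption. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* any(): 1 if the sign bit of any lane is set. */
static int any_msb(const int32_t *v, size_t n) {
    uint32_t acc = 0;                  /* identity for OR */
    for (size_t i = 0; i < n; ++i)
        acc |= (uint32_t)v[i];         /* OR-reduce all lanes */
    return (int)((acc >> 31) & 1u);    /* shift sign bit down, then mask */
}

/* all(): 1 if the sign bit of every lane is set. */
static int all_msb(const int32_t *v, size_t n) {
    uint32_t acc = UINT32_MAX;         /* identity for AND */
    for (size_t i = 0; i < n; ++i)
        acc &= (uint32_t)v[i];         /* AND-reduce all lanes */
    return (int)((acc >> 31) & 1u);
}
```

For lanes {0, 5, -1, 7}, `any_msb` returns 1 because the -1 lane has its sign bit set, and `all_msb` returns 0.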
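Commit b7e20147ad keeps a single vector-wide smoothstep in CLC and has the OpenCL layer splat scalar edges, rather than recursively halving vectors. The following C sketch shows that shape, modelling vectors as arrays. The formula is the one the OpenCL C specification gives for smoothstep. The function names and the 16-lane buffer limit are assumptions for illustration.

```c
#include <stddef.h>

static float clamp01(float v) {
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* gentype smoothstep(gentype edge0, gentype edge1, gentype x), elementwise. */
static void smoothstep_vec(const float *edge0, const float *edge1,
                           const float *x, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        float t = clamp01((x[i] - edge0[i]) / (edge1[i] - edge0[i]));
        out[i] = t * t * (3.0f - 2.0f * t);   /* Hermite interpolation */
    }
}

/* gentypef smoothstep(float edge0, float edge1, gentypef x):
   splat the scalar edges to the vector width, then reuse the vector form.
   Assumes n <= 16, the widest OpenCL vector. */
static void smoothstep_splat(float edge0, float edge1,
                             const float *x, float *out, size_t n) {
    float e0[16], e1[16];
    for (size_t i = 0; i < n; ++i) {
        e0[i] = edge0;
        e1[i] = edge1;
    }
    smoothstep_vec(e0, e1, x, out, n);
}
```

With edges 0 and 1, the inputs {-1, 0, 0.5, 1, 2} map to {0, 0, 0.5, 1, 1}.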
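Commit 9705500582 keeps only the software ("default") nextafter but does not show its code. A common software technique, assumed here rather than taken from libclc, steps the IEEE 754 bit pattern of `x` by one unit toward `y`. The following is a scalar `float` sketch; `nextafter_bits` is a hypothetical name.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit-level nextafter: move x one ULP toward y. */
static float nextafter_bits(float x, float y) {
    if (isnan(x) || isnan(y))
        return x + y;                  /* propagate NaN */
    if (x == y)
        return y;                      /* also returns y for +0 vs -0 */
    uint32_t bits;
    if (x == 0.0f) {
        /* smallest subnormal, carrying y's sign */
        bits = (y < 0.0f ? 0x80000000u : 0u) | 1u;
    } else {
        memcpy(&bits, &x, sizeof bits);
        /* away from zero: grow the magnitude; toward zero: shrink it */
        if ((x < y) == (x > 0.0f))
            bits += 1u;
        else
            bits -= 1u;
    }
    float r;
    memcpy(&r, &bits, sizeof r);
    return r;
}
```

For example, `nextafter_bits(1.0f, 2.0f)` returns `1.0f + 0x1p-23f`, the next float above 1.0. This sketch is scalar; the commit's vector optimization is not modelled here.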