llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-18 20:36:48 +00:00

Author	SHA1	Message	Date
Fraser Cormack	f8948d3c47	[libclc] Move log/log2/log10 to CLC library (#128540 ) This commit also enables fp16 log, which was previously missing. Other than that, no changes to codegen for AMDGPU/Nvidia targets. Note that for simplicity this commit doesn't try to refactor or optimize the implementations. Notably, each log is only implementated for scalar types; vector types are scalarized. It doesn't look too difficult to make the implementations suitable for vector codegen, so I'll try that in a future commit. There's also an unused implementation of log in clc_log_base.h, whereas the implementation currently used by libclc targets re-uses log2 with an additional multiplication. That should also be cleaned up as on first inspection it looks a more optimal implementation, though it would have to be checked against the OpenCL CTS for good measure.	2025-02-25 11:44:59 +00:00
Matt Arsenault	b57e63b07a	libclc: Stop using asm declarations for r600 on amdgcn for get_global_size (#128692 ) Comparing the case where each dimension is used alone, the only codegen difference is a missed addressing mode fold for the constant offset in the old version due to an ancient bug.	2025-02-25 18:23:04 +07:00
Fraser Cormack	2dfb29a9b2	[libclc] Move nan to the CLC library (#128521 )	2025-02-24 15:41:31 +00:00
Fraser Cormack	aca8a5cb23	[libclc] Remove clspv-specific clc conversions (#128500 ) The clc and clc+clspv modes produced the same conversions code, so this patch simplifies the process. It further simplifies the internal checks the script makes by assuming the mutual exclusivity.	2025-02-24 12:31:55 +00:00
Fraser Cormack	e7ad07ffb8	[libclc] Move fma to the CLC library (#126052 ) This builtin is a little more involved than others as targets deal with fma in various different ways. Fundamentally, the CLC __clc_fma builtin compiles to __builtin_elementwise_fma, which compiles to the @llvm.fma intrinsic. However, in the case of fp32 fma some targets call the __clc_sw_fma function, which provides a software implementation of the builtin. This in principle is controlled by the __CLC_HAVE_HW_FMA32 macro and may be a runtime decision, depending on how the target defines that macro. All targets build the CLC fma functions for all types. This is to the CLC library can have a reliable internal implementation for its own purposes. For AMD/NVPTX targets there are no meaningful changes to the generated LLVM bytecode. Some blocks of code have moved around, which confounds llvm-diff. For the clspv and SPIR-V/Mesa targets, only fp32 fma is of interest. Its use in libclc is tightly controlled by checking __CLC_HAVE_HW_FMA32 first. This can either be a compile-time constant (1, for clspv) or a runtime function for SPIR-V/Mesa. The SPIR-V/Mesa target only provided fp32 fma in the OpenCL layer. It unconditionally mapped that to the __clc_sw_fma builtin, even though the generic version in theory had a runtime toggle through __CLC_HAVE_HW_FMA32 specifically for that target. Callers of fma, though, would end up using the ExtInst fma, not calling the _Z3fmafff function provided by libclc. This commit keeps this system in place in the OpenCL layer, by mapping fma to __clc_sw_fma. Where other builtins would previously call fma (i.e., result in the ExtInst), they now call __clc_fma. This function checks the __CLC_HAVE_HW_FMA32 runtime toggle, which selects between the slow version or the quick version. The quick version is the LLVM fma intrinsic which llvm-spirv translates to the ExtInst. The clspv target had its own software implementation of fp32 fma, which it called unconditionally - even though __CLC_HAVE_HW_FMA32 is 1 for that target. This is potentially just so its library ships a software version which it can fall back on. In the OpenCL layer, the target doesn't provide fp64 fma, and maps fp16 fma to fp32 mad. This commit keeps this system roughly in place: in the OpenCL layer it maps fp32 fma to __clc_sw_fma, and fp16 fma to mad. Where builtins would previously call into fma, they now call __clc_fma, which compiles to the LLVM intrinsic. If this goes through a translation to SPIR-V it will become the fma ExtInst, or the intrinsic could be replaced by the _Z3fmafff software implementation. The clspv and SPIR-V/Mesa targets could potentially be cleaned up later, depending on their needs.	2025-02-24 10:10:51 +00:00
Fraser Cormack	6c2d418027	[libclc] Fix int<->float conversion builtins (#126905 ) While working on moving the conversion builtins to the CLC library in 25c05541 it was discovered that many weren't passing the OpenCL-CTS tests. As it happens, the clspv-specific code for conversion implementations between integer and floating-point types was more correct. However: * The clspv code was generating 'sat' conversions to floating-point types, which are not legal * The clspv code around rtn/rtz conversions needed tweaking as it wasn't validating when sizeof(dst) > sizeof(src), e.g., int -> double. With this commit, the CTS failures seen before have been resolved. This also assumes that the new implementations are correct also for clspv. If this is the case, then 'clc' and 'clspv' modes are mutually exclusive and we can simplify the build process for conversions by not building clc-clspv-convert.cl.	2025-02-24 09:28:51 +00:00
Fraser Cormack	ae5785460d	[libclc] Define macros for users of gentype.inc (#128012 ) Several users of (mostly math/) gentype.inc rely on types other than the 'gentype'. This is commonly intN as several maths builtins expose this as a return or paramter type. We were previously explicitly defining this type for every gentype. Other implementations rely on integer types of the same size and element width as the gentype, such as short/ushort for half, long/ulong for double, etc. Users might also rely on as_type or convert_type builtins to/from these types. The previous method we used to define intN was unscalable if we wanted to expose more types and helpers. This commit introduces a simpler system whereby several macros are defined at the beginning of gentype.inc. These rely on concatenating with the vector size. To facilitate this system, scalar gentypes now define an empty vector size. It was previously undefined, which was dangerous. An added benefit is that it matches how the integer gentype.inc vector size has been working. These macros will be especially helpful for the definitions of logb/ilogb in an upcoming patch.	2025-02-20 15:24:04 +00:00
Fraser Cormack	684ad25dfc	[libclc] Move frexp to CLC library; optimize half vecs (#127836 ) This commit moves the frexp builtin to the CLC library. It simultaneously optimizes the code generated for half vectors, which was previously scalarizing and casting up to float. With this commit it still casts up to float, but keeps it in the vector form.	2025-02-20 08:41:45 +00:00
Fraser Cormack	079115e6ea	[libclc] Move modf to the CLC library (#127828 ) The "generic" unary_(def\|decl)_with_ptr files are intended to be re-used by the sincos and fract builtins in the future as they share an identical type signature.	2025-02-20 08:36:46 +00:00
Fraser Cormack	9743b99cd1	[libclc] Explicitly qualify private address spaces (#127823 ) Doing so provides stability when compiling the builtins in a mode in which unqualified pointers may be interpreted as being in the generic address space, such as in OpenCL 3.0. We eventually want to provide 'generic' overloads of the builtins in libclc so this prepares the ground a little better. It could be argued that having the internal CLC helper functions be unqualified is more flexible, in case it's better for a target to have the pointers in the generic address space. This commits to the private address space for more stability across different OpenCL environments.	2025-02-19 16:26:24 +00:00
Fraser Cormack	fb5a87e1a6	[libclc][NFC] Reformat ep_log.cl	2025-02-19 15:18:05 +00:00
Fraser Cormack	73d067977b	[libclc] Clean up directory search procedure (#127783 ) During a recent change, the build system accidentally dropped the (theoretical) support for the CLC builtins library to build target-specific builtins from the 'amdgpu' directory, due to a change in variable names. This functionality wasn't being used but was spotted during another code review. This commit takes the opportunity to clean up and better document the code that manages the list of directories to search for builtin implementations. While fixing this, some references to now-removed SOURCES files were discovered which have been cleaned up.	2025-02-19 12:21:18 +00:00
Fraser Cormack	1509b46ea5	[libclc] Improve nextafter behaviour around zero (#127469 ) This commit improves the behaviour of (__clc_)nextafter around zero. Specifically, the nextafter value of very small negative numbers in the positive direction is now negative zero. Previously we'd return positive zero. This behaviour is not required as far as OpenCL is concerned: at least, the CTS isn't testing for it. However, this change does bring our implementation into bit-equivalence with (libstdc++'s implementation of) std::nextafter, tested on all possible values of 32-bit float towards both positive and negative INFINITY. Furthermore, since the implementation of libclc's floating-point 'rtp' and 'rtz' conversions use __clc_nextafter, the previous behaviour was resulting in CTS validation issues. For example, when converting float -0x1.000002p-25 to half, rounding towards zero or positive infinity, nextafter was returning +0.0, whereas the correct conversion requires us to return -0.0. We could work around this issue in the conversion functions, but since the change to nextafter is small enough and the behaviour around zero matches libstdc++, the fix feels at home there. This commit also converts several variables to unsigned types to avoid undefined behaviour surrounding signed underflow on the subtractions. It also converts some variables to be kept in floating-point types, using fabs to get the absolute value rather than by bit-hacking.	2025-02-19 10:24:12 +00:00
Fabian Ritter	a2f9ae1421	[AMDGPU] Replace gfx940 and gfx941 with gfx942 in offload and libclc (#125826 ) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. For SWDEV-512631 and SWDEV-512633	2025-02-19 09:56:04 +01:00
Fraser Cormack	378c6fbe33	[libclc][NFC] Rename macro; undef at end of file	2025-02-18 14:56:25 +00:00
Fraser Cormack	df12bad075	[libclc] Use CLC conversion builtins in CLC functions (#127628 ) This commit is a broad update across libclc to use the CLC conversion builtins in CLC functions, even those with a '__clc' prefix in the generic folder. This better prepares them for an official move to the CLC library in time. The CLC conversion builtins have an additional benefit in that they support scalars, unlike the __builtin_convertvector builtin which we were using previously. This allows us to simplify some shared definitions. There is one change to the IR, in the scalar upsample(char, uchar) builtin. It now sign-extends the first argument to i16, where before it zero-extended it. This appears to be correct, and matches the vector behaviour.	2025-02-18 14:52:41 +00:00
Fraser Cormack	1c6cecdbdd	[libclc] Suppress data-layout warnings during linking (#127532 ) libclc uses llvm-link to link together all of the individually built libclc builtins files into one module. Some of these builtins files are compiled from source by clang whilst others are converted from LLVM IR directly to bytecode. When llvm-link links a 'source' module into a 'destination' module, it warns if the two modules have differing data layouts. The LLVM IR files libclc links either have no data layout (shared submodule files) or an explicit data layout in the case of certain amdgcn/r600 files. The warnings are very noisy and largely inconsequential. We can suppress them exploiting a specific behaviours exhibited by llvm-link. When the destination module has no data layout, it is given the source module's data layout. Thus, if we link together all IR files first, followed by the clang-compiled modules, 99% of the warnings are suppressed as they arose from linking an empty data layout into a non-empty one. The remaining warnings came from the amdgcn and r600 targets. Some of these were because the data layouts were out of date compared with what clang currently produced, so those could have been updated. However, even with those changes and by grouping the IR files together, the linker may still link explicit data layouts with empty ones depending on the order the IR files are processed. As it happens, the data layouts aren't essential. With the changes to the link line we can rely on those IR files receiving the correct data layout from the clang-compiled modules later in the link line. This also makes the previously AMDGPU-specific IR files available to be used by all targets in a generic capacity in the future.	2025-02-18 12:06:14 +00:00
Fraser Cormack	9fec0a0942	[libclc] Disable external-calls testing for clspv targets (#127529 ) These targets don't include all OpenCL builtins, so there will always be external calls in the final bytecode module. Fixes #127316.	2025-02-18 09:14:04 +00:00
Fraser Cormack	15c2d1b328	[libclc] Fix dependencies on generated convert builtins (#127515 ) In #127378 it was reported that builds without clspv targets enabled were failing after #124727, as all targets had a dependency on a file that only clspv targets generated. A quick fix was merged in #127315 which wasn't correct. It moved the dependency on those generated files to the spirv targets, instead of onto the clspv targets. This means a build with spirv targets and without clspv targets would see the same problems as #127378 reported. I tried simply removing the requirement to explicitly add dependencies to the custom command, relying instead on the file-level dependencies. This didn't seem reliable enough; in some cases on a Makefiles build, the clang command compiling (e.g.,) convert.cl would begin before the file was fully written. Instead, we keep the target-level dependency but automatically infer it based on the generated file name, to avoid manual book-keeping of pairs of files and targets. This commit also fixes what looks like an unintended bug where, when ENABLE_RUNTIME_SUBNORMAL was enabled, the OpenCL conversions weren't being compiled.	2025-02-17 17:36:02 +00:00
Michał Górny	dbc98cfa46	[libclc] [cmake] Fix per-target _convert.cl dependencies (#127315 ) Fix `add_libclc_builtin_set` to add an appropriate dependency to either `clspv-generate_convert.cl` or `generate_convert.cl` based on the `ARCH` argument, rather than to both unconditionally. This fixes build failures due to missing dependencies when `clspv` targets are not enabled. The added check mirrors the one from `libclc/CMakeLists.txt`. Fixes: #127378	2025-02-16 08:48:52 +01:00
Fraser Cormack	25c0554166	[libclc] Move conversion builtins to the CLC library (#124727 ) This commit moves the implementations of conversion builtins to the CLC library. It keeps the dichotomy of regular vs. clspv implementations of the conversions. However, for the sake of a consistent interface all CLC conversion routines are built, even the ones that clspv opts out of in the user-facing OpenCL layer. It simultaneously updates the python script to use f-strings for formatting.	2025-02-12 08:55:02 +00:00
Fraser Cormack	64735ad639	[libclc] Move sign to the CLC builtins library (#115699 ) This commit moves the sign builtin's implementation to the CLC library. It simultaneously optimizes it (for vector types) by removing control-flow from the implementation. The __CLC_INTERNAL preprocessor definition has been repurposed (without the leading underscores) to be passed when building the internal CLC library. It was only used in one other place to guard an extra maths preprocessor definition, which we can do unconditionally.	2025-02-11 11:14:49 +00:00
Fraser Cormack	4dec3909e9	[libclc] Have all targets build all CLC functions (#124779 ) This removes all remaining SPIR-V workarounds for CLC functions, in an effort to streamline the CLC implementation and prevent further issues that #124614 had to fix. This commit fixes the same issue for the SPIR-V targets. Target-specific CLC implementations can and will exist, but for now they're all identical and so the target-specific SOURCES files have been removed. Target implementations now always include the 'generic' CLC directory, meaning we can avoid unnecessary duplication of SOURCES listings.	2025-02-10 10:19:22 +00:00
Nikita Popov	26ecddb05d	[libclc] Allow default path when looking for llvm-spirv (#126071 ) This is an external tool, so I don't think there is an expectation that it has to be in the LLVM tools bindir. It may also be in the default system bindir (which is not necessarily the same).	2025-02-07 09:18:18 +01:00
Fraser Cormack	d4144ca27d	[libclc][NFC] Clang-format two files Pre-commit changes to avoid noise in an upcoming PR.	2025-02-06 09:04:27 +00:00
Fraser Cormack	76d1cb22c1	[libclc] Move rotate to CLC library; optimize (#125713 ) This commit moves the rotate builtin to the CLC library. It also optimizes rotate(x, n) to generate the @llvm.fshl(x, x, n) intrinsic, for both scalar and vector types. The previous implementation was too cautious in its handling of the shift amount; the OpenCL rules state that the shift amount is always treated as an unsigned value modulo the bitwidth.	2025-02-05 10:38:23 +00:00
Fraser Cormack	fe694b18dc	[libclc] Move mad_sat to CLC; optimize for vector types (#125517 ) This commit moves the mad_sat builtin to the CLC library. It also optimizes it for vector types by avoiding scalarization. To help do this it transforms the previous control-flow code into vector select code. This has also been done for the scalar versions for simplicity.	2025-02-03 17:50:42 +00:00
Fraser Cormack	0e2abe7de3	[libclc] Remove use of symlinks (#125069 ) Symlinks are problematic on some systems. They aren't strictly necessary as we already have build infrastructure to 'alias' multiple targets' source directories together, as nvptx/nvptx64 has been doing. This commit takes the opportunity to merge together the spirv and spirv64 directories through the same system as they were identical. Fixes #114413	2025-01-30 17:44:07 +00:00
Fraser Cormack	7441e87fe0	[libclc] Move several integer functions to CLC library (#116786 ) This commit moves over the OpenCL clz, hadd, mad24, mad_hi, mul24, mul_hi, popcount, rhadd, and upsample builtins to the CLC library. This commit also optimizes the vector forms of the mul_hi and upsample builtins to consistently remain in vector types, instead of recursively splitting vectors down to the scalar form. The OpenCL mad_hi builtin wasn't previously publicly available from the CLC libraries, as it was hash-defined to mul_hi in the header files. That issue has been fixed, and mad_hi is now exposed. The custom AMD implementation/workaround for popcount has been removed as it was only required for clang < 7. There are still two integer functions which haven't been moved over. The OpenCL mad_sat builtin uses many of the other integer builtins, and would benefit from optimization for vector types. That can take place in a follow-up commit. The rotate builtin could similarly use some more dedicated focus, potentially using clang builtins.	2025-01-29 13:45:33 +00:00
Fraser Cormack	12cdf4330d	[libclc] Move (add\|sub)_sat to CLC; optimize (#124903 ) Using the `__builtin_elementwise_(add\|sub)_sat` functions allows us to directly optimize to the desired intrinsic, and avoid scalarization for vector types.	2025-01-29 11:12:40 +00:00
Fraser Cormack	bb95335982	[libclc][NFC] Clang-format includes	2025-01-28 18:00:25 +00:00
Fraser Cormack	a8c82d5fde	[libclc] Optimize isfpclass-like CLC builtins (#124145 ) The builtins we were using to implement __clc_is(finite\|inf\|nan\|normal) -- __builtin_isfinite, etc. -- don't take vector types so we were previously scalarizing. The __builtin_isfpclass builtin does take vector types and thus allows us to keep things in vectors. There is no change in codegen to the scalar versions of any of these builtins.	2025-01-28 16:23:52 +00:00
Romaric Jodin	9d8d538e40	libclc: clspv: add missing clc_isnan.cl dependency (#124614 ) clc_isnan.cl is needed since https://github.com/llvm/llvm-project/pull/124097	2025-01-28 14:47:08 +00:00
Fraser Cormack	78b5bb702f	[libclc][NFC] Move key math headers to CLC (#124739 )	2025-01-28 14:17:23 +00:00
Fraser Cormack	cfc8ef0ad8	[libclc] Move copysign to CLC library; fix & optimize (#124598 ) This commit moves the implementation of the copysign builtin to the CLC library. It simultaneously optimizes it for vector types by avoiding scalarization. It does so by using the __builtin_elementwise_copysign clang builtins, which can handle vector types. It also fixes a bug in the half/fp16 implementation of the builtin. This version was using an incorrect mask (0x7FFFF instead of 0x7FFF) and was thus preserving the original sign bit, rather than masking it out.	2025-01-28 09:18:34 +00:00
Fraser Cormack	c3a0fcc982	[libclc] Optimize CLC vector any/all builtins (#124568 ) By using the vector reduction buitins we can avoid scalarization. Targets that don't support vector reductions will scalarize later on anyway. The vector reduction builtins should be well-enough supported by the middle-end to be a generic solution. This produces conceptually equivalent code: all vector elements are OR'd/AND'd together and the final scalar is bit-shifted and masked to produce the final result. The 'normalize' builtin uses 'all' so its code has similarly improved in places.	2025-01-27 16:37:21 +00:00
Fraser Cormack	eaa5897534	[libclc] Optimize CLC vector is(un)ordered builtins (#124546 ) These are similar to 347fb208, but these builtins are expressed in terms of other builtins. The LLVM IR generated features the same fcmp ord/uno comparisons as before, but consistently in vector form.	2025-01-27 14:41:40 +00:00
Fraser Cormack	347fb208c1	[libclc] Optimize CLC vector relational builtins (#124537 ) Clang knows how to perform relational operations on OpenCL vectors, so we don't need to use the Clang builtins. The builtins we were using didn't support vector types, so we were previously scalarizing. This commit generates the same LLVM fcmp operations as before, just without the scalarization.	2025-01-27 13:25:37 +00:00
Fraser Cormack	9705500582	[libclc] Move nextafter to the CLC library (#124097 ) There were two implementations of this - one that implemented nextafter in software, and another that called a clang builtin. No in-tree targets called the builtin, so all targets build the software version. The builtin version has been removed, and the software version has been renamed to be the "default". This commit also optimizes nextafter, to avoid scalarization as much as possible. Note however that the (CLC) relational builtins still scalarize; those will be optimized in a separate commit. Since nextafter is used by some convert_type builtins, the diff to IR codegen is not limited to the builtin itself.	2025-01-23 12:24:16 +00:00
Fraser Cormack	9e0b2b68c2	[libclc] Don't rely on fp16 pragma guards in headers (#122751 ) Having the fp16 pragmas enabled in the header file is risky. The macros defined by that header don't (and can't) include the pragmas that make fp16 types themselves legal, and another header may disable the fp16 pragma before the macro's use. The safest thing to do is the use of pragmas surrounding each use of the macro in the implementation files. This pattern is also far more common across the codebase.	2025-01-22 09:32:20 +00:00
Fraser Cormack	eaf3e1b0d1	[libclc] Route int bitselect through CLC; add half (#123653 ) The half variants were missing. The integer bitselect builtins weren't going through __clc_bitselect due to an oversight when the CLC version was introduced.	2025-01-21 10:09:25 +00:00
Fraser Cormack	d96ec48068	[libclc] Route select through __clc_select (#123647 ) This was missed during the introduction of select. This also unifies the various .inc files used for each, as they were essentially identical. The __clc_select function is now also built for SPIR-V targets.	2025-01-21 10:05:39 +00:00
Fraser Cormack	c8eb865747	[libclc] Move mad to the CLC library (#123607 ) All targets build `__clc_mad` -- even SPIR-V targets -- since it compiles to the optimal `llvm.fmuladd` intrinsic. There is no change to the bytecode generated for non-SPIR-V targets. The `mix` builtin, which is implemented as a wrapper around `mad`, is left as an OpenCL-layer wrapper of `__clc_mad`. I don't know if it's worth having a specific CLC version of `mix`. The changes to the other CLC files/functions are moving uses of `mad` to `__clc_mad`, and reformatting. There is an additional instance of `trunc` becoming `__clc_trunc`, which was missed before.	2025-01-20 16:27:51 +00:00
Fraser Cormack	8b7bfb417a	[libclc] Rename include guards. NFC.	2025-01-20 11:26:02 +00:00
Fraser Cormack	a90b5b1885	[libclc] Move degrees/radians to CLC library & optimize (#123222 ) Missing half variants were also added. The builtins are now consistently emitted in vector form (i.e., with a splat of the literal to the appropriate vector size).	2025-01-17 12:11:53 +00:00
Fraser Cormack	b7e20147ad	[libclc] Move smoothstep to CLC and optimize its codegen (#123183 ) This commit moves the implementation of the smoothstep function to the CLC library, whilst optimizing the codegen. This commit also adds support for 'half' versions of smoothstep, which were previously missing. The CLC smoothstep implementation now keeps everything in vectors, rather than recursively splitting vectors by half down to the scalar base form. This should result in more optimal codegen across the board. This commit also removes some non-standard overloads of smoothstep with mixed types, such as 'double smoothstep(float, float, float)'. There aren't any mixed-(element )type versions of smoothstep as far as I can see: gentype smoothstep(gentype edge0, gentype edge1, gentype x) gentypef smoothstep(float edge0, float edge1, gentypef x) gentyped smoothstep(double edge0, double edge1, gentyped x) gentypeh smoothstep(half edge0, half edge1, gentypeh x) The CLC library only defines the first type, for simplicity; the OpenCL layer is responsible for handling the scalar/scalar/vector forms. Note that the scalar/scalar/vector forms now splat the scalars to the vector type, rather than recursively split vectors as before. The macro that used to 'vectorize' smoothstep in this way has been moved out of the shared clcmacro.h header as it was only used for the smoothstep builtin. Note that the CLC clamp function is now built for both SPIR-V targets. This is to help build the CLC smoothstep function for the Mesa SPIR-V target.	2025-01-16 11:44:09 +00:00
Fraser Cormack	a5b88cb815	[libclc] Add missing includes to CLC headers (#118654 ) There's no automatic way of checking these headers are self-contained. Instead of including these common files many times across the whole codebase, we can include them in the generic `gentype.inc` and `floatn.inc` files which are included by most CLC headers.	2025-01-15 10:14:51 +00:00
David Spickett	efd929efa5	[libclc] Add Maintainers.md for libclc (#118309 ) This adds a Maintainers.md files to libclc. Recently I needed to find a libclc maintainer and I had no idea there was one listed in llvm/ instead of in libclc/.	2025-01-06 09:16:26 +00:00
Fraser Cormack	06789ccb16	[libclc] Optimize ceil/fabs/floor/rint/trunc (#119596 ) These functions all map to the corresponding LLVM intrinsics, but the vector intrinsics weren't being generated. The intrinsic mapping from CLC vector function to vector intrinsic was working correctly, but the mapping from OpenCL builtin to CLC function was suboptimally recursively splitting vectors in halves. For example, with this change, `ceil(float16)` calls `llvm.ceil.v16f32` directly once optimizations are applied. Now also, instead of generating LLVM intrinsics through `__asm` we now call clang elementwise builtins for each CLC builtin. This should be a more standard way of achieving the same result The CLC versions of each of these builtins are also now built and enabled for SPIR-V targets. The LLVM -> SPIR-V translator maps the intrinsics to the appropriate OpExtInst, so there should be no difference in semantics, despite the newly introduced indirection from OpenCL builtin through the CLC builtin to the intrinsic. The AMDGPU targets make use of the same `_CLC_DEFINE_UNARY_BUILTIN` macro to override `sqrt`, so those functions also appear more optimal with this change, calling the vector `llvm.sqrt.vXf32` intrinsics directly.	2024-12-13 08:47:13 +00:00
Fraser Cormack	76befc86de	Reland "[libclc] Create aliases with custom_command (#115885 )" (#116025 ) This relands commit 2c980310f67c13dd89c8702d40abeab47a4a2b4b after fixing an issue.	2024-11-13 11:44:21 +00:00

1 2 3 4 5 ...

753 Commits