llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-14 16:16:46 +00:00

Author	SHA1	Message	Date
Fraser Cormack	4cb1803ff9	[libclc][NFC] Fix typo in comment	2025-04-14 14:38:58 +01:00
Wenju He	0c21d6b4c8	[libclc] Fix commands in compile_to_bc are executed sequentially (#130755 ) In libclc, we observe that compiling OpenCL source files to bitcode is executed sequentially on Windows, which increases debug build time by about an hour. add_custom_command may introduce additional implicit dependencies, see https://gitlab.kitware.com/cmake/cmake/-/issues/17097 This PR adds a target for each command, enabling parallel builds of OpenCL source files. CMake 3.27 has fixed the above issue with DEPENDS_EXPLICIT_ONLY. When LLVM upgrades its CMake version to 3.27, we can switch to DEPENDS_EXPLICIT_ONLY.	2025-04-14 14:11:04 +01:00
Wenju He	cbda72a547	[NFC][libclc] Merge atomic extension built-ins with identical name into a single file (#134489 ) llvm-diff shows there is no change to amdgcn--amdhsa.bc. Similar to how cl_khr_fp64 and cl_khr_fp16 implementations are put in the same file for math built-ins, this PR does the same for atom_* built-ins. The main motivation is to prevent two files with the same base name from implementing different built-ins. In a follow-up PR, I'd like to relax libclc_configure_lib_source to only compare filename instead of path for overriding, since in our downstream the same category of built-ins, e.g. math, are organized in several different folders.	2025-04-14 10:27:48 +01:00
Fraser Cormack	7d32d72f10	[libclc][NFC] Remove blank line at end of file	2025-04-10 10:02:51 +01:00
Romaric Jodin	135a7874dc	libclc: clspv: fma: remove fp16 implementation (#135002 ) clspv is already handling generation of fp16. This implementation is preventing clspv from making the best choice to use an emulation on top of fp32-fma, or the native fp16-fma, depending on the command-line arguments.	2025-04-10 10:01:57 +01:00
Fraser Cormack	b0338c3d6c	[libclc] Move shuffle/shuffle2 to the CLC library (#135000 ) This commit moves the shuffle and shuffle2 builtins to the CLC library. In so doing it makes the headers simpler and re-usable for other builtin layers to hook into the CLC functions, if they wish. An additional gentype utility has been made available, which provides a consistent vector-size-or-1 macro for use. The existing __CLC_VECSIZE is defined but empty which is useful in certain applications, such as in concatenation with a type to make a correctly sized scalar or vector type. However, this isn't usable in the same preprocessor lines when wanting to check for specific vector sizes, as e.g., '__CLC_VECSIZE == 2' resolves to '== 2' which is invalid. In local testing this is also useful for the geometric builtins which are only available for scalar types and vector types of 2, 3, or 4 elements. No codegen changes are observed, except the internal shuffle/shuffle2 utility functions are no longer made publicly available.	2025-04-09 15:52:25 +01:00
Fraser Cormack	949bf518fc	[libclc][NFC] Fix up inconsistent copyright headers Some files were accidentally given two copyright headers. Another was missing one. This commit also converts that file's dos line endings to unix ones and reformats a comment.	2025-04-09 12:00:08 +01:00
Romaric Jodin	0e98817458	libclc: frexp: fix implementation regarding denormals (#134823 ) Devices not supporting denormals can compare them true against zero. This leads to results that do not match the CTS expectation, whether or not denormals are supported. For example for 0x1.008p-140 we get {0x1.008p-140, 0} while the CTS expects {0x1.008p-1, -139} when supporting denormals, or {0, 0} when not supporting denormals (flushed to zero). Ref #129871	2025-04-08 14:50:26 +01:00
Romaric Jodin	7baa7edc00	[libclc]: clspv: add a dummy implementation for mul_hi (#134094 ) clspv uses a better implementation that does not use a bigger size when one is not available. Add a dummy implementation for mul_hi to avoid overriding the clspv implementation with the one in libclc.	2025-04-03 10:18:39 +01:00
Fraser Cormack	ddc48fefe3	[libclc] Move native_(exp10|powr|tan) to CLC library (#134080 ) These are the three remaining native builtins not yet ported. There are elementwise versions of exp10 and tan which correspond to the intrinsics, which may be preferable to the current versions which route through other native builtins. Those could be changed in a follow-up if desired.	2025-04-02 17:37:17 +01:00
Fraser Cormack	f186041553	[libclc] Move sinh, cosh & tanh to the CLC library (#134063 ) This commit also vectorizes the builtins.	2025-04-02 15:22:42 +01:00
Fraser Cormack	d51525ba36	[libclc] Move lgamma, lgamma_r & tgamma to CLC library (#134053 ) Also enable half-precision variants of tgamma, which were previously missing. Note that unlike recent work, these builtins are not vectorized as part of this commit. Ultimately all three call into lgamma_r, which has heavy control flow (including switch statements) that would be difficult to vectorize. Additionally the lgamma_r algorithm is copyrighted to SunPro so may need a rewrite in the future anyway. There are no codegen changes (to non-SPIR-V targets) with this commit, aside from the new half builtins.	2025-04-02 15:20:32 +01:00
Fraser Cormack	dd19e7eaaa	[libclc] Move cbrt to the CLC library; vectorize (#133940 )	2025-04-02 10:18:24 +01:00
Fraser Cormack	f14ff59da7	[libclc] Move exp, exp2 and expm1 to the CLC library (#133932 ) These all share the use of a common helper function so are handled in one go. These builtins are also now vectorized.	2025-04-01 18:15:37 +01:00
Fraser Cormack	00e6d4fe06	[libclc][NFC] Delete three unused .inc files	2025-04-01 17:36:01 +01:00
Fraser Cormack	c1efd8b663	[libclc][NFC] Delete two unused headers These should have been deleted when the respective builtins were moved to the CLC library.	2025-04-01 14:54:50 +01:00
Fraser Cormack	bcf0f8d8aa	[libclc] Move exp10 to the CLC library (#133899 ) The builtin was already nominally in the CLC library; this commit just moves it over. It also vectorizes the builtin on its way.	2025-04-01 14:39:17 +01:00
Fraser Cormack	13a313fe58	[libclc] Move sinpi/cospi/tanpi to the CLC library (#133889 ) Additionally, these builtins are now vectorized. This also moves the native_recip and native_divide builtins as they are used by the tanpi builtin.	2025-04-01 12:03:21 +01:00
Fraser Cormack	ad48fffb53	[libclc] Move several 'native' builtins to CLC library (#129679 ) This commit moves the 'native' builtins that use asm statements to generate LLVM intrinsics to the CLC library. In doing so it converts them to use the appropriate elementwise builtin to generate the same intrinsic; there are no codegen changes to any target except to AMDGPU targets where `native_log` is no longer custom implemented and instead uses the clang elementwise builtin. This work forms part of #127196 and indeed with this commit there are no 'generic' builtins using/abusing asm statements - the remaining builtins are specific to the amdgpu and r600 targets.	2025-04-01 09:20:54 +01:00
Fraser Cormack	7a2b160e76	[libclc] Move rootn to the CLC library; optimize (#133735 ) The function was already nominally in the CLC namespace; this commit just moves it over. This commit also vectorizes the builtin to avoid scalarization.	2025-04-01 09:19:50 +01:00
Fraser Cormack	87602f6d03	[libclc] Fix unresolved reference to missing table (#133691 ) Splitting the 'ln_tbl' into two in db98e292 wasn't done thoroughly enough as some references to the old table still remained. This commit fixes the unresolved references by updating to the new split table.	2025-03-31 16:55:23 +01:00
Fraser Cormack	3fd0eaae52	[libclc][amdgpu] Implement native_exp2 via AMD builtin (#133696 ) This came up during a discussion on #129679, which has been split out as a preparatory commit. An example of the AMDGPU codegen is: define <2 x float> @_Z10native_expDv2_f(<2 x float> %val) { %mul = fmul afn <2 x float> %val, splat (float 0x3FF7154760000000) %0 = extractelement <2 x float> %mul, i64 0 %1 = tail call float @llvm.amdgcn.exp2.f32(float %0) %vecinit.i = insertelement <2 x float> poison, float %1, i64 0 %2 = extractelement <2 x float> %mul, i64 1 %3 = tail call float @llvm.amdgcn.exp2.f32(float %2) %vecinit2.i = insertelement <2 x float> %vecinit.i, float %3, i64 1 ret <2 x float> %vecinit2.i } define <2 x float> @_Z11native_exp2Dv2_f(<2 x float> %x) { %0 = extractelement <2 x float> %x, i64 0 %1 = tail call float @llvm.amdgcn.exp2.f32(float %0) %vecinit = insertelement <2 x float> poison, float %1, i64 0 %2 = extractelement <2 x float> %x, i64 1 %3 = tail call float @llvm.amdgcn.exp2.f32(float %2) %vecinit2 = insertelement <2 x float> %vecinit, float %3, i64 1 ret <2 x float> %vecinit2 }	2025-03-31 16:54:04 +01:00
Fraser Cormack	b52977b868	[libclc] Move pow, powr & pown to the CLC library (#133294 ) These functions were already nominally in the CLC library. Similar to others, these builtins are now vectorized and are not broken down into scalar types.	2025-03-28 08:23:24 +00:00
Fraser Cormack	0a74cbfac4	[libclc] Pass -fapprox-func when compiling 'native' builtins (#133119 ) The libclc build system isn't well set up to pass arbitrary options to arbitrary source files in a non-intrusive way. There isn't currently any other motivating example to warrant rewriting the build system just to satisfy this requirement. So this commit uses a filename-based approach to inserting this option into the list of compile flags.	2025-03-28 08:22:19 +00:00
Fraser Cormack	d32e71d7c7	[libclc] Move fmod, remainder & remquo to the CLC library (#132054 ) These functions were already nominally in the CLC namespace; this commit just formally moves them over. Note that 'half' versions of these CLC functions are now provided. Previously the corresponding OpenCL builtins would forward directly to the 'float' versions of the CLC builtins. Now the OpenCL builtins call the 'half' CLC builtins, which themselves call the 'float' CLC versions. This keeps the interface between the OpenCL and CLC libraries neater and keeps the CLC library self-contained. No changes to the generated code for non-SPIR-V targets are observed.	2025-03-27 14:53:19 +00:00
Fraser Cormack	3284559cca	[libclc] Move atan2/atan2pi to the CLC library (#133226 ) As with other work in this area, these builtins are now vectorized. A further table has been split into two. There was discrepancy between comments above the table describing the values as "lead" and "tail" and variables taken from the table called "head" and "tail", so these have been unified as head/tail.	2025-03-27 10:59:09 +00:00
Fraser Cormack	db98e2922f	[libclc] Move log1p/asinh/acosh/atanh to the CLC library (#132956 ) These four functions are all related in that they share tables and helper functions. Furthermore, the acosh and atanh builtins call log1p. As with other work in this area, these builtins are now vectorized. To enable this, there are new table accessor functions which return a vector of table values using a vector of indices. These are internally scalarized, in the absence of gather operations. Some tables which were tables of multiple entries (e.g., double2) are split into two separate "low" and "high" tables. This might affect the performance of memory operations but is hopefully mitigated by better codegen overall.	2025-03-27 09:19:07 +00:00
Fraser Cormack	3013458a79	[libclc] Move asinpi/acospi/atanpi to the CLC library (#132918 ) Similar to d46a6999, this commit simultaneously moves these three functions to the CLC library and optimizes them for vector types by avoiding scalarization.	2025-03-25 13:31:53 +00:00
Fraser Cormack	d46a699953	[libclc] Move asin/acos/atan to the CLC library (#132788 ) This commit simultaneously moves these three functions to the CLC library and optimizes them for vector types by avoiding scalarization.	2025-03-25 09:11:32 +00:00
Fraser Cormack	7e22b09031	[libclc] Add missing license headers to source IR files (#132758 )	2025-03-24 16:21:59 +00:00
Fraser Cormack	70c325bf6a	[libclc] Move fp32 sincos helpers to CLC library (#132753 ) This commit moves most of the sincos helper functions to the CLC library. It simultaneously vectorizes them with the aim to increase performance for vector types by avoiding scalarization. Some helpers for double types remain as they use various features not yet ready, like 'fract' which in turn relies on 'fmin'; neither of these are in the CLC library. They also use table lookups and type punning which don't translate well to vector versions. As a proof of concept, float and half versions of the sin and cos builtins are now vectorized and use the CLC helpers to do so. They remain in the OpenCL layer but will be simpler to move to the CLC library when the double versions are ready.	2025-03-24 16:09:31 +00:00
Romaric Jodin	a6a56a326a	[libclc] erfc: fix fp32 implementation in FTZ mode (#132390 ) On some implementations, the current implementation leads to slight accuracy issues. While the maths behind this implementation is correct, it does not take into account the accumulation of errors coming from other operators that do not provide correct rounding (like the exp function). To avoid it, compute statically exp(-0.5625). Fixes #124939	2025-03-24 16:08:54 +00:00
Fraser Cormack	63b5692bac	[libclc] Relicense gen_convert.py (#132213 ) Similar to work done in 82912fd6, this commit re-licenses both the gen_convert.py script and the file it generates. It previously possessed an MIT license, with three additional individual copyrights. The file it generated was similar, but to only two of the three individuals. LLVM's policy is not to accept contributions that include in-source copyright notices [1]. I'm not aware whether the individuals concerned signed the re-licensing agreement or not. It takes the opportunity to update the description(s) in the header files, since the previous comments were out of date. [1] https://llvm.org/docs/DeveloperPolicy.html#embedded-copyright-or-contributed-by-statements	2025-03-24 11:10:07 +00:00
Fraser Cormack	7d048674a4	[libclc] Add license headers to files missing them (#132239 ) This commit bulk updates all '.h', '.cl', '.inc', and '.cpp' files to add any missing license headers. The remaining files are generally CMake, SOURCES, scripts, markdown, etc. There are still some '.ll' files which may benefit from a license header. I can't find an example of an LLVM IR file with a license header in the rest of LLVM, but unlike most other (sub)projects, libclc has examples of LLVM IR as source files, compiled and built into the library.	2025-03-24 10:10:38 +00:00
Wenju He	735d7c1539	[libclc] link_bc target should depend on target builtins.link.clc-arch_suffix (#132338 ) Currently link_bc command depends on the bitcode file that is associated with custom target builtins.link.clc-arch_suffix. On Windows we randomly see the following error: ` Generating builtins.link.clc-${ARCH}--.bc Generating builtins.link.libspirv-${ARCH}.bc error : The requested operation cannot be performed on a file with a user-mapped section open. ` I suspect that the builtins.link.clc-${ARCH}--.bc file is being generated while it is being used in link_bc. This PR adds a target-level dependency to ensure builtins.link.clc-${ARCH}--.bc is generated first.	2025-03-24 10:09:19 +00:00
Wenju He	cb1e91c18d	[libclc] add --only-needed to llvm-link when INTERNALIZE flag is set (#130871 ) When the -internalize flag is passed to llvm-link, we only need to link in needed symbols. This PR reduces the size of the linked bitcode, e.g. by removing the following symbols: _Z12__clc_sw_fmaDv16_fS_S_ _Z12__clc_sw_fmaDv2_fS_S_ _Z12__clc_sw_fmaDv3_fS_S_ _Z12__clc_sw_fmaDv4_fS_S_ _Z12__clc_sw_fmaDv8_fS_S_ _Z12__clc_sw_fmafff	2025-03-20 13:25:55 +00:00
Fraser Cormack	82912fd620	[libclc] Update license headers (#132070 ) This commit bulk-updates the libclc license headers to the current Apache-2.0 WITH LLVM-exception license in situations where they were previously attributed to AMD - and occasionally under an additional single individual contributor - under an MIT license. AMD signed the LLVM relicensing agreement and so agreed for their past contributions to be covered by the new LLVM license. The LLVM project also has had a long-standing, unwritten, policy of not adding copyright notices to source code. This policy was recently written up [1]. This commit therefore also removes these copyright notices at the same time. Note that there are outstanding copyright notices attributed to others - and many files missing copyright headers - which will be dealt with in future work. [1] https://llvm.org/docs/DeveloperPolicy.html#embedded-copyright-or-contributed-by-statements	2025-03-20 11:40:09 +00:00
Matt Arsenault	846cf86b2b	libclc: Add missing gfx950 target (#131585 )	2025-03-17 17:47:18 +07:00
Fraser Cormack	a2b0576172	[libclc] Stop installing CLC headers (#126908 ) The libclc headers are an implementation detail and are not intended to be used by others as OpenCL headers. The only artifacts of libclc we want to publish are the LLVM bytecode libraries. As the headers have been incidentally broken by recent changes, this commit takes the step to stop installing the headers at all. Downstreams can use clang's own OpenCL headers, and/or its -fdeclare-opencl-builtins flag. Fixes #119967.	2025-03-06 08:52:23 +00:00
Fraser Cormack	760eeac6a2	[libclc] Reduce bithacking in CLC frexp (#129871 ) Also replace some magic constants with named ones. Checking against FP zero and using isnan and isinf functions allows the optimizer to create one unified @llvm.is.fpclass intrinsic. This results in fewer more canonical IR instructions.	2025-03-05 14:18:51 +00:00
Fraser Cormack	e5d5503e4e	[libclc] Move hypot to CLC library; optimize (#129551 ) This was already nominally in the CLC library; this commit just formally moves it over. It simultaneously optimizes it for vector types by avoiding scalarization.	2025-03-04 14:16:16 +00:00
Fraser Cormack	1357279df9	[libclc] Move rsqrt to the CLC library (#129045 ) This also adds missing half variants to certain targets. It also optimizes some targets' implementations to perform the operation directly in vector types, as opposed to scalarizing.	2025-02-27 15:46:58 +00:00
Fraser Cormack	285b411e46	[libclc] Move sqrt to CLC library (#128748 ) This is fairly straightforward for most targets. We use the element-wise sqrt builtin by default. We also remove a legacy pre-filtering of the input argument, which the intrinsic now officially handles. AMDGPU provides its own implementation of sqrt for double types. This commit moves this into the implementation of CLC sqrt. It uses weak linkage on the 'default' CLC sqrt to allow AMDGPU to only override the builtin for the types it cares about.	2025-02-27 12:30:24 +00:00
Fraser Cormack	5f4d1f7400	[libclc] Make CLC library warning-free (#128864 ) There is a long-standing workaround in the libclc build system that silences a warning about the use of parentheses in bitwise conditional operations. In an effort to remove this workaround, this commit re-enables the warning on the internal CLC library, where most of the bodies of the builtins will eventually be defined. Thus as we move builtin implementations into this library, the warnings will trigger and we can clean up the codebase as we go. As it happens the only instance in the CLC library which triggered the warning was in __clc_ldexp.	2025-02-26 12:11:26 +00:00
Fraser Cormack	d5038b3774	[libclc] Move __clc_ldexp to CLC library (#126078 ) This function was already conceptually in the CLC namespace - this just formally moves it over. Note however that this commit marks a change in how libclc functions may be overridden by targets. Until now we have been using a purely build-system-based approach where targets could register identically-named files which took responsibility for the implementation of the builtin in its entirety. This system wasn't well equipped to deal with AMD's overriding of __clc_ldexp for only a subset of types, and furthermore conditionally on a pre-defined macro. One option for handling this would be to require AMD to duplicate code for the versions of __clc_ldexp it's not interested in overriding. We could also make it easier for targets to re-define CLC functions through macros or .inc files. Both of these have obvious downsides. We could also keep AMD's overriding in the OpenCL layer and bypass CLC altogether, but this has limited use. We could use weak linkage on the "base" implementations of CLC functions, and allow targets to opt-in to providing their own implementations on a much finer granularity. This commit supports this as a proof of concept; we could expand it to all CLC builtins if accepted. Note that the existing filename-based "claiming" approach is still in effect, so targets have to name their overrides differently to have both files compiled. This could also be refined.	2025-02-26 11:20:25 +00:00
Fraser Cormack	a821ae2847	[libclc] Move round to CLC library (#128721 )	2025-02-25 16:24:57 +00:00
Fraser Cormack	1e0e4169dd	[libclc][NFC] Remove unused intrinsics helpers (#128708 ) We want to move away from using asm declarations to define builtins.	2025-02-25 14:29:35 +00:00
Fraser Cormack	f8948d3c47	[libclc] Move log/log2/log10 to CLC library (#128540 ) This commit also enables fp16 log, which was previously missing. Other than that, no changes to codegen for AMDGPU/Nvidia targets. Note that for simplicity this commit doesn't try to refactor or optimize the implementations. Notably, each log is only implemented for scalar types; vector types are scalarized. It doesn't look too difficult to make the implementations suitable for vector codegen, so I'll try that in a future commit. There's also an unused implementation of log in clc_log_base.h, whereas the implementation currently used by libclc targets re-uses log2 with an additional multiplication. That should also be cleaned up as on first inspection it looks a more optimal implementation, though it would have to be checked against the OpenCL CTS for good measure.	2025-02-25 11:44:59 +00:00
Matt Arsenault	b57e63b07a	libclc: Stop using asm declarations for r600 on amdgcn for get_global_size (#128692 ) Comparing the case where each dimension is used alone, the only codegen difference is a missed addressing mode fold for the constant offset in the old version due to an ancient bug.	2025-02-25 18:23:04 +07:00
Fraser Cormack	2dfb29a9b2	[libclc] Move nan to the CLC library (#128521 )	2025-02-24 15:41:31 +00:00
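
The preprocessor limitation described in b0338c3d6c (an empty `__CLC_VECSIZE` works for token concatenation but not inside `#if`) can be sketched in plain C as follows. All `DEMO_*` and `demo_*` names are hypothetical stand-ins for a scalar instantiation and are not libclc's actual macros; the sketch only assumes a C99 preprocessor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a scalar instantiation; not libclc's real macros. */
#define DEMO_VECSIZE        /* defined but empty, like __CLC_VECSIZE for scalars */
#define DEMO_VECSIZE_OR_1 1 /* always a number, so it is safe inside #if */

#define DEMO_CAT_(a, b) a##b
#define DEMO_CAT(a, b) DEMO_CAT_(a, b)

/* Concatenation use: 'float' pasted with an empty macro gives plain 'float'. */
typedef DEMO_CAT(float, DEMO_VECSIZE) demo_gentype;

/* Writing '#if DEMO_VECSIZE == 2' here would expand to '#if == 2', which is
   invalid. The numeric variant can be compared against sizes directly. */
int demo_has_geometric_builtins(void) {
#if DEMO_VECSIZE_OR_1 == 1 || DEMO_VECSIZE_OR_1 == 2 || \
    DEMO_VECSIZE_OR_1 == 3 || DEMO_VECSIZE_OR_1 == 4
  return 1;
#else
  return 0;
#endif
}

/* Size of the type produced by the concatenation above. */
size_t demo_gentype_size(void) { return sizeof(demo_gentype); }

/* Checks both uses: concatenation yields 'float', and the #if test passes. */
void demo_self_check(void) {
  assert(demo_gentype_size() == sizeof(float));
  assert(demo_has_geometric_builtins() == 1);
}
```

Keeping both macros lets the same header build type names by concatenation and still branch on the vector width, which matches the motivation given in the commit message.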

800 Commits
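
The expected values quoted in 0e98817458 for `frexp` on a denormal input can be reproduced on a host with the standard C `frexpf`. This is only an illustration of the values the commit message cites, assuming a host that does not flush denormals to zero; it is not the libclc implementation, and `demo_frexp_exponent` is a hypothetical helper.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: splits x with the standard C frexpf, stores the
   mantissa through the pointer and returns the exponent. */
int demo_frexp_exponent(float x, float *mantissa) {
  int exponent = 0;
  *mantissa = frexpf(x, &exponent);
  return exponent;
}

/* Checks the two cases named in the commit message. */
void demo_frexp_check(void) {
  float m = 0.0f;

  /* Denormals supported: 0x1.008p-140 == 0x1.008p-1 * 2^-139. */
  int e = demo_frexp_exponent(0x1.008p-140f, &m);
  assert(m == 0x1.008p-1f);
  assert(e == -139);

  /* Denormals flushed: the input is treated as zero, giving {0, 0}. */
  e = demo_frexp_exponent(0.0f, &m);
  assert(m == 0.0f);
  assert(e == 0);
}
```

The incorrect result reported in the commit, {0x1.008p-140, 0}, corresponds to neither case: the input is returned unchanged as the mantissa while the exponent is left at zero.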