This commit moves the 'native' builtins that use asm statements to
generate LLVM intrinsics to the CLC library. In doing so it converts
them to use the appropriate elementwise builtin to generate the same
intrinsic; there are no codegen changes to any target except to AMDGPU
targets where `native_log` is no longer custom implemented and instead
used the clang elementwise builtin.
This work forms part of #127196 and indeed with this commit there are no
'generic' builtins using/abusing asm statements - the remaining builtins
are specific to the amdgpu and r600 targets.
This commit bulk updates all '.h', '.cl', '.inc', and '.cpp' files to
add any missing license headers.
The remaining files are generally CMake, SOURCES, scripts, markdown,
etc.
There are still some '.ll' files which may benefit from a license
header. I can't find an example of an LLVM IR file with a license header
in the rest of LLVM, but unlike most other (sub)projects, libclc has
examples of LLVM IR as source files, compiled and built into the
library.
This also adds missing half variants to certain targets.
It also optimizes some targets' implementations to perform the operation
directly in vector types, as opposed to scalarizing.
libclc uses llvm-link to link together all of the individually built
libclc builtins files into one module. Some of these builtins files are
compiled from source by clang whilst others are converted from LLVM IR
directly to bytecode.
When llvm-link links a 'source' module into a 'destination' module, it
warns if the two modules have differing data layouts.
The LLVM IR files libclc links either have no data layout (shared
submodule files) or an explicit data layout in the case of certain
amdgcn/r600 files.
The warnings are very noisy and largely inconsequential. We can suppress
them exploiting a specific behaviours exhibited by llvm-link. When the
destination module has no data layout, it is given the source module's
data layout. Thus, if we link together all IR files first, followed by
the clang-compiled modules, 99% of the warnings are suppressed as they
arose from linking an empty data layout into a non-empty one.
The remaining warnings came from the amdgcn and r600 targets. Some of
these were because the data layouts were out of date compared with what
clang currently produced, so those could have been updated.
However, even with those changes and by grouping the IR files together,
the linker may still link explicit data layouts with empty ones
depending on the order the IR files are processed.
As it happens, the data layouts aren't essential. With the changes to
the link line we can rely on those IR files receiving the correct data
layout from the clang-compiled modules later in the link line. This also
makes the previously AMDGPU-specific IR files available to be used by
all targets in a generic capacity in the future.
The SPIR spec states that all OpenCL built-in functions should be
overloadable and mangled, to ensure consistency.
Add the overload attribute to functions which were missing them:
work dimensions, memory barriers and fences, and events.
Reviewed By: tstellar, jenatali
Differential Revision: https://reviews.llvm.org/D82078
Same reason as amdgcn.
Fixes fmin, minmag CTS on turks.
Reviewer: Tom Stellard <tstellar@redhat.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 334228
Same reason as amdgcn.
Fixes fmax, maxmag CTS on turks.
Reviewer: Tom Stellard <tstellar@redhat.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 334227
The implementation uses r600 sepcific intrinsics
LLVM-4 switched to _ro_t and _rw_t image types
Portions of the code can be moved back as more targets/llvm versions add image support
Reviewer: Aaron Watry
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315341
We don't have memory fences for r600 so just call group barrier directly
Make sure that barrier is called even with 0 flags
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 312492
Also fix get_global_id to consider offset
No idea how to add this for ptx, so they are stuck with the old get_global_id
implementation.
v2: split to a separate patch
v3: Switch R600 to use implictarg.ptr
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 276443
v2: split into 2 patches
use clang builtins for other intrinsics as well
v3: Fix warnings
Switch r600 to use implictarg.ptr
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 276442
Most files remain in a common amdgpu directory.
Also switches barriers to to use convergent,
and use llvm.amdgcn.s.barrier.
This now requires 3.9/trunk to build amdgcn.
llvm-svn: 260777
v2: Fix function declaration
Add range metadata to r600 implementation
v3: change prefix to AMDGPU
Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 219793
This generates bitcode which is indistinguishable from what was
hand-written for int32 types in v[load|store]_impl.ll.
v4: Use vec2+scalar for vec3 load/stores to prevent corruption (per Tom)
v3: Also remove unused generic/lib/shared/v[load|store]_impl.ll
v2: (Per Matt Arsenault) Fix alignment issues with vector load stores
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
CC: Matt Arsenault <Matthew.Arsenault@amd.com>
CC: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 216069