This reverts commit 6fa1bfad65edefe3f4c17740f05297d34e833b47.
Revert it due to breaking buildbot:
Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\tools\amdgpu-arch\AMDGPUArchByHIP.cpp(104):
error C2039: 'parse': is not a member of 'llvm::VersionTuple'
https://lab.llvm.org/buildbot/#/builders/46/builds/13184
This change moves GlobalUniqueCallSite into NVPTXISelLowering. In
processes where multiple compilations occur, this makes call site
enumeration local to individual compilation, which ensures that call
site numbers are consistently sequential within each compilation and is
independent of other compilations happening in parallel.
Inspired by PR #127944, this patch adds an option to print profile metadata inline with respect to the instruction (or function) it annotates - this saves one time from having to search up and down large textual modules to find this info.
This paper made it a constraint violation for the same identifier
within a TU to have both internal and external linkage. It was
previously UB.
Clang does not correctly diagnose the constraint in some cases,
documented in the added test case.
This paper removes UB around use of void expressions. Previously, code
like this had undefined behavior:
```
void foo(void) {
(void)(void)1;
extern void x;
x;
}
```
and this is now well-defined in C2y. Functionally, this now means that
it is valid to use `void` as a `_Generic` association.
This patch is to explicitly link the pthread library when building
shared flang-rt.
On AIX, for example, it needs to link in `libpthread.a` explicitly in
order to resolve the references to those `pthread_*` functions in
`include/flang-rt/runtime/lock.h`
After #130165.
In the code: `VRMap[CurStageNum][Def] = VRMap[CurStageNum][LoopVal]`
`VRMap[CurStageNum][LoopVal]` calculates a reference before
`VRMap[CurStageNum][Def]` which may rehash the DenseMap.
Then the reference can be dead.
Instead of returning the number of bytes emitted, just take the iterator
by reference so the increments in emitULEB128 will update the copy in
the caller.
Also pass the iterator by reference to emitNumToSkip so we don't need a
separate I += 3 in the caller.
Removes the condition that checks that operand is not indexed by
reduction iterators which allows for more fine-grained control via the
reshape fusion control function. For example, users could allow fusing
reshapes expand the M/N dims of a matmul but not the K dims (or preserve
the current behavior by not fusing at all).
---------
Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
Add Rocdl support for the following GFX950 instructions:
CVT_SCALE_PK_FP8_F16
CVT_SCALE_PK_BF8_F16
CVT_SCALE_PK_FP8_BF16
CVT_SCALE_PK_BF8_BF16
CVT_SCALE_SR_FP8_F16
CVT_SCALE_SR_BF8_F16
CVT_SCALE_SR_FP8_BF16
CVT_SCALE_SR_BF8_BF16
CVT_SCALE_PK_F16_FP8
CVT_SCALE_PK_F16_BF8
CVT_SCALE_F16_FP8
CVT_SCALE_F16_BF8
`inst->getFnAttr(Kind)` fallbacks to check if the parent has an
attribute, which breaks roundtriping the LLVM IR. This change actually
checks only in the call attribute list (no fallback to parent queries).
It's possible to argue that this small optimization isn't harmful, but
seems too early if it's breaking roundtrip behavior.
Add FP8 lit tests to the following operators:
ARGMAX
AVGPOOL
CONV2D
CONV3D
DEPTHWISE_CONV2D
MATMUL
MAX_POOL2D
TRANSPOSE_CONV2D
CONST
CAST
CONCAT
PAD
RESHAPE
REVERSE
SLICE
TILE
TRANSPOSE
GATHER
SCATTER
Signed-off-by: Tai Ly <tai.ly@arm.com>
Signed-off-by: Jerry Ge <jerry.ge@arm.com>
Co-authored-by: Tai Ly <tai.ly@arm.com>
Fixes#130159
The problem is that the alloca created for the private variable uses the
default alloca address space in that module, but the function the
pointer is being passed to expects a different address space, leading to
a type missmatch in the function argument.
I know nothing about how AMDGPU is supposed to work. I based this
solution on code from createDeviceArgumentAccessor(). Please could
somebody from AMD confirm this solution is appropriate.
Local variable initialization was previously being ignored. This change
adds support for initialization of scalar variables with constant values
and introduces the constant emitter framework.
Recently HIP runtime changed dll name to amdhip64_n.dll on Windows,
where n is ROCm major version number.
Fix amdgpu-arch to search for amdhip64_n.dll on Windows.
…Stream.
CachedFileStream has previously performed the commit step in its
destructor, but this means its only recourse for error handling is
report_fatal_error. Modify this to add an explicit commit() method, and
call this in the appropriate places with appropriate error handling for
the location.
Currently the destructor of CacheStream gives an assert failure in Debug
builds if commit() was not called. This will help track down any
remaining uses of the API that assume the old destructior behaviour. In
Release builds we fall back to the previous behaviour and call
report_fatal_error if the commit fails.
PR #130124 added a use of FlagInserter to the start of
SelectionDAGLegalize::PromoteNode, making some of the places where we
set flags be redundant, so remove them. The places where the setting of
flags remains are in non-floating-point operations.
#129363 handled all the cases where there was a sext for the original source value, but not for cases where the source is already half the size of the destination type
Another regression noticed in #76524
This PR will solve the issue with leaking shared memory we have after running llvm lit test on z/OS.
In particular llvm/unittests/ExecutionEngine/Orc/SharedMemoryMapperTest.cpp was causing the leak.
Summary:
Thanks to @Meinersbur for finding this. The `:` character used in AMD's
target-id's is invalid on windows. This patch replaces them with `-`.
---------
Co-authored-by: Michael Kruse <github@meinersbur.de>
There were two issues when `/guard:ehcont` is enabled with the SEH
personality on Windows:
1. As @namazso correctly identified, we bail out of
`WinException::endFunction` early for `MSVC_TableSEH` with funclets,
expecting the exception data to be emitted in `endFunclet`, but
`endFunclet` didn't copy the EHCont metadata from the function to the
module.
2. The SEH personality requires that the basic block containing the
`catchpad` is the target, not the `catchret`.
Fixes#64585
In the previous code (#128032), it specified the destination vector as the
getShuffleCost argument. Because the shuffle mask specifies the indices
of the two vectors specified as elements, the maximum value is twice the
size of the source vector. This causes a problem if the destination
vector is smaller than the source vector and specify an index in the
mask that exceeds the size of the destination vector.
Fix the problem by correcting the previous code, which was using wrong
argument in the Cost calculation.
Fixes#130250
This happens a lot with `if constexpr` with a condition based on a
template param. In those cases, the condition is a ConstantExpr with a
value already set, so we can use that and ignore the other branch.
I recently realised that we return an invalid cost when requesting
the type-based cost for the get.active.lane.mask intrinsic. I've
fixed that in this patch by reusing the existing code for the
non-type-based model.
Summary:
This attribute is mostly borrowed from OpenCL, but is useful in general
for accessing the LLVM vector types. Previously the only way to use it
was through typedefs. This patch changes that to allow use as a regular
type attribute, similar to address spaces.
I initially implemented codegen to be a 'no-op' for these declarations,
which I thought was properly implemented. However, when they are a
top-level decl, we have a separate switch. This patch makes sure they
are properly emitted at top-level as a no-op, and adds a test for both
top-level and not top-level.