This fixes partial ordering of pack expansions of NTTPs by proceeding
with the check using the pattern of the NTTP, following the rules of the
non-pack case.
This also unifies almost all of the different versions of
FinishTemplateArgumentDeduction (except the function template case).
This makes sure they all follow the rules consistently, instantiating
the parameters and comparing those with the argument.
Fixes #132562
When a host-associated `threadprivate` variable was used in a parallel
region with `default(none)` inside an internal subroutine, compilation
failed because the compiler did not properly determine that the variable
was predetermined `threadprivate` and thus should not have been reported
as missing a DSA.
The current implementation iterates over the list of arguments and
modifies it at the same time. Depending on the number of arguments, this
can trigger an assert: `assert(index < arguments.size())`. This change
replaces the loop with a range-based erasure.
Fixes https://github.com/llvm/llvm-project/issues/133199
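The general shape of a range-based erasure can be sketched in isolation. This is not the patched code: `Argument`, its `isDead` flag, and `pruneArguments` are hypothetical names, and the sketch assumes the arguments live in a `std::vector`.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an argument; not a type from the patch.
struct Argument {
  std::string name;
  bool isDead;
};

// Removes all dead arguments in one pass. std::remove_if shifts the kept
// elements to the front and returns the new logical end; erase then drops
// the tail. No index is used after the container has shrunk, so there is
// nothing left that could run past arguments.size().
void pruneArguments(std::vector<Argument> &arguments) {
  auto isDead = [](const Argument &a) { return a.isDead; };
  arguments.erase(
      std::remove_if(arguments.begin(), arguments.end(), isDead),
      arguments.end());
  // Postcondition: nothing dead survives.
  assert(std::none_of(arguments.begin(), arguments.end(), isDead));
}
```

The relative order of the surviving elements is preserved, which matters if later code relies on argument positions.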
As of the third commit, the missing linker references in
`Targets/DirectX.cpp` found in
https://github.com/llvm/llvm-project/pull/133776 are fixed by moving
`HLSLBufferLayoutBuilder.cpp` to `clang/lib/CodeGen/Targets/`.
It fixes the circular reference issue found in
https://github.com/llvm/llvm-project/pull/133619 for all
`-DBUILD_SHARED_LIBS=ON` builds by removing `target_link_libraries` from
the subdirectory CMake files.
Testing for AMDGPU offload was done via
`cmake -B ../llvm_amdgpu -S llvm -GNinja -C
offload/cmake/caches/Offload.cmake -DCMAKE_BUILD_TYPE=Release`
PR https://github.com/llvm/llvm-project/pull/132252 created a second
file sharing the name <TargetName>.cpp in clang/lib/CodeGen/CMakeLists.txt.
For example, there were two AMDGPU.cpp files, one in TargetBuiltins and
the other in Targets. Even though these were in different directories,
libtool warns that it might not distinguish them because they share the
same base name.
There are two potential fixes. The easy fix is to rename one of them and
keep one CMake file. That solution, though, doesn't future-proof against
a third <TargetName>.cpp, and it seems teams want to just use the target
name
https://github.com/llvm/llvm-project/pull/132252#issuecomment-2758178483.
The alternative fix that this PR went with is to separate the CMake
files into their own subdirectories as static libs.
Part one of merging #132486. Add support for representing volatility in
the type system for reference, box, and class types. Don't do anything
with volatile just yet, only support and test their representation and
utility functions.
The naming convention is a little goofy - `fir::isa_volatile_type` and
`fir::updateTypeWithVolatility` use different capitalization, but I put
them near similar functions and tried to match the surrounding
conventions and [the
docs](https://github.com/llvm/llvm-project/blob/main/flang/docs/C%2B%2Bstyle.md#naming)
as best I could.
The WebAssemblyLowerRefTypesIntPtrConv pass currently uses `undef` to
represent trap instructions. These can instead be represented by the
`poison` value.
- `selectCross` looks to be a function that had its implementation and
usage removed, but this define somehow stuck around.
- This change removes the definition.
Recommit. This work was done by #132246 but failed buildbots because the
test it introduced needed updates.
Inefficient SVE codegen occurs on at least two in-order cores,
Cortex-A510 and Cortex-A520. For example, take a simple vector add:
```
void foo(float *a, float *b, float *dst, unsigned n) {
  for (unsigned i = 0; i < n; ++i)
    dst[i] = a[i] + b[i];
}
```
The loop is vectorized into the following interleaved sequence of
instructions.
```
add x12, x1, x10
ld1b { z0.b }, p0/z, [x1, x10]
add x13, x2, x10
ld1b { z1.b }, p0/z, [x2, x10]
ldr z2, [x12, #1, mul vl]
ldr z3, [x13, #1, mul vl]
dech x11
add x12, x0, x10
fadd z0.s, z1.s, z0.s
fadd z1.s, z3.s, z2.s
st1b { z0.b }, p0, [x0, x10]
addvl x10, x10, #2
str z1, [x12, #1, mul vl]
```
By adjusting the target features to prefer fixed-width over scalable
vectorization when the cost is equal, we get the following vectorized
loop.
```
ldp q0, q3, [x11, #-16]
subs x13, x13, #8
ldp q1, q2, [x10, #-16]
add x10, x10, #32
add x11, x11, #32
fadd v0.4s, v1.4s, v0.4s
fadd v1.4s, v2.4s, v3.4s
stp q0, q1, [x12, #-16]
add x12, x12, #32
```
This sequence is more efficient.
This change standardises the naming convention for the argument
representing the value to store in various vector operations.
Specifically, it ensures that all vector ops storing a value (whether
into memory, a tensor, or another vector) use `valueToStore` for the
corresponding argument name.
Updated operations:
* `vector.transfer_write`, `vector.insert`, `vector.scalable_insert`,
`vector.insert_strided_slice`.
For reference, here are the operations that already use `valueToStore`:
* `vector.store`, `vector.scatter`, `vector.compressstore`,
`vector.maskedstore`.
This is a non-functional change (NFC) and does not affect the behavior
of these operations.
Implements #131602
Check the memory space before lowering allocation ops, instead of
starting the lowering and then rolling back the pattern when the memory
space was found to be incompatible with LLVM.
Note: This is in preparation of the One-Shot Dialect Conversion
refactoring.
Note: `isConvertibleAndHasIdentityMaps` now also checks the memory
space.
Should keep MSVC quiet as noticed by @rksimon in #134517.
Assertions have been copied over from PointerType::get in order to not
silently change invariants with this call.
Original code sequence:
* pcalau12i $a0, %ie_pc_hi20(sym)
* ld.d $a0, $a0, %ie_pc_lo12(sym)
The converted code sequence is as follows:
* lu12i.w $a0, %le_hi20(sym) # le_hi20 != 0, otherwise NOP
* ori $a0, src, %le_lo12(sym) # le_hi20 != 0, src = $a0,
# otherwise, src = $zero
TODO: When relaxation is enabled, redundant NOP can be removed. This
will be implemented in a future patch.
Note: In the normal or medium code model, the original code sequence
with relocations allows interleaving, because the converted code sequence
calculates the absolute offset. However, in the extreme code model, to
identify the current code model, the first four instructions with
relocations must appear consecutively.
Fixes the following warning after the changes in
https://github.com/llvm/llvm-project/pull/134309:
```
llvm-project/mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp:134:3: warning: default label in switch which covers all enumeration values [-Wcovered-switch-default]
default:
^
1 warning generated.
```
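For context, `-Wcovered-switch-default` fires when a `switch` over an enum lists every enumerator and still carries a `default:` label. A minimal sketch of the usual remedy follows; it uses a hypothetical `Mode` enum and `modeName` function, not the NVVM code touched by this patch.

```cpp
#include <cassert>

// Hypothetical enum, for illustration only.
enum class Mode { Read, Write, ReadWrite };

// Every enumerator has its own case, so no `default:` label is needed and
// -Wcovered-switch-default stays quiet. If an enumerator is added later,
// -Wswitch reports the missing case here instead of a default hiding it.
const char *modeName(Mode m) {
  switch (m) {
  case Mode::Read:
    return "read";
  case Mode::Write:
    return "write";
  case Mode::ReadWrite:
    return "read-write";
  }
  // Only reachable if `m` holds a value outside the enum.
  assert(false && "unknown Mode");
  return "";
}
```

In-tree code would typically use `llvm_unreachable` after the switch rather than a plain `assert`.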
A file scope declaration without an initializer which is neither extern
nor thread_local is a tentative definition. If the declaration of an
identifier for an object is a tentative definition and has internal
linkage, the declared type shall not be an incomplete type.
Clang was previously failing to diagnose this in -pedantic mode.
Fixes #50661
---------
Co-authored-by: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com>
The test is checking output from MLIR debug prints. MLIR passes can be
executed in parallel, for example a pass on func.func might schedule
different func.func operations in different threads. This led to
intermittent test failures where debug output from different threads
became mixed up.
Fix by disabling MLIR multithreading for this test.
This patch changes the preferInLoopReduction function to take a
RecurKind instead of an unsigned Opcode.
This makes it possible to distinguish non-arithmetic reductions such as
min/max, AnyOf, and FindLastIV, and also helps unify IAnyOf with FAnyOf
and IFindLastIV with FFindLastIV.
Related patches: #118393, #131830
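Why a recurrence kind carries more information than an opcode can be shown with a toy model. Everything below is hypothetical: the enums are heavily reduced stand-ins, the kind-to-opcode mapping is an assumption made for illustration, and the preference policy is arbitrary rather than any target's actual decision.

```cpp
#include <cassert>

// Hypothetical, reduced stand-ins; the real enums have many more members.
enum class Opcode { Add, ICmp };
enum class Kind { Add, SMin, SMax, AnyOf };

// Assumed mapping for illustration: all compare-based kinds share one opcode.
Opcode opcodeFor(Kind k) {
  return k == Kind::Add ? Opcode::Add : Opcode::ICmp;
}

// A hook keyed on the opcode sees the same input for SMin and AnyOf, so it
// must give both the same answer.
bool preferInLoopByOpcode(Opcode op) { return op == Opcode::Add; }

// A hook keyed on the kind can decide separately for each reduction.
// The policy below is arbitrary and only demonstrates the distinction.
bool preferInLoopByKind(Kind k) {
  switch (k) {
  case Kind::Add:
  case Kind::SMin:
  case Kind::SMax:
    return true;
  case Kind::AnyOf:
    return false;
  }
  return false;
}
```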
Add support for lowering of convolution operations where the `acc_type`
attribute differs from the result type of the operation. The only case
of this for convolutions in the TOSA-v1.0 specification is an fp16
convolution which internally uses an fp32 accumulator; all other
operations have accumulator types that match their output/result types.
Add lit tests for the fp16 convolution with fp32 accumulator operators
described above.
Signed-off-by: Jack Frankland <jack.frankland@arm.com>
Reorder sections in GitHub.rst so that "Branches" and "Stacked Pull
Requests" appear after the more general section on pull requests. This
improves the conceptual flow for readers new to the process:
New order:
* Introduction
* Before your first PR
* Pull Requests
* Approvals
* Landing your change
* Branches
* Stacked Pull Requests
* ...
Previous order:
* Introduction
* Before your first PR
* Branches
* Stacked Pull Requests
* Pull Requests
* Approvals
* Landing your change
* ...
This change only reorders existing text - no content edits.
In addition to authenticated pointers, consider the contents of a
register safe if it was
* written by PC-relative address computation
* updated by an arithmetic instruction whose input address is safe
Diagnostics at unknown locations can now be verified with
`-verify-diagnostics`.
Example:
```
// expected-error@unknown {{something went wrong}}
```
Also clean up some MemRefToLLVM conversion tests that had to redirect
all errors to stdout in order to FileCheck them. All of those tests can
now be stored in a single `invalid.mlir`. That was not possible before.
This is split off from #133977
VPBlendRecipe normalisation is sensitive to the number of users a mask
has, so should probably be run after the masks are simplified as much as
possible.
Note that this could be run after removeDeadRecipes, but doing so causes
test diffs, including some regressions, so it is left to a later patch.
Relax the assumption that an alloc op always has its allocation at
`getResult(0)`, allowing the `optimize-allocation-liveness` pass to be
used for custom ops with more than one result. Ops with multiple
allocations are not handled here yet.
The pointer argument for `wait_for_event(int, event_t*)` should take the
default address space: generic if available, otherwise private.
Before this patch it would always be generic with
`-fdeclare-opencl-builtins`. This was inconsistent with the behavior
when opencl-c.h is included.
In unreachable code, constant PHI nodes may appear and be replaced by their
single value. As a result, instructions may become self-referencing. This
commit adds checks to avoid going into infinite recursion when handling
self-referencing compare instructions in `evaluateOnPredecessorEdge()`.
This LLVM defect was identified via the AMD Fuzzing project.
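The kind of guard described above can be illustrated with a small sketch. This is not the code from `evaluateOnPredecessorEdge()`: `Node`, `Tri`, `evaluate`, and the in-progress set are hypothetical and only show the general idea of refusing to re-enter a value that is already being evaluated.

```cpp
#include <cassert>
#include <unordered_set>

// Three-valued result: a known boolean or "cannot tell".
enum class Tri { False, True, Unknown };

// Hypothetical expression node: a constant leaf, or the negation of an
// operand. In unreachable code the operand may point back at the node itself.
struct Node {
  bool isLeaf;
  bool value;          // meaningful only when isLeaf is true
  const Node *operand; // meaningful only when isLeaf is false
};

// Evaluates `n`. `inProgress` holds the nodes on the current recursion path;
// meeting one of them again means the expression refers to itself, so the
// function gives up with Unknown instead of recursing forever.
Tri evaluate(const Node *n, std::unordered_set<const Node *> &inProgress) {
  if (!n)
    return Tri::Unknown;
  if (n->isLeaf)
    return n->value ? Tri::True : Tri::False;
  if (!inProgress.insert(n).second)
    return Tri::Unknown; // self-reference detected
  Tri v = evaluate(n->operand, inProgress);
  inProgress.erase(n);
  if (v == Tri::Unknown)
    return Tri::Unknown;
  return v == Tri::True ? Tri::False : Tri::True;
}
```

Returning "unknown" is the conservative choice here: the caller simply loses an optimization opportunity on code that can never execute.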
We get a lot of issues that basically boil down to "I passed malformed
LLVM IR to clang and it crashed". Clang does not perform IR verification
by default in (non-assertion-enabled) release builds, and that's
sensible for IR that Clang itself produces, which is expected to always
be valid. However, if people pass in their own handwritten IR, we should
report if it is malformed, instead of crashing. We should also report it
in a way that does not produce a crash trace and ask for a bug report,
as currently happens in assertions-enabled builds. This aligns the
behavior with how opt/llc work.