Do not remove S_CBRANCH_EXECZ if one of the following blocks contains an
unconditional branch to a block other than the one immediately following
it. This can cause unwanted behavior like infinite loops.
Clean up `populateVectorToLLVMConversionPatterns` so that it populates
only conversion patterns. All rewrite patterns that do not lower to LLVM
should be populated into a separate greedy pattern rewrite.
The current combination of rewrite patterns and conversion patterns
triggered an edge case when merging the 1:1 and 1:N dialect conversions.
Depends on #119973.
The mask materialization patterns during `VectorToLLVM` are rewrite
patterns. They should run as part of the greedy pattern rewrite and not
the dialect conversion. (Rewrite patterns and conversion patterns are
not generally compatible.)
The current combination of rewrite patterns and conversion patterns
triggered an edge case when merging the 1:1 and 1:N dialect conversions.
When two 16-bit values are combined into a v2x16 vector, and those
values are truncated from 32-bit values, a PRMT instruction can
save registers by selecting bytes directly from the original 32-bit
values. We do this during a post-legalize DAG combine, as these
opportunities are typically only exposed after the BUILD_VECTOR's
operands have been legalized.
Additionally, if the 32-bit values are right-shifted, we can fold in the
shift by selecting higher bytes with PRMT. Only logical right-shifts by
16 are supported (for now) since those are the only situations seen in
practice. Right shifts by 16 often come up during the legalization of
EXTRACT_VECTOR_ELT.
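To make the byte selection concrete, here is a small software model of PRMT in its default mode. It is only a sketch: the helper names `prmt` and `buildV2x16` are made up for illustration, the sign-replication bit of each selector nibble is not modelled, and the selector constants in the usage note are derived from this model rather than taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Software model of PRMT (default mode). Bytes 0-3 of the pool come from
// `a`, bytes 4-7 from `b`; nibble i of `selector` picks the pool byte that
// becomes result byte i. Sign replication (nibble bit 3) is not modelled.
uint32_t prmt(uint32_t a, uint32_t b, uint16_t selector) {
  uint64_t pool = (static_cast<uint64_t>(b) << 32) | a;
  uint32_t result = 0;
  for (int i = 0; i < 4; ++i) {
    unsigned nibble = (selector >> (4 * i)) & 0xF;
    assert((nibble & 0x8) == 0 && "sign replication is not modelled");
    nibble &= 0x7;
    uint32_t byte = static_cast<uint32_t>((pool >> (8 * nibble)) & 0xFF);
    result |= byte << (8 * i);
  }
  return result;
}

// Reference packing of two 16-bit values into a v2x16 (element 0 in the
// low half of the 32-bit register).
uint32_t buildV2x16(uint16_t lo, uint16_t hi) {
  return static_cast<uint32_t>(lo) | (static_cast<uint32_t>(hi) << 16);
}
```

In this model, `prmt(a, b, 0x5410)` equals `buildV2x16(trunc(a), trunc(b))`, and `prmt(a, b, 0x7632)` equals the same build with both inputs first shifted right by 16, so neither the truncates nor the shifts need their own instructions.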
This idea was brought up in a PR comment by @Artem-B.
If x is NaN, then fmul (x, 1) may produce a different NaN value.
Our float semantics explicitly permit folding fmul (x, 1) to x, but we
can't do this when we're replacing a select input, as selects are
supposed to preserve the exact bitwise value.
Fixes
https://github.com/llvm/llvm-project/pull/115152#issuecomment-2545773114.
Summary:
Previously, we'd add all SPs distinct from the cloned one into a set.
Then when cloning a local scope we'd check if it's from one of those
'distinct' SPs by checking if it's in the set. We don't need to do that.
We can just check against the cloned SP directly and drop the set.
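A toy sketch of why the set is redundant, assuming it held every SP except the cloned one. `Subprogram` and both helper functions are hypothetical stand-ins, not the types or functions touched by the patch.

```cpp
#include <set>

// Hypothetical stand-in for a debug-info subprogram (SP) node.
struct Subprogram {};

// Old approach: look the scope's SP up in a pre-built set that contains
// every SP other than the cloned one.
bool isFromDistinctSPViaSet(const std::set<const Subprogram *> &distinctSPs,
                            const Subprogram *scopeSP) {
  return distinctSPs.count(scopeSP) != 0;
}

// New approach: compare against the cloned SP directly; no set needed.
bool isFromDistinctSP(const Subprogram *clonedSP, const Subprogram *scopeSP) {
  return scopeSP != clonedSP;
}
```

Under that assumption the two predicates agree for every SP, so the direct comparison gives the same answer without building or querying the set.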
Test Plan:
ninja check-llvm-unit check-llvm
This makes sure no optimizations are applied that assume the
bigger alignment or size, which could be incorrect if we link
together with non-instrumented code.
Lowering to load-acquire/store-release for RISCV Zalasr.
Currently uses the psABI lowerings for WMO load-acquire/store-release
(which are identical to A.7). These are incompatible with the A.6
lowerings currently used by LLVM. This should be OK for now since Zalasr
is behind the enable experimental extensions flag, but needs to be fixed
before it is removed from that.
For TSO, it uses the standard Ztso mappings except for lowering seq_cst
loads/stores to load-acquire/store-release; I had Andrea review that.
```
% echo 90 | llvm-mc -triple=x86_64 --disassemble --hex
.text
nop
```
The initial `.text` kludge is due to `initSection`, which is actually only
needed by AIX XCOFF for its `getCurrentSectionOnly()` use in
MCAsmStreamer::emitInstruction (https://reviews.llvm.org/D95518). Adjust
MCAsmStreamer::emitInstruction to not trigger failures on
```
echo 7c4303a6 | llvm-mc --cdis --hex --triple=powerpc-aix-ibm-xcoff
```
Pull Request: https://github.com/llvm/llvm-project/pull/120185
Depends on #113811
Support `R_AARCH64_AUTH_ADR_GOT_PAGE`, `R_AARCH64_AUTH_GOT_LO12_NC` and
`R_AARCH64_AUTH_GOT_ADD_LO12_NC` GOT-generating relocations. For preemptible
symbols, dynamic relocation `R_AARCH64_AUTH_GLOB_DAT` is emitted. Otherwise,
we unconditionally emit `R_AARCH64_AUTH_RELATIVE` dynamic relocation since
pointers in the signed GOT need to be signed at dynamic link time.
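The choice of dynamic relocation described above reduces to a single decision. The function below is a hypothetical sketch of that decision only; it is not the linker's code and the function name is invented.

```cpp
#include <string>

// Dynamic relocation emitted for a signed GOT entry, following the rule
// in the description: preemptible symbols get AUTH_GLOB_DAT, everything
// else gets AUTH_RELATIVE.
std::string authGotDynamicReloc(bool isPreemptible) {
  return isPreemptible ? "R_AARCH64_AUTH_GLOB_DAT"
                       : "R_AARCH64_AUTH_RELATIVE";
}
```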
This fixes a bug where report links generated from files such as
StylePrimitiveNumericTypes+Conversions.h in WebKit result in an error.
Co-authored-by: Brianna Fan <bfan2@apple.com>
This should actually fix the problem as I validated that github.sha returns an
actual value by running a workflow in a test repo. I'm not sure why the
existing value doesn't work, but it returns nothing.
Introduces a new layer interface, LinkGraphLayer, that can be used to
add LinkGraphs to an ExecutionSession.
This patch moves most of ObjectLinkingLayer's functionality into a new
LinkGraphLinkingLayer which should (in the future) be able to be used
without linking libObject. ObjectLinkingLayer now inherits from
LinkGraphLinkingLayer and just handles conversion of object files to
LinkGraphs, which are then handed down to LinkGraphLinkingLayer to be
linked.
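The layering can be pictured with the toy classes below. This is a sketch under assumptions: the class names come from the description, but the member functions, the `LinkGraph` struct, and the idea that LinkGraphLinkingLayer implements the LinkGraphLayer interface are simplifications made for this example and do not reflect the real ORC signatures.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a JITLink graph.
struct LinkGraph {
  std::string name;
};

// Interface for anything that can accept LinkGraphs.
class LinkGraphLayer {
public:
  virtual ~LinkGraphLayer() = default;
  virtual void emit(LinkGraph graph) = 0;
};

// Does the actual linking; needs no knowledge of object files.
class LinkGraphLinkingLayer : public LinkGraphLayer {
public:
  void emit(LinkGraph graph) override { linked.push_back(graph.name); }
  std::vector<std::string> linked; // names of graphs "linked" so far
};

// Only converts object files to graphs, then hands them down.
class ObjectLinkingLayer : public LinkGraphLinkingLayer {
public:
  void emitObject(const std::string &objectFile) {
    LinkGraph graph{"graph-for-" + objectFile};
    LinkGraphLinkingLayer::emit(std::move(graph));
  }
};
```

The point of the split is visible in the sketch: only `ObjectLinkingLayer` mentions object files, so a client that already has LinkGraphs could use the base layer on its own.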
AMDGPU: Delete spills of undef values
It would be a bit more logical to preserve the undef and do the normal
expansion, but this is less work. This avoids verifier errors in a
future patch which starts deleting liveness from registers after
allocation failures which results in spills of undef values.
https://reviews.llvm.org/D122607
Move where undef sgpr spills are deleted
This reverts commit f7443905af1e06eaacda1e437fff8d54dc89c487.
This is to avoid an assertion if an undef operand appears in a
stackmap. This is important to avoid hitting verifier errors
when register allocation starts adding undefs in error scenarios.
Rather than trying to treat undef operands as special, leave them
alone and avoid producing an invalid spill. It would be a bit more
precise to produce a spill of an undef register here, but that's not
exposed through the storeRegToStackSlot API.
https://reviews.llvm.org/D122605
This was an alternative to https://reviews.llvm.org/D122582
Identical Code Folding (ICF) folds functions that are identical into one
function, and updates symbol addresses to the new address. This reduces
the size of a binary, but can lead to problems, for example when
function pointers are compared. This can be done either explicitly in
the code or generated IR by optimization passes like Indirect Call
Promotion (ICP). After ICF what used to be two different addresses
become the same address. This can lead to a different code path being
taken.
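A minimal sketch of the hazard, using two made-up functions with identical bodies. Without folding, the standard guarantees they have distinct addresses, so the dispatch below behaves as written; if an unsafe ICF folded them into one address, the first comparison would match for both.

```cpp
#include <string>

// Two functions with identical bodies: candidates for folding.
int addOne(int x) { return x + 1; }
int increment(int x) { return x + 1; }

using Fn = int (*)(int);

// Code whose behaviour depends on function identity, not just on what
// the function computes.
std::string describe(Fn f) {
  if (f == &addOne)
    return "addOne";
  if (f == &increment)
    return "increment";
  return "unknown";
}
```

With distinct addresses, `describe(&increment)` yields `"increment"`. After folding, the same call would take the `addOne` path, which is the different code path the description refers to.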
This is where safe ICF comes in. The linker (LLD) implements it using the
address-significant section generated by clang. If a symbol is in it, or an
object doesn't have this section, symbols are not folded.
BOLT does not have the information regarding which objects do not have
this section, so it can't re-use this mechanism.
This implementation scans the code section and conservatively marks
function symbols as unsafe. It treats symbols as unsafe if they are
used in a non-control-flow instruction. It also scans through the data
relocation sections and does the same for relocations that reference a
function symbol. The latter handles the case when a function pointer is
stored in a local or global variable, etc. If a relocation address
points within a vtable, these symbols are skipped.
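The marking rules above can be sketched roughly as follows. The structs and the function are hypothetical simplifications written for this note; they are not BOLT's data structures or API.

```cpp
#include <set>
#include <string>
#include <vector>

// A reference to a function symbol from an instruction in a code section.
struct CodeRef {
  bool isControlFlow; // true for calls, branches, jumps
  std::string symbol;
};

// A data relocation that references a function symbol.
struct DataReloc {
  bool insideVTable; // relocation address lies within a vtable
  std::string symbol;
};

// Conservatively collect the function symbols that must not be folded.
std::set<std::string> collectUnsafeSymbols(const std::vector<CodeRef> &code,
                                           const std::vector<DataReloc> &data) {
  std::set<std::string> unsafe;
  for (const CodeRef &ref : code)
    if (!ref.isControlFlow) // address used as a value, not as a target
      unsafe.insert(ref.symbol);
  for (const DataReloc &rel : data)
    if (!rel.insideVTable) // vtable entries are skipped
      unsafe.insert(rel.symbol);
  return unsafe;
}
```

A symbol that is only ever called directly, or only appears in a vtable, stays out of the set and remains eligible for folding.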
We already support vector types, and since matrix element types have to
be scalar types, there should be no problem w/ just enabling this.
This now also allows matrix types to be stored in STL containers.
`--disassemble`/`--cdis` parses input bytes as decimal, 0bbin, 0ooct, or
0xhex. While the hexadecimal digit form is most commonly used, requiring
a 0x prefix for each byte (`0x48 0x29 0xc3`) is cumbersome.
Tools like xxd -p and rz-asm use a plain hex dump form without the 0x
prefix or space separator. This patch adds --hex to disassemble such hex
bytes with optional whitespace.
```
% rz-asm -a x86 -b 64 -d 4829c34829c4
sub rbx, rax
sub rsp, rax
% llvm-mc -triple=x86_64 --cdis --hex --output-asm-variant=1 <<< 4829c34829c4
.text
sub rbx, rax
sub rsp, rax
```
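For illustration, parsing such a plain hex dump could look like the sketch below. It is not llvm-mc's implementation: the function name is invented, and details such as allowing whitespace anywhere between digits and rejecting an odd digit count are assumptions of this example.

```cpp
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// Parse a plain hex dump ("4829c3 4829c4") into bytes. Whitespace is
// skipped; any other non-hex character or an odd number of digits fails.
bool parseHexDump(const std::string &text, std::vector<uint8_t> &out) {
  out.clear();
  int highNibble = -1; // -1 means "no pending digit"
  for (unsigned char c : text) {
    if (std::isspace(c))
      continue;
    if (!std::isxdigit(c))
      return false;
    int value = std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10;
    if (highNibble < 0) {
      highNibble = value;
    } else {
      out.push_back(static_cast<uint8_t>(highNibble * 16 + value));
      highNibble = -1;
    }
  }
  return highNibble < 0; // a dangling digit is an error
}
```

Note that a `0x` prefix is rejected by this sketch because `x` is not a hex digit, which keeps the plain form and the prefixed form clearly separate.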
Pull Request: https://github.com/llvm/llvm-project/pull/119992
CodeGen will allocate memory for a new descriptor on descriptor loads.
CUDA Fortran local descriptors are allocated in managed memory by the
runtime. The newly allocated storage for a CUDA descriptor must also be
allocated through the runtime.
VPInstruction has a definition of mayWriteToMemory, which seems to only
be used by VPlanSLP. However VPInstructions are already handled in
VPRecipeBase::mayWriteToMemory, and everywhere else seems to use this
definition. I think these should be the same for all intents and
purposes. The VPRecipeBase definition is more conservative but returns
true for stores/calls/invokes/SLPStores.
Essentially, this makes this ill-formed:
```c++
using mat4 = _BitInt(12) [[clang::matrix_type(3, 3)]];
```
This matches preexisting behaviour for vector types (e.g.
`ext_vector_type`), and given that LLVM IR intrinsics for matrices also
take vector types, it seems like a sensible thing to do.
This is currently especially problematic since we sometimes lower matrix
types to LLVM array types instead, and while e.g. `[4 x i32]` and `<4 x
i32>` *probably* have the same memory layout (though I don’t
think it’s sound to rely on that either, see #117486), `[4 x i12]` and
`<4 x i12>` definitely don’t.
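The size mismatch can be checked with a little arithmetic. The sketch below assumes a typical data layout in which an `iN` array element is padded to the next power-of-two width (at least 8 bits), while vector elements are densely bit-packed; the helper names are made up, and real targets may differ.

```cpp
#include <cstdint>

// Assumed in-array footprint of an iN element: N rounded up to a power
// of two, with a minimum of 8 bits.
uint64_t paddedElementBits(uint64_t bits) {
  uint64_t padded = 8;
  while (padded < bits)
    padded *= 2;
  return padded;
}

// Bits occupied by [count x iN]: every element is padded individually.
uint64_t arrayBits(uint64_t elemBits, uint64_t count) {
  return paddedElementBits(elemBits) * count;
}

// Bits occupied by <count x iN>: elements are packed with no padding.
uint64_t vectorBits(uint64_t elemBits, uint64_t count) {
  return elemBits * count;
}
```

Under these assumptions `[4 x i32]` and `<4 x i32>` both take 128 bits, but `[4 x i12]` takes 64 bits while `<4 x i12>` takes 48, so element `i` sits at bit `16 * i` in one and bit `12 * i` in the other.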
This change allows attributes that wrap content as external resources to
be exposed through an interface, and the usage inside ModuleToObject
shows how we will be able to provide runtime libraries without relying on
the filesystem.
Resolves https://github.com/llvm/llvm-project/issues/99161
- [x] Implement `WaveActiveAllTrue` clang builtin
- [x] Link `WaveActiveAllTrue` clang builtin with `hlsl_intrinsics.h`
- [x] Add sema checks for `WaveActiveAllTrue` to
`CheckHLSLBuiltinFunctionCall` in `SemaChecking.cpp`
- [x] Add codegen for `WaveActiveAllTrue` to `EmitHLSLBuiltinExpr` in
`CGBuiltin.cpp`
- [x] Add codegen tests to
`clang/test/CodeGenHLSL/builtins/WaveActiveAllTrue.hlsl`
- [x] Add sema tests to
`clang/test/SemaHLSL/BuiltIns/WaveActiveAllTrue-errors.hlsl`
- [x] Create the `int_dx_WaveActiveAllTrue` intrinsic in
`IntrinsicsDirectX.td`
- [x] Create the `DXILOpMapping` of `int_dx_WaveActiveAllTrue` to `114`
in `DXIL.td`
- [x] Create the `WaveActiveAllTrue.ll` and
`WaveActiveAllTrue_errors.ll` tests in `llvm/test/CodeGen/DirectX/`
- [x] Create the `int_spv_WaveActiveAllTrue` intrinsic in
`IntrinsicsSPIRV.td`
- [x] In SPIRVInstructionSelector.cpp create the `WaveActiveAllTrue`
lowering and map it to `int_spv_WaveActiveAllTrue` in
`SPIRVInstructionSelector::selectIntrinsic`.
- [x] Create SPIR-V backend test case in
`llvm/test/CodeGen/SPIRV/hlsl-intrinsics/WaveActiveAllTrue.ll`
Update VPReductionPHIRecipe::execute to use the start value from the
start value operand of the recipe. This is needed to make sure we resume
from the correct value during epilogue vectorization.
At the moment, the start value is set to the sentinel value in
adjustRecipesForReductions, as the original start value needs to be used
when creating ResumePhi recipes.
Fixes a mis-compile introduced by b3cba9be41bfa8 in SPEC2017 on AArch64.
Creates a new toctree "Support" under which we have distinct links to arch,
platform, and compiler support.
* Moved "Platform Support" from index landing page to new doc.
* Created explicit "Architecture Support". Requested in https://github.com/llvm/llvm-project/issues/118964#issuecomment-2531503046.
* Moved "Compiler Support" from Status toctree to new Support toctree.
---------
Co-authored-by: Carlo Cabrera <github@carlo.cab>