When SelectionDAG converts dbg.value intrinsics, it first ensures we have
already generated code for the value operand of the intrinsic. The rationale
is that if we haven't needed to generate code for this value, a debug value
should not be what causes the generation.
For example, if the first use of an argument's physical register is a
dbg.value, we are going to hit this code path. However, this is irrelevant for
entry value expressions: by definition we are not interested in the _current_
value of the physical register, but rather in its value at the start of the
function. To deal with this, this patch changes lowering to handle this case as
early as possible.
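For illustration only, a minimal sketch of the property the early path can key
on; `DIExpression::isEntryValue()` is an existing LLVM predicate, but this is
not the patch's code:

```
#include "llvm/IR/DebugInfoMetadata.h"

// An entry-value expression describes the register's value on function entry,
// so it never needs to wait for codegen of the dbg.value's operand.
static bool canLowerDbgValueEarly(const llvm::DIExpression *Expr) {
  return Expr->isEntryValue();
}
```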
Differential Revision: https://reviews.llvm.org/D158649
The change introduces the intrinsics 'get_fpmode', 'set_fpmode' and
'reset_fpmode'. They manage all of the target's dynamic floating-point
control modes, which include, for instance, rounding direction, precision,
and treatment of denormals. The intrinsics perform the same operations as
the C library functions 'fegetmode' and 'fesetmode'. By default they are
lowered to calls to these functions.
Two main use cases are supported by this implementation.
1. Local modification of the control modes. In this case the code
usually has a pattern (in pseudocode):
saved_modes = get_fpmode()
set_fpmode(<new_modes>)
...
<do operations under the new modes>
...
set_fpmode(saved_modes)
When it is known that the current FP environment is the default, the code
may be shorter:
set_fpmode(<new_modes>)
...
<do operations under the new modes>
...
reset_fpmode()
Such patterns appear not only in user code but also in the implementation of
various FP-controlling pragmas. In particular, the implementation of
`#pragma STDC FENV_ROUND` requires similar code if the target does not
support a static rounding mode.
2. Portable control of FP modes. Usually FP control modes are set by
writing to some control register. Different targets have different layouts
for this register, and the way the register is accessed may also differ.
Using a set of target-specific definitions for the control register bits
together with these intrinsic functions provides a reasonably portable way
to handle control modes across a wide range of hardware.
This change only defines the LLVM intrinsic functions, which implement the
access required for the aforementioned use cases.
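For reference, the default lowering corresponds to the following sketch at the
C library level (illustrative C++ assuming the glibc fegetmode/fesetmode
extension; the function and parameter names are just for illustration):

```
#include <fenv.h> // fegetmode/fesetmode and femode_t are a glibc extension

// Use case 1 from above, spelled with the library functions that
// get_fpmode/set_fpmode are lowered to by default.
void do_work_under_modes(const femode_t *new_modes) {
  femode_t saved;       // saved_modes = get_fpmode()
  fegetmode(&saved);
  fesetmode(new_modes); // set_fpmode(<new_modes>)
  // ... operations under the new modes ...
  fesetmode(&saved);    // set_fpmode(saved_modes)
}
// reset_fpmode() similarly corresponds to fesetmode(FE_DFL_MODE).
```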
Differential Revision: https://reviews.llvm.org/D82525
This removes some diffs created by D153502.
I'm assuming an AND/OR won't be worse than an SMIN/SMAX. For
RISC-V at least, AND/OR can have a shorter encoding than SMIN/SMAX.
It's weird that we have two different functions responsible for
folding logic of setccs, but I'm not ready to try to untangle that.
I'm unclear whether the PowerPC change is a regression or not. It looks
like it might use more registers, but I don't understand PowerPC
registers so I'm not sure.
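For illustration, this is the kind of equivalence involved when folding logic
of setccs; a sketch of the general identity, not the exact fold touched here:

```
#include <algorithm>
#include <cstdint>

// (a < c) & (b < c) computes the same thing as smax(a, b) < c for signed
// values, so either an AND of two setccs or a single setcc of an SMAX works.
bool bothBelow_andForm(int32_t a, int32_t b, int32_t c) {
  return (a < c) & (b < c);
}
bool bothBelow_smaxForm(int32_t a, int32_t b, int32_t c) {
  return std::max(a, b) < c;
}
```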
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D158292
After the recent patch D30189, #64323's error message became a different one.
When DAGCombiner was optimizing `(vextract (scalar_to_vector val), 0) -> val`, it didn't
consider the possibility that the inserted value type has fewer bits than the dest type.
This patch fixes that.
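To illustrate the problematic case (a sketch in terms of EVT, not the
DAGCombiner code):

```
#include "llvm/CodeGen/ValueTypes.h"

// For (vextract (scalar_to_vector val), 0) -> val: if `val` is narrower than
// the extract's result type, simply returning `val` is not enough.
static bool insertedNarrowerThanDest(llvm::EVT InsertedVT, llvm::EVT DestVT) {
  return InsertedVT.getFixedSizeInBits() < DestVT.getFixedSizeInBits();
}
```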
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D158355
After D157690 we are seeing some crashes from Global ISel, which seem to be
related to the shift_of_shifted_logic_chain combine that can remove too many
instructions if the shift amount is zero.
This limits the fold to non-zero shifts, under the assumption that it is better
in that case to fold away the shift to a COPY.
Differential Revision: https://reviews.llvm.org/D158596
With D149881, we converted EntryValue MachineFunction table entries into
`DbgVariables` initialized by a "DbgValue" intrinsic, which can only handle a
single, non-fragment DIExpression. However, it is desirable to handle variables
with multiple fragments and DIExpressions.
To do this, we expand the `DbgVariable` class to handle the EntryValue case.
This class can already operate under three different "modes" (stack slot,
unchanging location described by a dbg value, changing location described by a
loc list). A fourth case is added as a separate class entirely, but a subsequent
patch should redesign `DbgVariable` with four subclasses in order to make the
code more readable.
This patch also exposed a bug in the `beginEntryValueExpression` function, which
was not initializing the `LocationFlags` properly. Note how the
`finalizeEntryValue` function resets that flag. We fix this bug here, as testing
this change in isolation would be tricky.
Differential Revision: https://reviews.llvm.org/D158458
When we convert an EntryValue dbg.declare into an entry of the MF side table, we
currently copy its DIExpression as is, and rely on subsequent layers to "know"
that this expression is implicitly indirect. This is bad because it adds an
implicit assumption to the IR representation, and requires subsequent layers to
know about this assumption. This also limits the reusability of this table:
what if, in the future, we want to use this table for dbg.values?
This patch changes the existing behavior so that the entities converting
EntryValue dbg.declares explicitly add an OP_deref to the expression.
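As a rough sketch of the kind of rewrite this implies (using the existing
`DIExpression::prepend` helper; illustrative, not the converter's actual code):

```
#include "llvm/IR/DebugInfoMetadata.h"

// Prepend a DW_OP_deref so the EntryValue expression is explicitly indirect
// in the side table instead of implicitly so.
static llvm::DIExpression *addExplicitDeref(const llvm::DIExpression *Expr) {
  return llvm::DIExpression::prepend(Expr, llvm::DIExpression::DerefBefore);
}
```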
Differential Revision: https://reviews.llvm.org/D158437
This adds some more extensive test coverage for fadd/fsub through global isel,
switching the opcodes to use the more complete ActionDefinitions to handle more
cases.
For most fp16 vector ops, we can promote them to fp32 vectors when zvfhmin is enabled but zvfh is not.
But for nxv32f16, we need to split it first, since nxv32f32 is not a valid MVT.
Reviewed By: michaelmaitland
Differential Revision: https://reviews.llvm.org/D153848
Propeller and pseudo-probes map profiles back to Machine IR via basic block addresses that are stored in metadata sections.
Empty basic blocks (basic blocks without real code) obfuscate the profile mapping because their addresses collide with those of their next basic blocks.
They also constrain block placement: for instance, the fall-through block of an empty block should always be adjacent to it, since otherwise a completely unnecessary jump would be added.
This patch adds a MachineFunction pass named `GCEmptyBasicBlocks` which attempts to garbage-collect the empty blocks before the `BasicBlockSections` pass.
This pass removes each empty basic block after redirecting its incoming edges to its fall-through block.
The garbage-collection is not complete. We keep the empty block in 4 cases:
1. The empty block is an exception handling pad.
2. The empty block has its address taken.
3. The empty block is the last block of the function and it has
predecessors.
4. The empty block is the only block of the function.
The first three cases are extremely rare in normal code (no cases for the clang binary). Removing the blocks under the first two cases requires modifying exception handling structures and operands of non-terminator instructions -- which is doable but not worth the additional complexity in the pass.
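As a hedged sketch, the keep-this-block decision from the four cases above
could be written as a predicate like the following (illustrative, not the
pass's code):

```
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFunction.h"

// Return true if an empty block must be kept rather than garbage-collected.
static bool mustKeepEmptyBlock(const llvm::MachineBasicBlock &MBB) {
  const llvm::MachineFunction &MF = *MBB.getParent();
  if (MBB.isEHPad())                           // case 1: exception handling pad
    return true;
  if (MBB.hasAddressTaken())                   // case 2: address taken
    return true;
  if (&MBB == &MF.back() && !MBB.pred_empty()) // case 3: last block with preds
    return true;
  return MF.size() == 1;                       // case 4: only block
}
```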
Reviewed By: tmsriram
Differential Revision: https://reviews.llvm.org/D107534
Rewrites, as MIR patterns, some simple rules that cause little to no codegen regressions.
I may have missed some easy cases, but some other rules have intentionally been left as-is
because bigger changes are needed to make them work.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D157690
This reverts commit 317a0fe5bd7113c0ac9d30b2de58ca409e5ff754.
This reverts commit 30c4b97aec60895a6905816670f493cdd1d7c546.
See the post-commit discussion on https://reviews.llvm.org/D157750: we should
use a different mechanism to handle the error with --cuda-gpu-arch=.
The changes to IR/DiagnosticInfo.cpp, warn_drv_for_elf_only, the codegen tests
in clang/test/Driver, and the following driver behavior (downgrading the error
to a warning) are undesired.
```
% clang --target=riscv64 -fsplit-machine-functions -c a.c
warning: -fsplit-machine-functions is not valid for riscv64 [-Wbackend-plugin]
```
The class 'DbgVariable' can be in one of three states, and the "is any of them
initialized" logic is repeated in a couple of places. We may want to
expand this class in the future; as such, we factor out this common logic so
that it is easier to modify.
Differential Revision: https://reviews.llvm.org/D158438
Use getFixedValue instead of getKnownMinValue to convert TypeSize
to uint64_t. I believe this would have caught the bug fixed by
D157872.
To prevent false failures, I had to treat a scalable 0 as if it
is a fixed value.
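A minimal sketch of the difference (illustrative, not the patched call sites):
getKnownMinValue silently drops the scalable flag, while getFixedValue asserts
that the size really is fixed.

```
#include "llvm/Support/TypeSize.h"

// Convert a TypeSize to uint64_t, treating a scalable zero as if it were
// fixed (as this patch does) so it doesn't trip the assertion.
static uint64_t toFixedSize(llvm::TypeSize TS) {
  if (TS.isScalable() && TS.getKnownMinValue() == 0)
    return 0;
  return TS.getFixedValue(); // asserts on non-zero scalable sizes
}
```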
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D158115
This is a reland of 46d2d7599d9ed5e68fb53e910feb10d47ee2667b, which was
reverted because it broke the build
https://lab.llvm.org/buildbot/#/builders/21/builds/78779. However, that
buildbot failure was spurious, caused by Flang::underscoring.f90 being
nondeterministic.
Current behavior for relaxing out-of-range conditional branches
is to invert the conditional and insert a fallthrough unconditional
branch to the original destination. This approach biases the branch
predictor in the wrong direction, which can degrade performance.
Machine function splitting introduces many rarely-taken cross-section
conditional branches, which are improperly relaxed. Avoid inverting
these branches; instead, retarget them to trampolines at the end of the
function. Doing so increases the runtime cost of jumping to cold code
but eliminates the misprediction cost of jumping to hot code.
Differential Revision: https://reviews.llvm.org/D156837
D152276 wasn't handling the case where the inserted element is implicitly truncated into the vector - resulting in an i1 element (implicitly truncated from i8) overwriting 8 bits instead of 1 bit.
This patch is intended to be merged into 17.x so I've just disallowed any vector element vs inserted element type mismatch - technically we could be more elegant and permit truncated stores (as long as the store is still byte sized), but the use cases for that are so limited I'd prefer to play it safe for now.
Candidate patch for #64655 17.x merge
Differential Revision: https://reviews.llvm.org/D158366
Expand (s/z/any)ext instructions to be compatible with more
types for GlobalISel.
This patch mainly focuses on 64-bit and 128-bit vectors with
element sizes that are powers of 2.
It also notably handles larger-than-legal vectors.
Differential Revision: https://reviews.llvm.org/D157113
This patch removes `getBBIDOrNumber`, which was introduced to allow emitting version 1.
Reviewed By: shenhan
Differential Revision: https://reviews.llvm.org/D158299
Some opcodes in MIR are defined to be convergent by the target by setting
IsConvergent in the corresponding TD file. For example, in AMDGPU, the opcodes
G_SI_CALL and G_INTRINSIC* are marked as convergent. But this is too
conservative, since calls to functions that do not execute convergent operations
should not be marked convergent. This information is available in LLVM IR.
The new flag MIFlag::NoConvergent now allows the IR translator to mark an
instruction as not performing any convergent operations. It is relevant only on
occurrences of opcodes that are marked isConvergent in the target.
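As a hedged sketch of how such a flag might be applied (not the IRTranslator
code): only instructions whose opcode the target marks isConvergent are
candidates, and the flag is set when the IR-level operation is known not to be
convergent.

```
#include "llvm/CodeGen/MachineInstr.h"

// Drop the conservative convergence assumption on a single instruction.
static void maybeMarkNotConvergent(llvm::MachineInstr &MI,
                                   bool IRIsConvergent) {
  if (MI.getDesc().isConvergent() && !IRIsConvergent)
    MI.setFlag(llvm::MachineInstr::NoConvergent);
}
```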
Differential Revision: https://reviews.llvm.org/D157475
SwiftErrorValueTracking creates vregs at swifterror use sites and then
connects them with appropriate definitions after instruction selection.
To propagate swifterror values SwiftErrorValueTracking::propagateVRegs
iterates over basic blocks in RPO, but some vregs previously created
at use sites may be located in blocks that became unreachable after
instruction selection. Because of that there will no definition for
such vregs and that may cause issues down the pipeline.
To ensure that all vregs created by SwiftErrorValueTracking are
defined, propagateVRegs was updated to insert an IMPLICIT_DEF at the
beginning of unreachable blocks containing swifterror uses.
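A hedged sketch of the kind of fix described (not the actual propagateVRegs
change): give the otherwise-undefined vreg a definition at the top of the
unreachable block.

```
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/TargetInstrInfo.h"
#include "llvm/CodeGen/TargetOpcodes.h"

// Insert an IMPLICIT_DEF of VReg at the beginning of MBB so later passes
// always see a definition for the swifterror vreg.
static void defineAtBlockStart(llvm::MachineBasicBlock &MBB,
                               const llvm::TargetInstrInfo &TII,
                               llvm::Register VReg) {
  llvm::BuildMI(MBB, MBB.begin(), llvm::DebugLoc(),
                TII.get(llvm::TargetOpcode::IMPLICIT_DEF), VReg);
}
```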
Related issue: https://github.com/llvm/llvm-project/issues/59751
Reviewed By: compnerd
Differential Revision: https://reviews.llvm.org/D141053
Targets may lose some optimization opportunities for certain vector operations
if we reduce BUILD_VECTOR to BITCAST early.
On the other hand, if the VT is not legal, reducing BUILD_VECTOR to BITCAST before
LegalizeTypes can be beneficial, because the type legalizer often scalarizes vectors
with illegal types.
Reviewed By: sebastian-ne
Differential Revision: https://reviews.llvm.org/D156645
On the insert_subvector side, we already have the restriction. With D158201, we'd start getting unprofitable splat combines unless we add the same one on the extract_subvector side.
Differential Revision: https://reviews.llvm.org/D158202
Refactor BasicBlockSections to use the target-specific noop insertion
hook from TargetInstrInfo instead of building it ourselves. Using the
TII hook is cleaner and makes it easier to extend BBSections to
non-X86 targets.
Differential Revision: https://reviews.llvm.org/D158303
We have an existing DAG combine for when an insert/extract subvector pair is entirely a nop, but we hadn't handled the case where the net result was either an insert or an extract (but not both). The transform is restricted to index = 0 to avoid having to adjust indices after the transform.
Differential Revision: https://reviews.llvm.org/D158201
DemandedBits is forced to all ones if there are multiple users.
The changed X86 test cases look like they were miscompiles before.
The value of eax/rax from the cmov is returned from the function in
addition to being used by the sar. That usage needs all bits even
though the sar doesn't.
We have an existing DAG combine for when an insert/extract subvector pair is entirely a nop, but we hadn't handled the case where the net result was either an insert or an extract (but not both). The transform is restricted to index = 0 to avoid having to adjust indices after the transform.
Reviewers, a couple of comments on the test changes:
* Mostly RISCV, mostly schedule reordering.
* One real regression in splats-with-mixed-vl.ll due to a different overly aggressive combine; a fix will come in a follow-up patch.
* The test/CodeGen/X86/vector-replicaton-i1-mask.ll diff looked concerning at first, but note that the mask size is at most 4 i1s. I think the type changes on the mask loads are correct, but I would welcome a second opinion from someone more familiar with AVX512 codegen.
Differential Revision: https://reviews.llvm.org/D158201
This reverts commit 54d663d5896008c09c938f80357e2a056454bc65, which breaks the test CodeGen/SystemZ/ctpop-01.ll for stage2-ubsan check (see https://lab.llvm.org/buildbot/#/builders/85/builds/18410)
I manually confirmed that the test had been passing immediately prior to that commit
(BUILDBOT_REVISION=4772c66cfb00d60f8f687930e9dd3aa1b6872228 llvm-zorg/zorg/buildbot/builders/sanitizers/buildbot_bootstrap_ubsan.sh)
This modifies the G_UADDE legalization to a version that looks shorter
on Mips and RISC-V when feeding the equivalent IR to SelectionDAG.
This also removes the boolean select from G_USUBE.
Comments taken from LegalizeDAG and tweaked.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D158232
RISC-V found a case where the CombineTo caused N to be CSEd with
an existing node and then deleted. The top level DAGCombiner loop
was surprised to find a node was deleted, but SDValue() was returned
from the visit function.
We need to return SDValue(N, 0) to tell the top level loop that
a change was made, but the worklist updates were already handled.
Fixes #64772.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D158208
If carryin was 1 and RHS was 0xffffffff, we were not producing a carry
out.
In that case Res would be equal to LHS, so Res <u LHS would be false.
But there should be a carry out, since carryin+RHS wraps around to 0.
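For illustration, a sketch of a correct expansion (plain C++, not the
legalizer code):

```
#include <cstdint>

// Expand a 32-bit add-with-carry into (result, carry-out). Computing the
// carry-out as only "Res < LHS" misses CarryIn == 1 && RHS == 0xffffffff,
// where Res == LHS even though the addition wrapped.
struct AddCarry { uint32_t Res; bool CarryOut; };

static AddCarry uadde(uint32_t LHS, uint32_t RHS, bool CarryIn) {
  uint32_t Sum = LHS + RHS;       // first addition
  bool Carry1 = Sum < LHS;        // LHS + RHS overflowed
  uint32_t Res = Sum + CarryIn;   // add the incoming carry
  bool Carry2 = Res < Sum;        // Sum + CarryIn overflowed
  return {Res, Carry1 || Carry2}; // at most one of the two can be set
}
```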
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D157943
A simple function that scalarizes Ops and then ExtOrTruncs them according to the function parameters.
Differential Revision: https://reviews.llvm.org/D157733