34570 Commits

Author SHA1 Message Date
Felipe de Azevedo Piovezan
35f4ef1fee [SelectionDAG][DebugInfo] Handle entry_value dbg.value DIExprs earlier
When SelectionDAG converts dbg.value intrinsics, it first ensures we have
already generated code for the value operand of the intrinsic. The rationale
is that if we haven't had the need to generate code for this value, it won't
be a debug value that causes the generation.

For example, if the first use of the physical register of an argument is a
dbg.value, we are going to hit this code path.  However, this is irrelevant for
entry value expressions: by definition we are not interested in the _current_
value of the physical register, but rather in its value at the start of the
function. To deal with this, this patch changes lowering to handle this case as
early as possible.

Differential Revision: https://reviews.llvm.org/D158649
2023-08-24 09:33:53 -04:00
Matt Arsenault
d86a7d631c GlobalISel: Add constant fold combine for zext/sext/anyext
Could use more work for vectors.

https://reviews.llvm.org/D156534
2023-08-24 08:10:01 -04:00
Serge Pavlov
6862f0fab1 [FPEnv] Intrinsics for access to FP control modes
The change introduces intrinsics 'get_fpmode', 'set_fpmode' and
'reset_fpmode'. They manage all target dynamic floating-point control
modes, which include, for instance, rounding direction, precision,
treatment of denormals and so on. The intrinsics do the same
operations as the C library functions 'fegetmode' and 'fesetmode'. By
default they are lowered to calls to these functions.

Two main use cases are supported by this implementation.

1. Local modification of the control modes. In this case the code
usually has a pattern (in pseudocode):

    saved_modes = get_fpmode()
    set_fpmode(<new_modes>)
    ...
    <do operations under the new modes>
    ...
    set_fpmode(saved_modes)

In the case when it is known that the current FP environment is default,
the code may be shorter:

    set_fpmode(<new_modes>)
    ...
    <do operations under the new modes>
    ...
    reset_fpmode()

Such patterns appear not only in user code but also in implementations
of various FP controlling pragmas. In particular, the implementation of
`#pragma STDC FENV_ROUND` requires similar code if the target does not
support static rounding mode.

2. Portable control of FP modes. Usually FP control modes are set by
writing to some control register. Different targets lay out this register
differently, and the way the register is accessed may also differ.
Using a set of target-specific definitions for the control
register bits together with these intrinsic functions provides a sufficiently
portable way to handle control modes across a wide range of hardware.
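
The save/modify/restore pattern from use case 1 can be sketched in standard
C++. This is only an illustration under stated assumptions: it uses
`std::fegetround`/`std::fesetround` from `<cfenv>` as a stand-in, so it covers
the rounding direction only and not the full set of control modes, and the
class name `ScopedRoundingMode` is hypothetical rather than part of this patch.

```cpp
#include <cassert>
#include <cfenv>

// RAII guard mirroring the "saved = get; set(new); ...; set(saved)" pattern.
// Assumption: only the rounding direction is handled, because that is what
// <cfenv> exposes portably.
class ScopedRoundingMode {
public:
  explicit ScopedRoundingMode(int newMode) : saved_(std::fegetround()) {
    int rc = std::fesetround(newMode);
    assert(rc == 0 && "requested rounding mode is not supported");
    (void)rc; // silence unused-variable warnings when NDEBUG is defined
  }

  // Restore the mode that was active when the guard was created.
  ~ScopedRoundingMode() { std::fesetround(saved_); }

  ScopedRoundingMode(const ScopedRoundingMode &) = delete;
  ScopedRoundingMode &operator=(const ScopedRoundingMode &) = delete;

private:
  int saved_;
};
```

A guard object created at the top of a scope plays the role of the first
`set_fpmode` call, and its destructor plays the role of the final one.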

This change defines only the LLVM intrinsic functions, which implement the
access required for the aforementioned use cases.

Differential Revision: https://reviews.llvm.org/D82525
2023-08-24 15:52:19 +07:00
Craig Topper
2ad50f354a [DAGCombiner][RISCV][AArch64][PowerPC] Restrict foldAndOrOfSETCC from using SMIN/SMAX where an OR/AND would do.
This removes some diffs created by D153502.

I'm assuming an AND/OR won't be worse than an SMIN/SMAX. For
RISC-V at least, AND/OR can be a shorter encoding than SMIN/SMAX.

It's weird that we have two different functions responsible for
folding logic of setccs, but I'm not ready to try to untangle that.

I'm unclear if the PowerPC change is a regression or not. It looks
like it might use more registers, but I don't understand PowerPC
registers so I'm not sure.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D158292
2023-08-23 20:26:23 -07:00
Yingwei Zheng
d6639f83a9 [SDAG][RISCV] Avoid folding setcc (xor C1, -1), C2, cond into setcc (xor C2, -1), C1, cond
This patch fixes https://github.com/llvm/llvm-project/issues/64935.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D158654
2023-08-24 04:18:17 +08:00
Peter Rong
f58fbfc746 [X86][CodeGen] Add a dag pattern to fix #64323
After the recent patch D30189, the error message reported in #64323 changed.
When DAGCombiner was optimizing `(vextract (scalar_to_vector val), 0) -> val`, it didn't
consider the possibility that the inserted value type has fewer bits than the dest type.
This patch fixes that.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D158355
2023-08-23 10:50:32 -07:00
Rahman Lavaee
7dc6566273 Add file header for GCEmptyBasicBlocks.cpp.
2023-08-23 17:45:09 +00:00
David Green
adaf545a50 [GlobalISel] Limit shift_of_shifted_logic_chain to non-zero folds
After D157690 we are seeing some crashes from Global ISel, which seem to be
related to the shift_of_shifted_logic_chain combine that can remove too many
instructions if the shift amount is zero.

This limits the fold to non-zero shifts, under the assumption that it is better
in that case to fold away the shift to a COPY.
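
As a rough illustration of the identity this combine relies on, the sketch
below assumes the fold has the shape
`(shift (logic (shift X, C0), Y), C1) -> (logic (shift X, C0 + C1), (shift Y, C1))`
and instantiates it for left shift and AND on 32-bit values. The function names
are hypothetical and this is not the combiner code.

```cpp
#include <cassert>
#include <cstdint>

// Original form: shift, combine with Y, then shift again.
uint32_t shiftOfShiftedAnd(uint32_t x, uint32_t y, unsigned c0, unsigned c1) {
  assert(c0 < 32 && c1 < 32 && "each shift amount must be in range");
  return ((x << c0) & y) << c1;
}

// Folded form: one combined shift on X, one shift on Y, then the AND.
// Only valid while the combined shift amount stays below the bit width.
uint32_t foldedShiftOfShiftedAnd(uint32_t x, uint32_t y, unsigned c0,
                                 unsigned c1) {
  assert(c0 + c1 < 32 && "combined shift amount must stay in range");
  return (x << (c0 + c1)) & (y << c1);
}
```

With `c1 == 0` both forms still agree numerically; in that case the outer
shift is a no-op, which is the situation this patch excludes from the fold.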

Differential Revision: https://reviews.llvm.org/D158596
2023-08-23 18:17:37 +01:00
Felipe de Azevedo Piovezan
af6d43ea66 [AsmPrinter][DebugInfo] Create EntryValue mode for DbgVariable
With D149881, we converted EntryValue MachineFunction table entries into
`DbgVariables` initialized by a "DbgValue" intrinsic, which can only handle a
single, non-fragment DIExpression. However, it is desirable to handle variables
with multiple fragments and DIExpressions.

To do this, we expand the `DbgVariable` class to handle the EntryValue case.
This class can already operate under three different "modes" (stack slot,
unchanging location described by a dbg value, changing location described by a
loc list). A fourth case is added as a separate class entirely, but a subsequent
patch should redesign `DbgVariable` with four subclasses in order to make the
code more readable.

This patch also exposed a bug in the `beginEntryValueExpression` function, which
was not initializing the `LocationFlags` properly. Note how the
`finalizeEntryValue` function resets that flag. We fix this bug here, as testing
this change in isolation would be tricky.

Differential Revision: https://reviews.llvm.org/D158458
2023-08-23 12:29:18 -04:00
Felipe de Azevedo Piovezan
88417098bb [CodeGen][DebugInfo] Append OP_deref when converting an EntryValue dbg.declare
When we convert an EntryValue dbg.declare into an entry of the MF side table, we
currently copy its DIExpression as is, and rely on subsequent layers to "know"
that this expression is implicitly indirect. This is bad because it adds an
implicit assumption to the IR representation, and requires subsequent layers to
know about this assumption. This also limits the reusability of this table:
what if, in the future, we want to use this table for dbg.values?

This patch changes existing behavior so that the entities converting
dbg_declares explicitly add an OP_deref when converting EntryValue dbg.declares.

Differential Revision: https://reviews.llvm.org/D158437
2023-08-23 12:25:12 -04:00
David Green
ef0b8cf3f4 [AArch64][GISel] Expand coverage of FAdd and FSub.
This adds some more extensive test coverage for fadd/fsub through global isel,
switching the opcodes to use the more complete ActionDefinitions to handle more
cases.
2023-08-23 09:51:06 +01:00
Jianjian GUAN
879e801a91 [RISCV] Apply promotion for f16 vector ops when only have zvfhmin
For most fp16 vector ops, we can promote them to fp32 vector ops when zvfhmin is enabled but zvfh is not.
But for nxv32f16, we need to split it first since nxv32f32 is not a valid MVT.

Reviewed By: michaelmaitland

Differential Revision: https://reviews.llvm.org/D153848
2023-08-23 16:49:20 +08:00
Rahman Lavaee
d0ec03a384 Revert "[BasicBlockSections] avoid inserting redundant branch to fall through blocks"
This reverts commit ab53109166c0345a79cbd6939cf7bc764a982856 which was
committed by mistake.
2023-08-23 01:09:13 +00:00
Rahman Lavaee
ab53109166 [BasicBlockSections] avoid inserting redundant branch to fall through blocks
2023-08-22 23:32:02 +00:00
Rahman Lavaee
e280e406c2 Add a pass to garbage-collect empty basic blocks after code generation.
Propeller and pseudo-probes map profiles back to Machine IR via basic block addresses that are stored in metadata sections.
Empty basic blocks (basic blocks without real code) obfuscate the profile mapping because their addresses collide with their next basic blocks.
For instance, the fallthrough block of an empty block should always be adjacent to it. Otherwise, a completely unnecessary jump would be added.
This patch adds a MachineFunction pass named `GCEmptyBasicBlocks` which attempts to garbage-collect the empty blocks before the `BasicBlockSections` pass.
This pass removes each empty basic block after redirecting its incoming edges to its fall-through block.
The garbage-collection is not complete. We keep the empty block in 4 cases:
      1. The empty block is an exception handling pad.
      2. The empty block has its address taken.
      3. The empty block is the last block of the function and it has
         predecessors.
      4. The empty block is the only block of the function.
The first three cases are extremely rare in normal code (no cases for the clang binary). Removing the blocks under the first two cases requires modifying exception handling structures and operands of non-terminator instructions -- which is doable but not worth the additional complexity in the pass.
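
The redirection rule and the four keep cases can be sketched on a toy CFG.
Everything here is an assumption made for illustration: `Block` is a
hypothetical struct rather than `MachineBasicBlock`, blocks are stored in
layout order, and the fall-through of an empty block is taken to be the next
block in that order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified block: just the properties the rules look at.
struct Block {
  bool empty = false;        // contains no real code
  bool ehPad = false;        // exception handling pad
  bool addressTaken = false; // address escapes, e.g. via a jump table
  std::vector<std::size_t> succs; // successor indices in layout order
};

// For every block, returns the index that its incoming edges should target
// after cleanup. Kept blocks map to themselves; a removed empty block maps to
// wherever its fall-through block ends up.
std::vector<std::size_t> resolveEmptyBlocks(const std::vector<Block> &blocks) {
  const std::size_t n = blocks.size();
  std::vector<bool> hasPred(n, false);
  for (const Block &b : blocks)
    for (std::size_t s : b.succs) {
      assert(s < n && "successor index out of range");
      hasPred[s] = true;
    }

  std::vector<std::size_t> target(n);
  // Walk backwards so that chains of empty blocks collapse in one pass.
  for (std::size_t i = n; i-- > 0;) {
    const Block &b = blocks[i];
    const bool isLast = (i + 1 == n);
    const bool keep = !b.empty || b.ehPad || b.addressTaken || n == 1 ||
                      (isLast && hasPred[i]);
    // A removable last block has no predecessors, so nothing needs
    // redirecting; it simply maps to itself here.
    target[i] = (keep || isLast) ? i : target[i + 1];
  }
  return target;
}
```

Walking the layout from the end means a run of consecutive empty blocks
resolves directly to the first kept block after the run.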

Reviewed By: tmsriram

Differential Revision: https://reviews.llvm.org/D107534
2023-08-22 22:42:19 +00:00
Daniel Hoekwater
90ab85a1b2 Reland "[CodeGen][AArch64] Make MFS testable on AArch64"
Reverted by 3d22dac6c3b97d7bb92f243886dfb0d32a5c42e9 because it depended
on b9d079d6188b50730e0a67267b7fee36008435ce, which broke some tests.
2023-08-22 20:21:33 +00:00
pvanhout
2d87319f06 [GlobalISel] Rewrite some simple rules using MIR Patterns
Rewrites some simple rules that cause little to no codegen regressions as MIR patterns.

I may have missed some easy cases, but some other rules have intentionally been left as-is because bigger
changes are needed to make them work.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D157690
2023-08-22 09:09:54 +02:00
Fangrui Song
77596e6b16 Revert D157750 "[Driver][CodeGen] Properly handle -fsplit-machine-functions for fatbinary compilation."
This reverts commit 317a0fe5bd7113c0ac9d30b2de58ca409e5ff754.
This reverts commit 30c4b97aec60895a6905816670f493cdd1d7c546.

See post-commit discussions on https://reviews.llvm.org/D157750 that
we should use a different mechanism to handle the error with --cuda-gpu-arch=

The IR/DiagnosticInfo.cpp, warn_drv_for_elf_only, codegen tests in
clang/test/Driver, and the following driver behavior (downgrading error
to warning) changes are undesired.
```
% clang --target=riscv64 -fsplit-machine-functions -c a.c
warning: -fsplit-machine-functions is not valid for riscv64 [-Wbackend-plugin]
```
2023-08-21 13:54:15 -07:00
Felipe de Azevedo Piovezan
32223123d3 [DwarfDebug][NFC] Factor out 'isInitialized' logic
The class 'DbgVariable' can be in one of three states, and the "is any of them
initialized" logic for them is repeated in a couple of places. We may want to
expand this class in the future; as such, we factor out this common logic so
that it is easier to modify.

Differential Revision: https://reviews.llvm.org/D158438
2023-08-21 15:15:14 -04:00
Craig Topper
e620eac75e [SelectionDAG][RISCV][SVE] Harden fixed offset version of ComputeValueVTs against scalable offsets.
Use getFixedValue instead of getKnownMinValue to convert TypeSize
to uint64_t. I believe this would have caught the bug fixed by
D157872.

To prevent false failures, I had to treat a scalable 0 as if it
is fixed value.
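
The difference between the two accessors can be sketched with a toy type.
`ToyTypeSize` and `toFixedOffset` are hypothetical stand-ins written for this
illustration, not the LLVM classes; the sketch assumes only that the
fixed-value accessor asserts on scalable sizes while the known-minimum accessor
silently returns the multiplier.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a size that may be scaled by a runtime factor.
struct ToyTypeSize {
  uint64_t knownMin; // minimum value (the multiplier when scalable)
  bool scalable;     // true if the real size is knownMin * vscale

  // Silently drops the "scalable" property; easy to misuse for offsets.
  uint64_t getKnownMinValue() const { return knownMin; }

  // Refuses to treat a scalable size as a plain integer.
  uint64_t getFixedValue() const {
    assert(!scalable && "scalable size used as a fixed value");
    return knownMin;
  }
};

// Converts an offset to a plain integer for a fixed-offset code path.
// A scalable zero is still zero for any vscale, so it is accepted.
uint64_t toFixedOffset(ToyTypeSize offset) {
  if (offset.scalable && offset.knownMin == 0)
    return 0;
  return offset.getFixedValue();
}
```

Passing a non-zero scalable size to `toFixedOffset` trips the assertion in a
debug build instead of producing a silently wrong offset.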

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D158115
2023-08-21 10:36:17 -07:00
Daniel Hoekwater
e223e45677 Reland "[AArch64][CodeGen] Avoid inverting hot branches during relaxation"
This is a reland of 46d2d7599d9ed5e68fb53e910feb10d47ee2667b, which was
reverted because of breaking build
https://lab.llvm.org/buildbot/#/builders/21/builds/78779. However, this
buildbot is spuriously broken due to Flang::underscoring.f90 being
nondeterministic.
2023-08-21 17:29:47 +00:00
Daniel Hoekwater
0303137bfc Revert "[AArch64][CodeGen] Avoid inverting hot branches during relaxation"
This reverts commit 46d2d7599d9ed5e68fb53e910feb10d47ee2667b.
Breaks build https://lab.llvm.org/buildbot/#/builders/21/builds/78779
2023-08-21 17:13:35 +00:00
Daniel Hoekwater
46d2d7599d [AArch64][CodeGen] Avoid inverting hot branches during relaxation
Current behavior for relaxing out-of-range conditional branches
is to invert the conditional and insert a fallthrough unconditional
branch to the original destination. This approach biases the branch
predictor in the wrong direction, which can degrade performance.

Machine function splitting introduces many rarely-taken cross-section
conditional branches, which are improperly relaxed. Avoid inverting
these branches; instead, retarget them to trampolines at the end of the
function. Doing so increases the runtime cost of jumping to cold code
but eliminates the misprediction cost of jumping to hot code.

Differential Revision: https://reviews.llvm.org/D156837
2023-08-21 16:41:02 +00:00
Benjamin Kramer
a4202e65cf Move VTList pointer out of RegClassInfos
Store it in TargetRegisterInfo instead. Worth 54k on llc size.
2023-08-21 17:40:40 +02:00
Simon Pilgrim
ba818c4019 [DAG] replaceStoreOfInsertLoad - don't fold if the inserted element is implicitly truncated
D152276 wasn't handling the case where the inserted element is implicitly truncated into the vector - resulting in an i1 element (implicitly truncated from i8) overwriting 8 bits instead of 1 bit.

This patch is intended to be merged into 17.x so I've just disallowed any vector element vs inserted element type mismatch - technically we could be more elegant and permit truncated stores (as long as the store is still byte sized), but the use cases for that are so limited I'd prefer to play it safe for now.

Candidate patch for #64655 17.x merge

Differential Revision: https://reviews.llvm.org/D158366
2023-08-21 11:22:07 +01:00
Tuan Chuong Goh
a40c984976 [AArch64][GlobalISel] Support more legal types for EXTEND
Expand (s/z/any)ext instructions to be compatible with more
types for GlobalISel.
This patch mainly focuses on 64-bit and 128-bit vectors with
element size of powers of 2.
It also notably handles larger than legal vectors.

Differential Revision: https://reviews.llvm.org/D157113
2023-08-21 09:51:17 +01:00
Kazu Hirata
134115618a [CodeGen] Use isAllOnesConstant and isNullConstant (NFC)
2023-08-20 22:56:40 -07:00
Fangrui Song
41e71f500d [GlobalISel] Remove unneeded empty check. NFC
2023-08-20 21:11:13 -07:00
Rahman Lavaee
69e47deca9 [Propeller] Deprecate Codegen paths for SHT_LLVM_BB_ADDR_MAP version 1.
This patch removes the `getBBIDOrNumber` which was introduced to allow emitting version 1.

Reviewed By: shenhan

Differential Revision: https://reviews.llvm.org/D158299
2023-08-20 18:29:47 +00:00
Sameer Sahasrabuddhe
ef38e6d97f [GlobalISel] introduce MIFlag::NoConvergent
Some opcodes in MIR are defined to be convergent by the target by setting
IsConvergent in the corresponding TD file. For example, in AMDGPU, the opcodes
G_SI_CALL and G_INTRINSIC* are marked as convergent. But this is too
conservative, since calls to functions that do not execute convergent operations
should not be marked convergent. This information is available in LLVM IR.

The new flag MIFlag::NoConvergent now allows the IR translator to mark an
instruction as not performing any convergent operations. It is relevant only on
occurrences of opcodes that are marked isConvergent in the target.

Differential Revision: https://reviews.llvm.org/D157475
2023-08-20 21:14:46 +05:30
Simon Pilgrim
95865e5138 [DAG] SimplifyDemandedBits - if we're only demanding the signbit, a SMIN/SMAX node can be simplified to a OR/AND node respectively.
Alive2: https://alive2.llvm.org/ce/z/MehvFB

REAPPLIED from 54d663d5896008 with fix for using the correct DemandedBits mask.
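
The underlying identity is that the sign bit of smin(a, b) equals the sign bit
of (a | b), and the sign bit of smax(a, b) equals the sign bit of (a & b). The
sketch below checks this exhaustively for 8-bit values; the helper names are
hypothetical and it is independent of the DAG code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// True when the sign bit of an 8-bit signed value is set.
bool signBit(int8_t v) { return v < 0; }

// Checks both identities for every pair of int8_t values and returns the
// number of pairs that were checked.
int checkSignBitIdentities() {
  int checked = 0;
  for (int a = -128; a <= 127; ++a) {
    for (int b = -128; b <= 127; ++b) {
      const int8_t x = static_cast<int8_t>(a);
      const int8_t y = static_cast<int8_t>(b);
      // smin is negative iff at least one input is negative: same as OR.
      assert(signBit(std::min(x, y)) == signBit(static_cast<int8_t>(x | y)));
      // smax is negative iff both inputs are negative: same as AND.
      assert(signBit(std::max(x, y)) == signBit(static_cast<int8_t>(x & y)));
      ++checked;
    }
  }
  return checked;
}
```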
2023-08-20 14:20:49 +01:00
Filipp Zhinkin
08d0b558f5 [SwiftError] Use IMPLICIT_DEF as a definition for unreachable VReg uses
SwiftErrorValueTracking creates vregs at swifterror use sites and then
connects them with appropriate definitions after instruction selection.
To propagate swifterror values SwiftErrorValueTracking::propagateVRegs
iterates over basic blocks in RPO, but some vregs previously created
at use sites may be located in blocks that became unreachable after
instruction selection. Because of that there will be no definition for
such vregs and that may cause issues down the pipeline.

To ensure that all vregs created by the SwiftErrorValueTracking will
be defined propagateVRegs was updated to insert IMPLICIT_DEF at the
beginning of unreachable blocks containing swifterror uses.

Related issue: https://github.com/llvm/llvm-project/issues/59751

Reviewed By: compnerd

Differential Revision: https://reviews.llvm.org/D141053
2023-08-20 13:00:31 +02:00
Kazu Hirata
d85993d28f [llvm] Remove redundant control flow statements (NFC)
2023-08-19 08:07:30 -07:00
Jim Lin
18f5ada244 [DAGCombiner] Don't reduce BUILD_VECTOR to BITCAST before LegalizeTypes if VT is legal.
Targets may lose some optimization opportunities for certain vector operations
if we reduce BUILD_VECTOR to BITCAST early.

If VT is not legal, reducing BUILD_VECTOR to BITCAST before LegalizeTypes
can still be beneficial, because the type legalizer often scalarizes illegal vector types.

Reviewed By: sebastian-ne

Differential Revision: https://reviews.llvm.org/D156645
2023-08-19 12:53:50 +08:00
Philip Reames
92e0c0dc1a [DAG] Restrict insert_subvector undef, splat_vector, dontcare transform
On the extract_subvector side, we already have the restriction. With D158201, we'd start getting unprofitable splat combines unless we add the same one on the insert_subvector side.

Differential Revision: https://reviews.llvm.org/D158202
2023-08-18 12:44:09 -07:00
Daniel Hoekwater
ca72b0a709 [CodeGen] Use the TII hook for Noop insertion in BBSections (NFC)
Refactor BasicBlockSections to use the target-specific noop insertion
hook from TargetInstrInfo instead of building it ourselves. Using the
TII hook is both cleaner and makes it easier to extend BBSections to
non-X86 targets.

Differential Revision: https://reviews.llvm.org/D158303
2023-08-18 19:40:11 +00:00
Philip Reames
67b71ad04a [DAG] Fold insert_subvector undef, (extract_subvector X, 0), 0 with non-matching types
We have an existing DAG combine for when an insert/extract subvector pair is entirely a nop, but we hadn't handled the case where the net result was either an insert or an extract (but not both). The transform is restricted to index = 0 to avoid having to adjust indices after the transform.

Differential Revision: https://reviews.llvm.org/D158201
2023-08-18 12:28:27 -07:00
Craig Topper
bbbb93eb48 Revert "[DAG] Fold insert_subvector undef, (extract_subvector X, 0), 0 with non-matching types"
This reverts commit 770be43f6782dab84d215d01b37396d63a9c2b6e.

Forgot to remove from my tree while experimenting.
2023-08-18 12:00:07 -07:00
Craig Topper
0a5347f40d [DAG] SimplifyDemandedBits - Use DemandedBits instead of OriginalDemandedBits when simplifying UMIN/UMAX to AND/OR.
DemandedBits is forced to all ones if there are multiple users.

The changed X86 test cases look like they were miscompiles before.
The value of eax/rax from the cmov is returned from the function in
addition to being used by the sar. That usage needs all bits even
though the sar doesn't.
2023-08-18 11:59:18 -07:00
Craig Topper
770be43f67 [DAG] Fold insert_subvector undef, (extract_subvector X, 0), 0 with non-matching types
We have an existing DAG combine for when an insert/extract subvector pair is entirely a nop, but we hadn't handled the case where the net result was either an insert or an extract (but not both).  The transform is restricted to index = 0 to avoid having to adjust indices after the transform.

Reviewers, a couple of comments on the test changes:
* Mostly RISCV, mostly schedule reordering.
* One real regression in splats-with-mixed-vl.ll due to a different overly aggressive combine, to be fixed in a follow-up patch.
* The test/CodeGen/X86/vector-replicaton-i1-mask.ll diff looked concerning at first, but note the mask size is at most 4 i1s.  I think the type changes on the mask loads are correct, but would welcome a second opinion from someone more familiar with AVX512 codegen.

Differential Revision: https://reviews.llvm.org/D158201
2023-08-18 11:59:18 -07:00
Thurston Dang
29b2009061 Revert "[DAG] SimplifyDemandedBits - if we're only demanding the signbit, a SMIN/SMAX node can be simplified to a OR/AND node respectively."
This reverts commit 54d663d5896008c09c938f80357e2a056454bc65, which breaks the test CodeGen/SystemZ/ctpop-01.ll for stage2-ubsan check (see https://lab.llvm.org/buildbot/#/builders/85/builds/18410)

I manually confirmed that the test had been passing immediately prior to that commit
(BUILDBOT_REVISION=4772c66cfb00d60f8f687930e9dd3aa1b6872228 llvm-zorg/zorg/buildbot/builders/sanitizers/buildbot_bootstrap_ubsan.sh)
2023-08-18 18:08:10 +00:00
Simon Pilgrim
bd9bf9cb67 [X86] SimplifyDemandedBits - move MaskedValueIsZero as late as possible to avoid unnecessary (recursive) analysis costs. NFC.
Mentioned on D155472 for the SHL equivalent
2023-08-18 15:14:06 +01:00
Simon Pilgrim
4cd1c07491 [DAG] SimplifyDemandedBits - if we're only demanding the msb, a UMIN/UMAX node can be simplified to a AND/OR node respectively.
Alive2: https://alive2.llvm.org/ce/z/qnvmc6
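
The unsigned counterpart is that the most significant bit of umin(a, b) equals
the msb of (a & b), and the msb of umax(a, b) equals the msb of (a | b). A
small exhaustive check over 8-bit values, with hypothetical helper names and
independent of the DAG code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// True when the most significant bit of an 8-bit unsigned value is set.
bool msb(uint8_t v) { return (v & 0x80u) != 0; }

// Checks both identities for every pair of uint8_t values and returns the
// number of pairs that were checked.
int checkMsbIdentities() {
  int checked = 0;
  for (unsigned a = 0; a <= 255; ++a) {
    for (unsigned b = 0; b <= 255; ++b) {
      const uint8_t x = static_cast<uint8_t>(a);
      const uint8_t y = static_cast<uint8_t>(b);
      // umin is >= 0x80 iff both inputs are >= 0x80: same as AND.
      assert(msb(std::min(x, y)) == msb(static_cast<uint8_t>(x & y)));
      // umax is >= 0x80 iff at least one input is >= 0x80: same as OR.
      assert(msb(std::max(x, y)) == msb(static_cast<uint8_t>(x | y)));
      ++checked;
    }
  }
  return checked;
}
```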
2023-08-18 12:12:22 +01:00
Simon Pilgrim
54d663d589 [DAG] SimplifyDemandedBits - if we're only demanding the signbit, a SMIN/SMAX node can be simplified to a OR/AND node respectively.
Alive2: https://alive2.llvm.org/ce/z/MehvFB
2023-08-18 11:35:34 +01:00
Carl Ritson
ad9eed1e77 [MachineVerifier] Verify LiveIntervals for PHIs
Implement basic support for verifying LiveIntervals for PHIs.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D156872
2023-08-18 18:14:22 +09:00
Craig Topper
c6dee6982f [GlobalISel][Mips] Sync G_UADDE and G_USUBE legalization with LegalizeDAG.
This modifies the G_UADDE legalization to a version that looks shorter
on Mips and RISC-V when feeding the equivalent IR to SelectionDAG.
This also removes the boolean select from G_USUBE.

Comments taken from LegalizeDAG and tweaked.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D158232
2023-08-17 20:36:55 -07:00
Jie Fu
d1a4b8c56f [GlobalISel] Remove unused variable 'Or' (NFC)
/Users/jiefu/llvm-project/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:3450:10: error: unused variable 'Or' [-Werror,-Wunused-variable]
    auto Or = MIRBuilder.buildOr(CarryOut, And, Res_ULT_LHS);
         ^
1 error generated.
2023-08-18 06:40:41 +08:00
Craig Topper
846fbb06b8 [DAGCombiner][RISCV] Return SDValue(N, 0) instead of SDValue() after 2 calls to CombineTo in visitSTORE.
RISC-V found a case where the CombineTo caused N to be CSEd with
an existing node and then deleted. The top level DAGCombiner loop
was surprised to find a node was deleted, but SDValue() was returned
from the visit function.

We need to return SDValue(N, 0) to tell the top level loop that
a change was made, but the worklist updates were already handled.

Fixes #64772.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D158208
2023-08-17 15:13:36 -07:00
Craig Topper
ebb2e5ebb2 [GlobalISel][Mips] Correct corner case in G_UADDE legalization.
If carryin was 1, and RHS is 0xffffffff we were not giving a carry
out.

In that case Res would be equal to LHS, so Res <u LHS would be false.
But there should be a carry out since carryin+RHS wraps around to 0.
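
The corner case can be reproduced with plain 32-bit arithmetic. The sketch
below is an illustration only: `naiveUAddE` models the single `Res <u LHS`
check described above, `twoStepUAddE` shows one possible way to get the carry
right, and `wideUAddE` is a 64-bit reference. All three names are hypothetical
and none of this is the legalizer code.

```cpp
#include <cassert>
#include <cstdint>

struct AddResult {
  uint32_t sum;
  bool carryOut;
};

// Single comparison after adding everything at once: misses the case where
// carryIn + rhs wraps around to 0 and the result equals lhs.
AddResult naiveUAddE(uint32_t lhs, uint32_t rhs, bool carryIn) {
  const uint32_t res = lhs + rhs + (carryIn ? 1u : 0u);
  return {res, res < lhs};
}

// Two checks: one for lhs + rhs, one for adding the carry-in. The second
// add can only overflow when the carry-in is set and the result wraps to 0.
AddResult twoStepUAddE(uint32_t lhs, uint32_t rhs, bool carryIn) {
  const uint32_t partial = lhs + rhs;
  const bool carry1 = partial < lhs;
  const uint32_t res = partial + (carryIn ? 1u : 0u);
  const bool carry2 = carryIn && res == 0;
  return {res, carry1 || carry2};
}

// Reference implementation using a wider type.
AddResult wideUAddE(uint32_t lhs, uint32_t rhs, bool carryIn) {
  const uint64_t wide = uint64_t(lhs) + uint64_t(rhs) + (carryIn ? 1u : 0u);
  return {static_cast<uint32_t>(wide), (wide >> 32) != 0};
}

// Usage: the case from the commit message, carryIn = 1 and rhs = 0xffffffff.
void checkCornerCase() {
  assert(naiveUAddE(5, 0xffffffffu, true).carryOut == false);  // wrong
  assert(twoStepUAddE(5, 0xffffffffu, true).carryOut == true); // right
  assert(wideUAddE(5, 0xffffffffu, true).carryOut == true);
}
```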

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D157943
2023-08-17 15:06:16 -07:00
Jeffrey Byrnes
d26a06728d [DAG] NFC: Add getBitcastedExtOrTrunc
Simple function which scalarizes Ops then ExtOrTruncs them according to function parameters

Differential Revision: https://reviews.llvm.org/D157733

Change-Id: Ie5215069228f7bf530cd2dbb4bd17cbf409e046a
2023-08-17 14:29:17 -07:00