34024 Commits

Author SHA1 Message Date
Sameer Sahasrabuddhe
fbe1c0616f [LLVM][Uniformity] Improve detection of uniform registers
The MachineUA now queries the target to determine if a given register holds a
uniform value. This is determined using the corresponding register bank if
available, or by a combination of the register class and value type. This
assumes that the target is optimizing for performance by choosing registers, and
the target is responsible for any mismatch with the inferred uniformity.

For example, on AMDGPU, an SGPR is now treated as uniform, except if the
register bank is VCC (i.e., the register holds a wave-wide vector of 1-bit
values) or equivalently if it has a value type of s1.

 - This does not always work with inline asm, where the register bank or the
   value type might not be present. We assume that the SGPR is uniform, because
   it is not expected to be s1 in the vast majority of cases.
 - The pseudo branch instruction SI_LOOP is now hard-coded to be always
   divergent, although its condition is an SGPR.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D150438
2023-05-16 09:37:04 +05:30
Jessica Paquette
407b4648b8 [MachineOutliner] NFC: Add debug output to MachineOutliner::outline
Add some debug output to `outline` to assist in debugging + understanding the
code.

This will say

- How many things we found worth turning into outlined functions
- Whether or not candidates were pruned via the outlining algorithm
- The function created (if it was created)
- Where the calls were inserted
- What instruction was used to create the call

Sample output below:

```
NUMBER OF POTENTIAL FUNCTIONS: 5
WALKING FUNCTION LIST
PRUNED: 0/2 candidates
OUTLINE: Expected benefit (12 B) > threshold (1 B)
NEW FUNCTION: OUTLINED_FUNCTION_0
CREATE OUTLINED CALLS
  CALL: OUTLINED_FUNCTION_0 in bar:<unknown>
   .. BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp
  CALL: OUTLINED_FUNCTION_0 in bar:<unknown>
   .. BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp
PRUNED: 2/2 candidates
SKIP: Expected benefit (0 B) < threshold (1 B)
PRUNED: 0/2 candidates
OUTLINE: Expected benefit (8 B) > threshold (1 B)
NEW FUNCTION: OUTLINED_FUNCTION_1
CREATE OUTLINED CALLS
  CALL: OUTLINED_FUNCTION_1 in bar:<unknown>
   .. BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp
  CALL: OUTLINED_FUNCTION_1 in bar:<unknown>
   .. BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp
PRUNED: 2/2 candidates
SKIP: Expected benefit (0 B) < threshold (1 B)
PRUNED: 2/2 candidates
SKIP: Expected benefit (0 B) < threshold (1 B)
```
2023-05-15 15:29:26 -07:00
Muhammad Omair Javaid
6b22608a1d Revert "Emit the correct flags for the PROC CodeView Debug Symbol"
This reverts commit e48826e016e2f427f3b7b1274166aa9aa0ea7f4f.

https://lab.llvm.org/buildbot/#/builders/219/builds/2520

lldb-shell :: SymbolFile/PDB/function-nested-block.test

Differential Revision: https://reviews.llvm.org/D148761
2023-05-15 23:38:07 +04:00
J. Ryan Stinnett
d6e4c4f8c1 Revert "[X86] Use the CFA as the DWARF frame base for better variable locations around calls."
This reverts commit d421f5226048e4a5d88aab157d0f4d434c43f208.

LLDB tests are failing as shown in
https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/55133/testReport/
2023-05-15 16:53:52 +01:00
Sameer Sahasrabuddhe
b0f0dd2554 [LLVM][Uniformity] Propagate temporal divergence explicitly
At a cycle C with divergent exits, UA was using a naive traversal of the exiting
edges to locate blocks that may use values defined inside C. But this traversal
fails when it encounters a cycle. This is now replaced with a much simpler
propagation that iterates over every instruction in C and checks any uses that
are outside C. But such an iteration can be expensive when C is very large; the
original strategy may need to be reconsidered if there is a regression in
compilation times.
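
As a rough illustration of the replacement strategy described above (visit every instruction in C and flag any use that lies outside C), here is a minimal sketch. The `Instr` struct, the integer block ids, and `findUsesOutsideCycle` are hypothetical stand-ins invented for this example; they are not the actual UniformityAnalysis data structures, and the real analysis tracks more state than this.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical, simplified instruction: the block that contains it and the
// indices of the instructions that use its result.
struct Instr {
  int block;
  std::vector<int> users;
};

// Visit every instruction defined inside the cycle and collect the users that
// live outside it. Those users are the candidates for temporal divergence.
// Cost is proportional to the number of instructions and uses in the cycle,
// which is the compile-time concern the commit message mentions.
std::set<int> findUsesOutsideCycle(const std::vector<Instr> &instrs,
                                   const std::set<int> &cycleBlocks) {
  std::set<int> outside;
  for (const Instr &def : instrs) {
    if (cycleBlocks.count(def.block) == 0)
      continue; // definition is not inside the cycle
    for (int user : def.users) {
      const Instr &use = instrs[static_cast<std::size_t>(user)];
      if (cycleBlocks.count(use.block) == 0)
        outside.insert(user);
    }
  }
  return outside;
}
```

Because this walks definitions rather than exiting edges, it does not depend on how the control flow outside C is shaped.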

Also fixed lit tests that should have originally caught the missed propagation
of temporal divergence.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D149646
2023-05-15 20:17:43 +05:30
Kyle Huey
d421f52260 [X86] Use the CFA as the DWARF frame base for better variable locations around calls.
Prior to this patch, for the DWARF frame base LLVM uses the frame pointer
register if available, otherwise the stack pointer register. If the stack
pointer register is being used and a call or other code modifies the stack
pointer during the body of the function this results in the locations being
wrong and the debugger displaying the wrong values for variables.

By using DW_OP_call_frame_cfa in these situations the emitted location for
the variable will automatically handle changes in the stack pointer.
The CFA needs to be adjusted for the offset between the frame pointer/stack
pointer to allow the variable locations themselves to remain unchanged by
this patch.

Reviewed By: #debug-info, scott.linder, jryans

Differential Revision: https://reviews.llvm.org/D143463
2023-05-15 15:10:02 +01:00
Phoebe Wang
057e14df70 [Coverity] Fix unchecked return value, NFC 2023-05-14 18:50:20 +08:00
Craig Topper
9ad9380fbc [LegalizeVectorOps][AArch64][RISCV][X86] Use OpVT for ISD::SETCC in LegalizeVectorOps.
Previously, LegalizeVectorOps used the result VT while LegalizeDAG
used the operand VT. This patch makes them both use the operand VT.

This also makes it consistent with how the default cost model works.

I've hacked the AArch64 cost model to maintain old behavior for some
f16 vectors.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D149572
2023-05-13 23:33:00 -07:00
Noah Goldstein
e36caaeeb2 [SelectionDAG] Use computeKnownBits if Op is not recognized by isKnownNeverZero
The current logic is pretty limited unless the `Op` is a
constant. This at least covers more obvious cases.

Reviewed By: craig.topper, foad

Differential Revision: https://reviews.llvm.org/D149196
2023-05-13 14:36:04 -05:00
Noah Goldstein
da9f306739 [SelectionDAG] Limit max recursion in isKnownNeverZero and isKnownToBeAPowerOfTwo
Both of these functions recursively call themselves so it makes sense
to limit that upper bound.

Differential Revision: https://reviews.llvm.org/D149195
2023-05-13 14:35:57 -05:00
Florian Hahn
e351b9b66d [EarlyIfCvt] Don't if-convert if condition has only loop-invariant ops.
This patch adds a heuristic to skip if-conversion if the condition has a
high chance of being predictable.

If the condition is in a loop, consider it predictable if the condition
itself or all its operands are loop-invariant. E.g. this considers a load
from a loop-invariant address predictable; we were unable to prove that it
doesn't alias any of the memory-writes in the loop, but it is likely to
read the same value multiple times.

This is a relatively crude heuristic, but it helps to prevent excessive
if-conversion in multiple workloads in practice.

Reviewed By: apostolakis

Differential Revision: https://reviews.llvm.org/D141639
2023-05-12 19:21:03 +01:00
Krzysztof Drewniak
0bc739a4ae [GlobalISel] Handle ptr size != index size in IRTranslator, CodeGenPrepare
While the original motivation for this patch (address space 7 on
AMDGPU) has been reworked and is not presently planned to reach IR
translation, the incorrect (by the spec) handling of index offset
width in IR translation and CodeGenPrepare is likely to trip someone up
(possibly AMD in the future, since we now have a p7:160:256:256:32), so we
convert to the other API now.

Reviewed By: aemerson, arsenm

Differential Revision: https://reviews.llvm.org/D143526
2023-05-12 16:21:01 +00:00
Craig Topper
a983ef2c17 [DAGCombiner][AArch64][VE] Teach BuildUDIV/SDIV to use 2x mul when mulh/mul_lohi are not available.
Correct the legality of i32 mul_lohi on AArch64.

Previously, AArch64 incorrectly reported i32 mul_lohi as Legal.
This allowed BuildUDIV/SDIV to use them. A later DAGCombiner would
replace them with MULHS/MULHU because only the high half was used.
This conversion does not check the legality of MULHS/MULHU under
the assumption that LegalizeDAG can turn it back into MUL_LOHI later.

After they were converted to MULHS/MULHU, DAGCombine ran and saw that
these operations aren't supported but an i64 MUL is, so they were
converted to that plus a shift. Without this, LegalizeDAG would
convert them back to MUL_LOHI and isel would fail to find a pattern.

This patch teaches BuildUDIV/SDIV to create the wide mul and shift
so that we can report the correct operation legality on AArch64. It
also enables div by constant folding for more cases on VE.
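
To make the "wide mul and shift" idea concrete, the sketch below computes the high half of a 32x32 multiply with a single 64-bit multiply, then uses it for an unsigned divide by 3. The function names are made up for this illustration, and the magic constant 0xAAAAAAAB with a post-shift of 1 is the well-known value for dividing a 32-bit unsigned integer by 3. This is plain C++ arithmetic, not the DAGCombiner code itself.

```cpp
#include <cassert>
#include <cstdint>

// High 32 bits of an unsigned 32x32 multiply, using one 64-bit MUL and a
// shift instead of MULHU / UMUL_LOHI.
uint32_t mulhu32(uint32_t a, uint32_t b) {
  uint64_t wide = static_cast<uint64_t>(a) * static_cast<uint64_t>(b);
  return static_cast<uint32_t>(wide >> 32);
}

// Signed counterpart. The right shift of a negative value is arithmetic on
// mainstream compilers (and guaranteed to be so from C++20 onwards).
int32_t mulhs32(int32_t a, int32_t b) {
  int64_t wide = static_cast<int64_t>(a) * static_cast<int64_t>(b);
  return static_cast<int32_t>(wide >> 32);
}

// x / 3 for 32-bit unsigned x: multiply by ceil(2^33 / 3) and shift by 33 in
// total (32 from taking the high half, 1 more afterwards).
uint32_t udiv3(uint32_t x) { return mulhu32(x, 0xAAAAAAABu) >> 1; }
```

For example, `udiv3(9)` yields 3 and `udiv3(0xFFFFFFFF)` yields 0x55555555, matching ordinary integer division.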

I don't know if VE wants this div by constant optimization or not. If they
don't want it, they can use the isIntDivCheap hook to disable it.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D150333
2023-05-12 09:06:17 -07:00
Felipe de Azevedo Piovezan
2da29955fb [SelectionDAG][DebugInfo] Implement translation of entry_value vars
This commit implements SelectionDAG lowering of dbg.declare intrinsics targeting
swiftasync Arguments, by putting them in the MachineFunction's table of
variables whose location doesn't change throughout the function.

Depends on D149882

Differential Revision: https://reviews.llvm.org/D149883
2023-05-12 12:00:13 -04:00
Felipe de Azevedo Piovezan
3f6e4e5b6e [IRTranslator][DebugInfo] Implement translation of entry_value vars
This commit implements IRTranslator lowering of dbg.declare intrinsics targeting
swiftasync Arguments, by putting them in the MachineFunction's table of
variables whose location doesn't change throughout the function.

Depends on D149881

Differential Revision: https://reviews.llvm.org/D149882
2023-05-12 11:55:39 -04:00
Florian Hahn
d0718ff410 [ShrinkWrap] Conservatively treat MIs without memory operands.
As pointed out by @jpenix-quic in D149668 post-commit, machine instructions
without memory operands need to be treated conservatively.
2023-05-12 16:11:01 +01:00
Felipe de Azevedo Piovezan
ee75422ce1 [AsmPrinter] Use EntryValue object info to emit Dwarf
This patch consumes the EntryValueObjects in a MachineFunction's table, using
them to emit the appropriate debug information for these variables.

Depends on D149880

Differential Revision: https://reviews.llvm.org/D149881
2023-05-12 08:35:48 -04:00
Vitaly Buka
1326a5a3f9 [NFC][LiveDebugValues] Clang-format b135df08 2023-05-11 19:42:08 -07:00
Vitaly Buka
b135df0839 [LiveDebugValues] Temporarily initialize MLocTracker::CurBB
It looks like the code assumes that it will always be set, but that is not true:
https://reviews.llvm.org/D150420. This is a temporary suppression to enable
stricter msan on a bot.
2023-05-11 19:40:46 -07:00
Craig Topper
4a9e6c422f [SelectionDAG] Correct AddNodeIDCustom for MemIntrinsicSDNodes.
We were missing any support for ISD::INTRINSIC_W_CHAIN/INTRINSIC_VOID
used for memory operations.

For ISD::PREFETCH and target memory nodes we didn't add the subclass
data.

This patch handles all MemIntrinsicSDNode in one place and adds the
missing subclass data.

Note. Unlike load/stores we don't add the memory VT in AddNodeIDCustom or getMemIntrinsicNode. Not sure why.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D150387
2023-05-11 16:28:47 -07:00
Jonathon Penix
81657922c6 [ShrinkWrap] Allow shrinkwrapping past memory accesses to jump tables
This patch adds a check for whether the memory operand is known to be
a jump table and, if so, allows shrinkwrapping to continue. In the
case that we are looking at a jump table, I believe it is safe to
assume that the access will not be to the stack (but please correct me
if I am wrong here).

In the test attached, this is helpful in that we are able to generate
only one instruction for each non-default case in the original switch
statement.

Differential Revision: https://reviews.llvm.org/D149886
2023-05-11 11:33:18 -07:00
Rahman Lavaee
5ac48ef513 [Propeller] Use a bit-field struct for the metadata fields of BBEntry.
This patch encapsulates the encoding and decoding logic of basic block metadata into the Metadata struct, and also reduces the decoded size of `SHT_LLVM_BB_ADDR_MAP` section.

The patch would've looked more readable if we could use designated initializers, but that is a C++20 feature.
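
A minimal sketch of what such a bit-field struct with its own encode/decode logic could look like is shown below. The field names and bit positions are assumptions chosen for illustration and are not taken from the patch. Note the positional aggregate initialization in `decode`, which is the less readable form the message refers to when designated initializers are unavailable.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-block metadata: one bit per flag.
struct Metadata {
  bool HasReturn : 1;
  bool HasTailCall : 1;
  bool IsEHPad : 1;
  bool CanFallThrough : 1;

  // Pack the flags into the low bits of an integer.
  uint32_t encode() const {
    return static_cast<uint32_t>(HasReturn) |
           (static_cast<uint32_t>(HasTailCall) << 1) |
           (static_cast<uint32_t>(IsEHPad) << 2) |
           (static_cast<uint32_t>(CanFallThrough) << 3);
  }

  // Unpack the flags. Positional initialization must follow declaration order.
  static Metadata decode(uint32_t v) {
    return Metadata{(v & 1u) != 0, (v & 2u) != 0, (v & 4u) != 0,
                    (v & 8u) != 0};
  }
};
```

Keeping both directions inside the struct means callers never touch the bit layout directly, so the layout can change in one place.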

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D148360
2023-05-11 11:21:26 -07:00
Felipe de Azevedo Piovezan
33b69b9756 [YamlMF] Serialize EntryValueObjects
This commit implements the serialization and deserialization of the Machine
Function's EntryValueObjects.

Depends on D149879, D149778

Differential Revision: https://reviews.llvm.org/D149880
2023-05-11 10:20:05 -04:00
sgokhale
1569b36ee9 [CodeGen][ShrinkWrap] Split restore point
Land D42600 with the optimisation disabled by default, controlled by the 'enable-shrink-wrap-region-split' option.

This reduces the effort of updating and relanding the whole patch each time an issue is detected.
2023-05-11 17:51:48 +05:30
Felipe de Azevedo Piovezan
3db7d0dffb [MachineFunction][DebugInfo][nfc] Introduce EntryValue variable kind
MachineFunction keeps a table of variables whose addresses never change
throughout the function. Today, the only kinds of locations it can
handle are stack slots.

However, we could expand this for variables whose address is derived
from the value a register had upon function entry. One case where this
happens is with variables alive across coroutine funclets: these can
be placed in a coroutine frame object whose pointer is placed in a
register that is an argument to coroutine funclets.

```
define @foo(ptr %frame_ptr) {
  dbg.declare(%frame_ptr, !some_var,
              !DIExpression(EntryValue, <ptr_arithmetic>))
```

This is a patch in a series that aims to improve the debug information
generated by the CoroSplit pass in the context of `swiftasync`
arguments. Variables stored in the coroutine frame _must_ be described by
the entry_value of the ABI-defined register containing a pointer to the
coroutine frame. Since these variables have a single location throughout
their lifetime, they are candidates for being stored in the
MachineFunction table.

Differential Revision: https://reviews.llvm.org/D149879
2023-05-11 07:29:57 -04:00
Matthias Braun
b8817825b9 Support critical edge splitting for jump tables
Add support for splitting critical edges coming from an indirect jump
using a jump table ("switch jumps").

This introduces the `TargetInstrInfo::getJumpTableIndex` callback to
allow targets to return an index into `MachineJumpTableInfo` for a
given indirect jump. It also updates
`MachineBasicBlock::SplitCriticalEdge` to allow splitting of critical
edges by rewriting jump table entries.

This is largely based on work done by Zhixuan Huan in D132202.

Differential Revision: https://reviews.llvm.org/D140975
2023-05-10 20:30:52 -07:00
Hongtao Yu
958a3d8e2d [FS-AFDO] Do not load non-FS profile in MIR loader.
I was seeing a regression when enabling FS discriminators on a non-FS CSSPGO build. This is because a probe can get a zero-valued discriminator at a specific pass and that could lead to accidentally loading the corresponding base counter in the non-FS profile, while a non-zero discriminator would end up getting zero samples. This could in turn undo the sample distribution effort done by previous BFI maintenance work and the probe distribution factor work for pseudo probes specifically. To mitigate that I'm disabling loading a non-FS profile against FS discriminators. The problem should also exist with non-CS AutoFDO, so I'm doing this for it too.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D149597
2023-05-10 16:38:49 -07:00
Hongtao Yu
b7d9322b49 [FS-AFDO] Load pseudo probe profile on MIR
This change enables loading pseudo-probe based profile on MIR. Different from the IR profile loader, callsites are excluded from MIR profile loading since they are not assigned an FS discriminator. Using zero as the discriminator is not accurate and would undo the distribution work done by the IR loader based on pseudo probe distribution factor. We rely on block probes only for FS profile loading.

Some refactoring is done to the IR profile loader so that `getProbeWeight` can be shared by both loaders.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D148584
2023-05-10 11:29:37 -07:00
Hongtao Yu
9849291dcc [PseudoProbe] Encode/Decode FS discriminator
Encoding FS discriminators for block probes. Decoding them correspondingly.

The encoding/decoding of FS discriminators is conditional, applied only to probes with a non-zero discriminator. This saves encoding size and also ensures downwards-compatibility.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D147651
2023-05-10 11:27:54 -07:00
Mikael Holmen
6647f3cd01 [CombinerHelper] Fix gcc warning [NFC]
Without the fix gcc complains with
 ../lib/CodeGen/GlobalISel/CombinerHelper.cpp:1652:52: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
  1652 |          SrcDef->getOpcode() == TargetOpcode::G_OR && "Unexpected op");
       |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~
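
The warning comes from operator precedence: `a || b && "msg"` parses as `a || (b && "msg")`, so the message string is attached only to the second comparison. A small sketch of the parenthesized form follows. The `Opcode` enum and `operandRank` function are hypothetical stand-ins, not the CombinerHelper code.

```cpp
#include <cassert>

// Hypothetical stand-ins for the generic opcodes involved.
enum Opcode { G_ADD, G_OR, G_MUL };

int operandRank(Opcode op) {
  // Parenthesizing the whole disjunction binds the message to the complete
  // condition and silences -Wparentheses.
  assert((op == G_ADD || op == G_OR) && "Unexpected op");
  return op == G_ADD ? 0 : 1;
}
```

Since a string literal is always truthy, the unparenthesized and parenthesized forms evaluate identically; the change only affects readability and the diagnostic, which is consistent with the NFC tag.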
2023-05-10 09:08:06 +02:00
Craig Topper
7c5209d017 [LegalizeTypes] Simplify code for UndefinedBooleanContent in PromoteIntOp_VECREDUCE.
We can treat UndefinedBooleanContent the same as ZeroOrOneBooleanContent.
There's no reason to consider sign extending.
2023-05-09 23:16:27 -07:00
Craig Topper
c582146a49 [LegalizeTypes] Use ISD::isTrueWhenEqual to simplify code. NFC 2023-05-09 22:49:22 -07:00
Chen Zheng
fb45493562 [DebugLine] save one debug line entry for empty prologue
Reland D147506 after fixing the failure in bot
https://lab.llvm.org/buildbot/#/builders/247/builds/4125

Some debuggers like DBX on AIX assume the address in debug line
entries is always incremental. But clang generates two entries (entry
for file scope line and entry for prologue end) with the same address if
the prologue is empty.

And if the prologue is empty, it seems the first debug line entry for the
function is unnecessary (i.e. removing the first entry won't impact the
behavior in GDB on Linux), so I implement this for all debuggers.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D147506
2023-05-10 01:21:02 +00:00
Sami Tolvanen
9e869efc1b [KCFI] Expand the KCFI term in comments (NFC) 2023-05-09 23:17:33 +00:00
Fangrui Song
bcaa0b26aa PrologEpilogInserter: Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds 2023-05-09 13:23:58 -07:00
Aaron Ballman
945f6e65be Wrap debug code with the LLVM_DEBUG macro; NFC
While investigating a bug in Clang, I noticed that -Wframe-larger-than
was emitting extra debug information along with the diagnostic. It
turns out that 2e1e2f52f357768186ecfcc5ac53d5fa53d1b094 fixed an issue
with the diagnostic, but accidentally left in some debug code that was
exposed in all builds.

So now we no longer emit things like:
8/4294967304 (0.00%) spills, 4294967296/4294967304 (100.00%) variables
along with the diagnostic
2023-05-09 15:53:24 -04:00
Sami Tolvanen
e9569748de [CodeGen][KCFI] Move cfi-type lowering to TargetLowering
KCFI machine function passes transform indirect calls with a
cfi-type attribute into architecture-specific type checks bundled
together with the calls. Instead of having a separate pass for each
architecture, add a generic machine function pass for KCFI and
move the architecture-specific code that emits the actual check to
TargetLowering. This avoids unnecessary duplication and makes it
easier to add KCFI support to other architectures.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D149915
2023-05-09 18:38:54 +00:00
Zain Jaffal
5d3a884229 [IRGen] Change annotation metadata to support inserting tuple of strings into annotation metadata array.
Annotation metadata supports adding singular annotation strings to an annotation block. This patch adds the ability to insert a tuple of strings into the metadata array.

The idea here is that each tuple of strings represents pieces of information that are all related. This makes it easier to parse related metadata, given that it will be contained in one tuple.
For example, in remarks, any pass that implements annotation remarks can have different types of remarks and pass additional information for each.

The original behaviour of annotation remarks is preserved here and we can mix tuple annotations and single annotations for the same instruction.

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D148328
2023-05-09 17:51:28 +03:00
Amara Emerson
e1472db58e [GlobalISel] Implement commuting shl (add/or x, c1), c2 -> add/or (shl x, c2), c1 << c2
There's a target hook that's called in DAGCombiner that we stub here. I'll
implement the equivalent override for AArch64 in a subsequent patch since it's
used by a different shift combine.
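
The rewrite named in the title relies on shift-left distributing over add and or in fixed-width arithmetic. The sketch below spells out both sides for 32-bit values so they can be compared; the function names are invented for this example, and the shift amount is assumed to be smaller than the bit width.

```cpp
#include <cassert>
#include <cstdint>

// Original form: shl (add x, c1), c2
uint32_t shlOfAdd(uint32_t x, uint32_t c1, unsigned c2) {
  assert(c2 < 32 && "shift amount must be below the bit width");
  return (x + c1) << c2;
}

// Commuted form: add (shl x, c2), (c1 << c2)
// With constant c1 and c2, the term (c1 << c2) folds to a single constant.
uint32_t addOfShl(uint32_t x, uint32_t c1, unsigned c2) {
  assert(c2 < 32 && "shift amount must be below the bit width");
  return (x << c2) + (c1 << c2);
}

// The same pair for or.
uint32_t shlOfOr(uint32_t x, uint32_t c1, unsigned c2) {
  assert(c2 < 32 && "shift amount must be below the bit width");
  return (x | c1) << c2;
}

uint32_t orOfShl(uint32_t x, uint32_t c1, unsigned c2) {
  assert(c2 < 32 && "shift amount must be below the bit width");
  return (x << c2) | (c1 << c2);
}
```

Both forms agree even when the addition wraps, because unsigned arithmetic is modular and shifting left by c2 is multiplication by 2^c2.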

This change by itself has minor code size improvements on arm64 -Os CTMark:
Program                                       size.__text
                                              outputg181ppyy output8av1cxfn diff
consumer-typeset/consumer-typeset             410648.00      410648.00       0.0%
tramp3d-v4/tramp3d-v4                         364176.00      364176.00       0.0%
kimwitu++/kc                                  449216.00      449212.00      -0.0%
7zip/7zip-benchmark                           576128.00      576120.00      -0.0%
sqlite3/sqlite3                               285108.00      285100.00      -0.0%
SPASS/SPASS                                   411720.00      411688.00      -0.0%
ClamAV/clamscan                               379868.00      379764.00      -0.0%
Bullet/bullet                                 452064.00      451928.00      -0.0%
mafft/pairlocalalign                          246184.00      246108.00      -0.0%
lencod/lencod                                 428524.00      428152.00      -0.1%
                           Geomean difference                               -0.0%

Differential Revision: https://reviews.llvm.org/D150086
2023-05-08 22:37:43 -07:00
Pierre Calixte
971d982bd4 Do not optimize debug locations across section boundaries
Prevent optimization of DebugLoc across section boundaries. Such optimization yields an incorrect source location if the memory layout of sections does not strictly match the Asm file.

Reviewed By: #debug-info, dblaikie, MaskRay

Differential Revision: https://reviews.llvm.org/D149294
2023-05-09 00:12:59 +00:00
Alan Zhao
f4999d3535 Revert "[CodeGen][ShrinkWrap] Split restore point"
This reverts commit 1ddfd1c8186735c62b642df05c505dc4907ffac4.

The original commit causes a Chrome build assertion failure with
ThinLTO: https://crbug.com/1443635
2023-05-08 16:27:59 -07:00
Jonas Paulsson
10f0158f00 [MachineLateInstrsCleanup] Bugfix for handling of kill flags.
With cb57b7a7, the kill flags are now tracked during the forward search over
the instructions and the call to findRegisterUseOperandIdx() should therefore
only check for killing uses.

As shown with the failing test CodeGen/Hexagon/vector-sint-to-fp.ll, it could
otherwise be the case that an undef use after the instruction that killed the
register will be inserted into MBBKills, and the kill flag will not be
cleared.
2023-05-08 17:12:43 +02:00
Dhruv Chawla
1d21d2eb7f [TargetLowering] Fix unnecessary call to computeKnownBits (NFCI)
In the SimplifyDemandedBits function, there is a fallthrough to the
default case in the case of ISD::ADD, ISD::MUL and ISD::SUB. This
leads to a call to computeKnownBits which is unnecessary as the
calls to SimplifyDemandedBits in the cases themselves handle the
calculation of the known bits. This information is discarded through
the Known2 variables.

By keeping this information around and calling
KnownBits::mul or KnownBits::computeForAddSub directly, the
unnecessary computation can be avoided. For now, the NSW bit is not
passed through to KnownBits as this is something that
computeKnownBits does not handle either. This requires updating
computeForAddCarry to handle the flag as well.

Differential Revision: https://reviews.llvm.org/D150110
2023-05-08 16:14:01 +02:00
Akshay Khadse
5c7c3af1d0 Reapply [Coverity] Fix explicit null dereferences
This change fixes static code analysis errors

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D149506
2023-05-08 21:19:40 +08:00
sgokhale
1ddfd1c818 [CodeGen][ShrinkWrap] Split restore point
Try to reland D42600

Differential Revision: https://reviews.llvm.org/D42600
2023-05-08 13:21:07 +05:30
Jonas Paulsson
cb57b7a770 [MachineLateInstrsCleanup] Improve compile time for huge functions.
It was discovered that this pass could be slow on huge functions, meaning 20%
compile time instead of the usual ~0.5% (with a test case spending ~19 mins
just in the backend).

The problem was related to the necessary clearing of earlier kill flags when a
redundant instruction is removed. With this patch, the handling of kill flags
is now done by maintaining a map instead of scanning backwards in the
function. This remedies the compile time on the huge file fully.

Reviewed By: vpykhtin, arsenm

Differential Revision: https://reviews.llvm.org/D147532

Resolves https://github.com/llvm/llvm-project/issues/61397
2023-05-08 09:05:17 +02:00
David Green
b774f14841 [DAG] Calculate the number of sign bits for constant BUILD_VECTOR directly.
For constant BUILD_VECTORs the operands need to be legal types. This can mean
that when the number of sign bits is calculated it may look at the entire
constant and inefficiently produce fewer sign bits than it could. For example i8
vectors could use i32 elements, for which 0x000000ff would be incorrectly
limited to 1 sign bit as the original value has 24 sign bits. This makes it
look at the constant directly, truncated to the correct type for the element so
that it can correctly return 8.
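
The 0x000000ff example can be checked with a small standalone sketch. `numSignBits` counts the leading bits that equal the sign bit at a given width. `signBitsViaWideOperand` models the earlier estimate as "sign bits at the wide type minus the bits dropped by truncation, clamped to 1", which is an assumption about the old behaviour based on the description above. Neither function is the SelectionDAG API.

```cpp
#include <cassert>
#include <cstdint>

// Number of sign bits in the low `width` bits of `value`: the sign bit itself
// plus every bit directly below it that has the same value.
unsigned numSignBits(uint64_t value, unsigned width) {
  assert(width >= 1 && width <= 64);
  unsigned sign = static_cast<unsigned>((value >> (width - 1)) & 1u);
  unsigned count = 1;
  for (int bit = static_cast<int>(width) - 2; bit >= 0; --bit) {
    if (static_cast<unsigned>((value >> bit) & 1u) != sign)
      break;
    ++count;
  }
  return count;
}

// Assumed model of the earlier estimate: analyse the constant at the wide
// operand type, then subtract the bits removed by the implicit truncation.
unsigned signBitsViaWideOperand(uint64_t value, unsigned wideBits,
                                unsigned eltBits) {
  unsigned wide = numSignBits(value, wideBits);
  unsigned dropped = wideBits - eltBits;
  return wide > dropped ? wide - dropped : 1;
}
```

With these definitions, `numSignBits(0xff, 32)` is 24, `signBitsViaWideOperand(0xff, 32, 8)` is 1, and `numSignBits(0xff, 8)` is 8, matching the numbers in the message.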

Differential Revision: https://reviews.llvm.org/D149956
2023-05-07 22:31:10 +01:00
Simon Pilgrim
b7116ba8b0 [DAG] computeOverflowForUnsignedAdd - use ConstantRange::unsignedAddMayOverflow as fallback
Replaces the more specific uadd_ov case
2023-05-06 22:03:38 +01:00
Simon Pilgrim
b83aa8bc75 [DAG] computeOverflowForUnsignedAdd - use getMaxValue().ult(2) to detect 0/1 values. NFCI. 2023-05-06 19:46:34 +01:00
Simon Pilgrim
8f82d8ee76 [DAG] visitSUBSAT - fold subsat(x,y) -> sub(x,y) if it never overflows 2023-05-06 15:55:04 +01:00