35742 Commits

Author SHA1 Message Date
Fangrui Song
03c53c69a3
[MC] Remove UseAssemblerInfoForParsing
Commit 6c0665e22174d474050e85ca367424f6e02476be
(https://reviews.llvm.org/D45164) enabled certain constant expression
evaluation for `MCObjectStreamer` at parse time (e.g. `.if` directives,
see llvm/test/MC/AsmParser/assembler-expressions.s).

`getUseAssemblerInfoForParsing` was added to make `clang -c` handling
inline assembly similar to `MCAsmStreamer` (e.g. `llvm-mc -filetype=asm`),
where such expression folding (related to
`AttemptToFoldSymbolOffsetDifference`) is unavailable.

I believe this is overly conservative. We can make some parse-time
expression folding work for `clang -c` even if `clang -S` would still
report an error, a MCAsmStreamer issue (we cannot print `.if`
directives) that should not restrict the functionality of
MCObjectStreamer.

```
% cat b.cc
asm(R"(
.pushsection .text,"ax"
.globl _start; _start: ret
.if . -_start == 1
  ret
.endif
.popsection
)");
% gcc -S b.cc && gcc -c b.cc
% clang -S -fno-integrated-as b.cc     # succeeded

% clang -c b.cc     # succeeded with this patch
% clang -S b.cc     # still failed
<inline asm>:4:5: error: expected absolute expression
    4 | .if . -_start == 1
      |     ^
1 error generated.
```

Close #62520
Link: https://discourse.llvm.org/t/rfc-clang-assembly-object-equivalence-for-files-with-inline-assembly/78841

Pull Request: https://github.com/llvm/llvm-project/pull/91082
2024-05-15 09:18:39 -07:00
Jay Foad
1650f1b3d7
Fix typo "indicies" (#92232) 2024-05-15 13:10:16 +01:00
PiJoules
19008d3218
[llvm] Support fixed point multiplication on AArch64 (#84237)
Prior to this, fixed point multiplication would lead to this assertion
error on AArhc64, armv8, and armv7.

```
 _Accum f(_Accum x, _Accum y) { return x * y; }

// ./bin/clang++ -ffixed-point /tmp/test2.cc -c -S -o - -target aarch64 -O3
clang++: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:10245: void llvm::TargetLowering::forceExpandWideMUL(SelectionDAG &, const SDLoc &, bool, EVT, const SDValue, const SDValue, const SDValue, const SDValue, SDValue &, SDValue &) const: Assertion `Ret.getOpcode() == ISD::MERGE_VALUES && "Ret value is a collection of constituent nodes holding result."' failed.
```

This path into forceExpandWideMUL should only be taken if we don't
support [US]MUL_LOHI or MULH[US] for the operand size (32 in this case).
But we should also check if we can just leverage regular wide
multiplication. That is, extend the operands from 32 to 64, do a regular
64-bit mul, then trunc and shift. These ops are certainly available on
aarch64 but for wider types.
2024-05-14 11:23:45 -07:00
aengelke
e6d3a4212d
[CodeGen] Use SmallVector for FixedStackPSVs (#91760)
Frame indices are dense and consecutive, so use a vector instead of a
std::map. Due to possibly negative frame indices, use zig-zag encoding.
IndexedMap was not usable, as it attempted to copy the null value, which
is not possible with a std::unique_ptr.

This is just a minor performance improvement, but a low-hanging fruit.
2024-05-14 13:13:24 +02:00
Pierre van Houtryve
922fafaff8
[GlobalISel] Micro-optimize getConstantVRegValWithLookThrough (#91969)
I was benchmarking the MatchTable when I found that
`getConstantVRegValWithLookThrough` took a non-negligible amount of
time, about 7.5% of all of
`AArch64PreLegalizerCombinerImpl::tryCombineAll`.

I decided to take a closer look to see if I could squeeze some
performance out of it, and I landed on a few changes that:
- Avoid copying APint unnecessarily, especially returning
std::optional<APInt> can be expensive when a out parameter also works.
- Avoid indirect call by using templated function pointers instead of
function_ref/std::function

Both of those changes seem to speedup this function by about 50%, but my
benchmarking (`perf record`) seems inconsistent (so take measurements
with a grain of salt), I saw as high as 4.5% and as low as 2% for this
function on the exact same input after the changes, but it never got
close again to 7% in a few runs so this looks like a stable improvement.
2024-05-14 08:26:00 +02:00
David Green
34de2151e2
[AArch64][GlobalISel] Improve legalization of G_PTR_ADD (#91763)
The testing we have for vector ptradd was a bit lacking. In adding tests
this patch found a couple of issues mostly with the way v3 vectors of
ptrs were sometimes legalized via i64, and with non-i64 additions. It
does not attempt to fix the issue with mergevalues from returning vector
ptrs.
2024-05-13 21:58:41 +01:00
Simon Pilgrim
061db17a30 Fix MSVC "signed/unsigned mismatch" warning. NFC. 2024-05-13 13:40:27 +01:00
Pierre van Houtryve
4ff45eee36
[GlobalISel][KnownBits] Simplify G_CONSTANT handling (#91946)
We called getIConstantVRegVal which again queried MRI to get the VReg
def. We already have the def, so just get the CImm directly. It can't
fail.
2024-05-13 13:59:24 +02:00
Graham Hunter
fbb37e9606
[AArch64] Add an all-in-one histogram intrinsic
Based on discussion from
https://discourse.llvm.org/t/rfc-vectorization-support-for-histogram-count-operations/74788

Current interface is:

llvm.experimental.histogram(<vecty> ptrs, <intty> inc_amount, <vecty> mask)

The integer type used by 'inc_amount' needs to match the type of the buckets in memory.

The intrinsic covers the following operations:
  * Gather load
  * histogram on the elements of 'ptrs'
  * multiply the histogram results by 'inc_amount'
  * add the result of the multiply to the values loaded by the gather
  * scatter store the results of the add

Supports lowering to histcnt instructions for AArch64 targets, and scalarization for all others at present.
2024-05-13 11:35:28 +01:00
Victor Campos
119aecb955
[DebugInfo] Emit negative DW_AT_bit_offset in explicit signed form (#87994)
Before this patch, the value of DW_AT_bit_offset, used for bitfields
before DWARF version 4, was always emitted as an unsigned integer using
the form DW_FORM_data<n>. If the value was originally a signed integer,
for instance in the case of negative offsets, it was up to debug
information consumers to re-cast it to a signed integer.

This is problematic since the burden of deciding if the value should be
read as signed or unsigned was put onto the debug info consumers: the
DWARF specification doesn't define DW_AT_bit_offset's underlying type.
If a debugger decided to interpret this attribute in the form data<n> as
unsigned, then negative offsets would be completely broken.

The DWARF specification version 3 mentions in the Data Representation
section, page 127:

> If one of the DW_FORM_data<n> forms is used to represent a signed or
unsigned integer, it can be hard for a consumer to discover the context
necessary to determine which interpretation is intended. Producers are
therefore strongly encouraged to use DW_FORM_sdata or DW_FORM_udata for
signed and unsigned integers respectively, rather than DW_FORM_data<n>.

Therefore, the proposal is to use DW_FORM_sdata, which is explicitly
signed. This is an indication to consumers that the offset must be
parsed unambiguously as a signed integer.

Finally, gcc already uses DW_FORM_sdata for negative offsets, fixing the
potential ambiguity altogether.

This patch mimics gcc's behaviour by emitting negative values of
DW_AT_bit_offset using the DW_FORM_sdata form. This eliminates any
potential misinterpretation.

One could argue that all values should use DW_FORM_sdata, but for the
sake of parity with gcc, it is safe to restrict the change to negative
values.
2024-05-13 11:14:35 +01:00
Min-Yih Hsu
f8063ffe73
[VP][RISCV] Add vp.reduce.fmaximum/fminimum and its RISC-V codegen (#91782)
`vp.reduce.fmaximum/fminimum` are the VP version of
`vector.reduce.fmaximum/fminimum`.
2024-05-10 16:01:47 -07:00
Simon Pilgrim
7f3e3785d0 Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFC. 2024-05-10 22:40:23 +01:00
Simon Pilgrim
7e6879b245 [X86] scalarizeExtractedBinop - reuse existing SDLoc. NFC. 2024-05-10 22:40:23 +01:00
David Green
8fc9e3d577
[DAG] Lower frem of power-2 using div/trunc/mul+sub (#91148)
If we are lowering a frem and the divisor is known to be an integer power-2, we
can use the formula 'frem = x - trunc(x / d) * d'. This avoids the more
expensive call to fmod. The results are identical as fmod so long as d is a
power-2 (so the mul does not round incorrectly), and the sign of the return is
either always positive or not important for zeroes (nsz).

Unfortunately Alive2 does not handle this well at the moment. I was using
exhaustive checking to test this:
(https://gist.github.com/davemgreen/6078015f30d3bacd1e9572f8db5d4b64).

I found this in cpythons implementation of float_pow. I currently added it as a
DAG combine for frem with power-2 fp constants.
2024-05-10 14:58:48 +01:00
Paul Walker
b277bf56d7
[LLVM][CodeGen][SVE] Clean up lowering of VECTOR_SPLICE operations. (#91330)
Remove DAG combine that is performing type legalisation and instead
add isel patterns for all legal types.
2024-05-10 10:53:57 +01:00
David Green
23b673e5b4
[DAG][AArch64] Handle vscale addressing modes in reassociationCanBreakAddressingModePattern (#89908)
reassociationCanBreakAddressingModePattern tries to prevent bad add
reassociations that would break adrressing mode patterns. This adds
support for vscale offset addressing modes, making sure we don't break
patterns that already exist. It does not optimize _to_ the correct
addressing modes yet, but prevents us from optimizating _away_ from
them.
2024-05-10 09:27:02 +01:00
Simon Pilgrim
df21ee4c62 [DAG] Add clang-format off/on wrappers around compact switch handlers. NFC.
Avoids a problem identified in #90503
2024-05-09 17:02:17 +01:00
Matt Arsenault
6a8d30b1c1
DAG: Skip 0 sign handling in minimum/maximum lowering for _ieee case (#91326)
dc9664a8adae17f2083fbcc8e96cfce606c56d57 changed the documentation to
assume these order -0 as less than +0.
2024-05-09 14:41:13 +02:00
David Sherwood
b52fa9461a
[Analysis] Add cost model for experimental.cttz.elts intrinsic (#90720)
In PR #88385 I've added support for auto-vectorisation of some early
exit loops, which requires using the experimental.cttz.elts to calculate
final indices in the early exit block. We need a more accurate cost
model for this intrinsic to better reflect the cost of work required in
the early exit block. I've tried to accurately represent the expansion
code for the intrinsic when the target does not have efficient lowering
for it. It's quite tricky to model because you need to first figure out
what types will actually be used in the expansion. The type used can
have a significant effect on the cost if you end up using illegal vector
types.

Tests added here:

  Analysis/CostModel/AArch64/cttz_elts.ll
  Analysis/CostModel/RISCV/cttz_elts.ll
2024-05-09 09:40:33 +01:00
David Green
fcf945f4ed
[DAG] Fold add(mul(add(A, CA), CM), CB) -> add(mul(A, CM), CM*CA+CB) (#90860)
This is useful when the inner add has multiple uses, and so cannot be
canonicalized by pushing the constants down through the mul. This patch
adds patterns for both `add(mul(add(A, CA), CM), CB)` and with an extra add
`add(add(mul(add(A, CA), CM), B) CB)` as the second can come up when
lowering geps.
2024-05-08 22:11:18 +01:00
Juergen Ributzka
29b7eb8400
[llvm][stackmaps] Include pristine registers for liveness computation. (#90529)
Users of stackmaps and patchpoints need to add all pristine registers to
the
spill set, even so they don't need to be all preserved.

This fixes the liveness computation for stackmaps to include pristine
registers.

This fixes rdar://21228337.
2024-05-08 08:58:55 -07:00
Thorsten Schütt
737e0bcfe3
[GlobalIsel] combine ext of trunc with flags (#87115)
https://github.com/llvm/llvm-project/pull/85592

https://discourse.llvm.org/t/rfc-add-nowrap-flags-to-trunc/77453

https://github.com/llvm/llvm-project/pull/88609
2024-05-08 14:27:02 +02:00
Aleksandr Popov
df311a2762
Add interface to check if a call has a deopt bundle (NFC) (#91348)
Encapsulate check that a call has a deopt bundle to make it easier to
change the deopt scheme.
2024-05-08 11:54:49 +02:00
YunQiang Su
8f21294897
MIPS: Use pcrel|sdata4 for eh_frame (#91291)
Gas uses encoding DW_EH_PE_absptr for PIC, and gnu ld converts it to
DW_EH_PE_sdata4|DW_EH_PE_pcrel.
LLD doesn't have this workarounding, thus complains
```
  relocation R_MIPS_32 cannot be used against local symbol; recompile with -fPIC
  relocation R_MIPS_64 cannot be used against local symbol; recompile with -fPIC
```

So, let's generates asm/obj files with `DW_EH_PE_sdata4|DW_EH_PE_pcrel`
encoding. In fact, GNU ld supports such OBJs well.

For N64, maybe we should use sdata8, while GNU ld doesn't support it
well, and in fact sdata4 is enough now. So we just ignore the `Large`
for `MCObjectFileInfo::initELFMCObjectFileInfo`. Maybe we should switch
back to sdata8 once GNU LD supports it well.

Fixes: #58377.
2024-05-08 17:30:14 +08:00
Farzon Lotfi
3e82442ff7
[SPIRV] Add tan intrinsic part 3 (#90278)
This change is an implementation of #87367's investigation on supporting
IEEE math operations as intrinsics.
Which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

If you want an overarching view of how this will all connect see:
https://github.com/llvm/llvm-project/pull/90088
Changes:
- `llvm/docs/GlobalISel/GenericOpcode.rst` - Document the `G_FTAN`
opcode
-  `llvm/include/llvm/IR/Intrinsics.td` - Create the tan intrinsic
- `llvm/include/llvm/Support/TargetOpcodes.def` - Create a `G_FTAN`
Opcode handler
- `llvm/include/llvm/Target/GenericOpcodes.td` - Define the `G_FTAN`
Opcode
- `llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp` Map the tan intrinsic
to `G_FTAN` Opcode
- `llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp` - Map the
`G_FTAN` opcode to the GLSL 4.5 and openCL tan instructions.
- `llvm/lib/Target/SPIRV/SPIRVLegalizerInfo.cpp` - Define `G_FTAN` as a
legal spirv target opcode.
2024-05-08 00:57:39 -04:00
Craig Topper
ef84452571
[DAGCombiner] Be more careful about looking through extends and truncates in mergeTruncStores. (#91375)
Previously we recursively looked through extends and truncates on both
SourceValue and WideVal.

SourceValue is the largest source found for each of the stores we are
combining. WideVal is the source for the current store.

Previously we could incorrectly look through a (zext (trunc X)) pair and
incorrectly believe X to be a good source.

I think we could also look through a zext on one store and a sext on
another store and arbitrarily pick one of the extends as the final
source.

With this patch we only look through one level of extend or truncate.
And we don't look through extends/truncs on both SourceValue and WideVal
at the same time.

This may lose some optimization cases, but keeps everything we had tests
for.

Fixes #90936.
2024-05-07 21:17:50 -07:00
Jinsong Ji
2dade0041a
[Analysis] Attribute Range should not prevent tail call optimization (#91122)
- Remove Range attr when comparing for tailcall
- Add test for testcall with range
2024-05-07 22:02:10 -04:00
Matt Arsenault
82bb2534d4
AMDGPU: Don't bitcast float typed atomic store in IR (#90116)
Implement the promotion in the DAG.

Depends #90113
2024-05-07 21:43:22 +02:00
Kazu Hirata
026a29e8b3
[Analysis, CodeGen, DebugInfo] Use StringRef::operator== instead of StringRef::equals (NFC) (#91304)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.

- StringRef::operator==/!= outnumber StringRef::equals by a factor of
  53 under llvm/ in terms of their usage.

- The elimination of StringRef::equals brings StringRef closer to
  std::string_view, which has operator== but not equals.

- S == "foo" is more readable than S.equals("foo"), especially for
  !Long.Expression.equals("str") vs Long.Expression != "str".
2024-05-07 10:20:10 -07:00
Matt Arsenault
7927bcdb8a
AMDGPU: Do not bitcast atomicrmw in IR (#90045)
This is the first step to eliminating shouldCastAtomicRMWIInIR. This and
the other atomic expand casting hooks should be removed. This adds
duplicate legalization machinery and interfaces. This is already what
codegen is supposed to do, and already does for the promotion case.

In the case of atomicrmw xchg, there seems to be some benefit to having
the bitcasts moved outside of the cmpxchg loop on targets with separate
int and FP registers, which we should be able to deal with by directly
checking for the legality of the underlying operation.

The casting path was also losing metadata when it recreated the
instruction.
2024-05-07 18:26:32 +02:00
Petr Hosek
8bcb073705
[Clang] -fseparate-named-sections option (#91028)
When set, the compiler will use separate unique sections for global
symbols in named special sections (e.g. symbols that are annotated with
__attribute__((section(...)))). Doing so enables linker GC to collect
unused symbols without having to use a different section per-symbol.
2024-05-07 09:18:55 -07:00
Paul Walker
235cea720c
[NFC][LLVM] Refactor rounding mode detection of constrained fp intrinsic IDs (#90854)
I've refactored the code to genericise the implementation to better
allow for target specific constrained fp intrinsics.
2024-05-07 11:23:55 +01:00
Quentin Colombet
6ce04747cf
[SDISel] Teach the type legalizer about ADDRSPACECAST (#90969)
Vectorized ADDRSPACECASTs were not supported by the type legalizer.

This patch adds the support for:
- splitting the vector result: <2 x ptr> => 2 x <1 x ptr>
- scalarization: <1 x ptr> => ptr
- widening: <3 x ptr> => <4 x ptr>

This is all exercised by the added NVPTX tests.
2024-05-07 11:08:33 +02:00
Thorsten Schütt
b42f553af5
[GlobalIsel] Combine extract vector element (#90339)
look through shuffle vectors
2024-05-07 07:12:58 +02:00
Simon Pilgrim
522b4bfe5b
[DAG] Fold bitreverse(shl/srl(bitreverse(x),y)) -> srl/shl(x,y) (#89897)
Noticed while investigating GFNI per-element vector shifts (we can form SHL but not SRL/SRA)

Alive2: https://alive2.llvm.org/ce/z/fSH-rf
2024-05-06 11:13:05 +01:00
Phoebe Wang
02dfbbff19
[SelectionDAG] Make ARITH_FENCE support half and bfloat type (#90836) 2024-05-05 13:08:34 +08:00
David Blaikie
004485690e
Revert "llvm/lib/CodeGen/TargetSchedule.cpp:132:12: warning: Assert statement modifies 'NIter'" (#91079)
Reverts llvm/llvm-project#90982

NIter was only declared in !NDEBUG, and only used for assertions - so it
was correct that it was incremented inside the assertion. (& in fact now
the non-asserts build fails, because the variable is incremented even
though it isn't declared)
2024-05-04 11:43:08 -07:00
akshaykumars614
18d1df4633
llvm/lib/CodeGen/TargetSchedule.cpp:132:12: warning: Assert statement modifies 'NIter' (#90982)
Modified the assert statement
2024-05-04 14:16:02 -04:00
Simon Pilgrim
caacf8685a
[DAG] Fold freeze(shuffle(x,y,m)) -> shuffle(freeze(x),freeze(y),m) (#90952)
If the shuffle mask contains no undef elements, then we can move the freeze through a shuffle node.

This requires special case handling to create a new ShuffleVectorSDNode.

Includes VECTOR_SHUFFLE support for isGuaranteedNotToBeUndefOrPoison  / canCreateUndefOrPoison.
2024-05-04 12:03:10 +01:00
Craig Topper
3563af6c06
[DAGCombiner] In mergeTruncStore, make sure we aren't storing shifted in bits. (#90939)
When looking through a right shift, we need to make sure that all of
the bits we are using from the shift come from the shift input and
not the sign or zero bits that are shifted in.
    
Fixes #90936.
2024-05-03 09:59:33 -07:00
Matt Arsenault
edbe6ebb4d
SystemZ: Don't promote atomic store in IR (#90899)
This is the mirror to the recent atomic load change. The same
bitcast-back-to-integer case is a small code quality regression for the
same reason. This would disappear with a bitcastable legal 128-bit type.
2024-05-03 10:04:12 +02:00
Youngsuk Kim
9d4575c910 [llvm] Make lambda take const reference to prevent unneeded copy (NFC)
Closes #89198
2024-05-02 15:34:03 -05:00
Matt Arsenault
b6d24cb018
DAG: Implement softening for fp atomic load (#90839) 2024-05-02 13:38:37 +02:00
Matt Arsenault
d9fc5babb9
DAG: Implement softening for fp atomic store (#90840)
This will prevent SystemZ test regressions in a future change, tested by
#90826
2024-05-02 12:08:52 +02:00
zxc12523
171aeb20ad
[DAG] SelectionDAG.computeKnownBits - add NSW/NUW flags support to ISD::SHL handling (#89877)
fix #89414
2024-05-02 10:31:56 +01:00
Nikita Popov
d484c4d350
[InterleavedLoadCombine] Bail out on non-byte-sized vector element type (#90705)
Vectors are always tightly packed, and elements of non-byte-sized
usually do not have a well-defined (byte) offset.

Fixes https://github.com/llvm/llvm-project/issues/90695.
2024-05-02 09:38:09 +09:00
David Tellenbach
cf2f32c97f
[MIR] Serialize MachineFrameInfo::isCalleeSavedInfoValid() (#90561)
In case of functions without a stack frame no "stack" field is
serialized into MIR which leads to isCalleeSavedInfoValid being false
when reading a MIR file back in. To fix this we should serialize
MachineFrameInfo::isCalleeSavedInfoValid() into MIR.
2024-05-01 10:07:51 -07:00
Matt Arsenault
39e24bdd8e
MachineLICM: Allow hoisting REG_SEQUENCE (#90638) 2024-05-01 16:52:04 +02:00
Jake Egan
8cde1cfc60
[AIX] Add git revision to .file string (#88164)
If `LLVM_APPEND_VC_REV` is on, add the git revision to the `.file`
string. The revision can be set with `LLVM_FORCE_VC_REVISION`.

Before:
`.file	"git_revision.cpp",,"LLVM version 19.0.0git"`

After:
`.file	"git_revision.cpp",,"LLVM version 19.0.0git (LLVM_REVISION)"`
2024-04-30 20:37:35 -04:00
Craig Topper
a03eeb0e98
[SelectionDAG][X86] Add a NoWrap flag to SelectionDAG::isAddLike. NFC (#90681)
If this flag is set, Xor will not be considered AddLike. If an Xor were
treated as an Add it may wrap. If we can prove there would be no carry out and
thus no wrap, the Xor would be turned into a disjoint Or by DAGCombine.

Use this new flag to fix a bug in X86 where an Xor is incorrectly being treated
as an NUWAdd.

Fixes #90668.
2024-04-30 16:52:56 -07:00