Commit 6c0665e22174d474050e85ca367424f6e02476be
(https://reviews.llvm.org/D45164) enabled certain constant expression
evaluation for `MCObjectStreamer` at parse time (e.g. `.if` directives,
see llvm/test/MC/AsmParser/assembler-expressions.s).
`getUseAssemblerInfoForParsing` was added to make `clang -c` handle
inline assembly similarly to `MCAsmStreamer` (e.g. `llvm-mc -filetype=asm`),
where such expression folding (related to
`AttemptToFoldSymbolOffsetDifference`) is unavailable.
I believe this is overly conservative. We can make some parse-time
expression folding work for `clang -c` even if `clang -S` would still
report an error; that is an MCAsmStreamer issue (we cannot print `.if`
directives) which should not restrict the functionality of
MCObjectStreamer.
```
% cat b.cc
asm(R"(
.pushsection .text,"ax"
.globl _start; _start: ret
.if . -_start == 1
ret
.endif
.popsection
)");
% gcc -S b.cc && gcc -c b.cc
% clang -S -fno-integrated-as b.cc # succeeded
% clang -c b.cc # succeeded with this patch
% clang -S b.cc # still failed
<inline asm>:4:5: error: expected absolute expression
4 | .if . -_start == 1
| ^
1 error generated.
```
Close #62520
Link: https://discourse.llvm.org/t/rfc-clang-assembly-object-equivalence-for-files-with-inline-assembly/78841
Pull Request: https://github.com/llvm/llvm-project/pull/91082
Prior to this, fixed-point multiplication would lead to this assertion
error on AArch64, armv8, and armv7.
```
_Accum f(_Accum x, _Accum y) { return x * y; }
// ./bin/clang++ -ffixed-point /tmp/test2.cc -c -S -o - -target aarch64 -O3
clang++: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:10245: void llvm::TargetLowering::forceExpandWideMUL(SelectionDAG &, const SDLoc &, bool, EVT, const SDValue, const SDValue, const SDValue, const SDValue, SDValue &, SDValue &) const: Assertion `Ret.getOpcode() == ISD::MERGE_VALUES && "Ret value is a collection of constituent nodes holding result."' failed.
```
This path into forceExpandWideMUL should only be taken if we don't
support [US]MUL_LOHI or MULH[US] for the operand size (32 in this case).
But we should also check whether we can just leverage regular wide
multiplication: that is, extend the operands from 32 to 64 bits, do a
regular 64-bit mul, then trunc and shift. These operations are certainly
available on AArch64, just for the wider type.
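As a rough sketch of that idea in plain C++ (not the SelectionDAG code itself): both halves of a 32x32 multiply fall out of one 64-bit multiply followed by a truncate and a shift.
```
// Minimal sketch: compute the low and high halves of a 32x32 multiply by
// extending to 64 bits, multiplying once, then truncating and shifting.
#include <cassert>
#include <cstdint>

void wideMul32(uint32_t A, uint32_t B, uint32_t &Lo, uint32_t &Hi) {
  uint64_t Wide = (uint64_t)A * (uint64_t)B; // regular 64-bit mul
  Lo = (uint32_t)Wide;                       // trunc
  Hi = (uint32_t)(Wide >> 32);               // shift, then trunc
}

int main() {
  uint32_t Lo, Hi;
  wideMul32(0xFFFFFFFFu, 0xFFFFFFFFu, Lo, Hi);
  assert(Lo == 0x00000001u && Hi == 0xFFFFFFFEu);
}
```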
Frame indices are dense and consecutive, so use a vector instead of a
std::map. Due to possibly negative frame indices, use zig-zag encoding.
IndexedMap was not usable, as it attempted to copy the null value, which
is not possible with a std::unique_ptr.
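For reference, a minimal sketch of the zig-zag mapping (hypothetical helper names, not the code in this patch): signed indices ..., -2, -1, 0, 1, 2, ... map to 3, 1, 0, 2, 4, ..., so negative frame indices become valid vector positions.
```
#include <cassert>

unsigned zigZagEncode(int FI) {
  return FI < 0 ? 2u * (unsigned)(-(FI + 1)) + 1u : 2u * (unsigned)FI;
}

int zigZagDecode(unsigned V) {
  return (V & 1) ? -(int)(V / 2) - 1 : (int)(V / 2);
}

int main() {
  for (int FI = -4; FI <= 4; ++FI)
    assert(zigZagDecode(zigZagEncode(FI)) == FI); // round-trips all indices
}
```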
This is just a minor performance improvement, but it is low-hanging fruit.
I was benchmarking the MatchTable when I found that
`getConstantVRegValWithLookThrough` took a non-negligible amount of
time, about 7.5% of all of
`AArch64PreLegalizerCombinerImpl::tryCombineAll`.
I decided to take a closer look to see if I could squeeze some
performance out of it, and I landed on a few changes that:
- Avoid copying APInt unnecessarily: returning std::optional<APInt> can be
  expensive when an out parameter also works.
- Avoid indirect calls by using templated function pointers instead of
  function_ref/std::function.
Both of those changes seem to speed up this function by about 50%, but my
benchmarking (`perf record`) seems inconsistent, so take the measurements
with a grain of salt: I saw as high as 4.5% and as low as 2% for this
function on the exact same input after the changes, but it never got
close to 7% again in a few runs, so this looks like a stable improvement.
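To illustrate the first point, a minimal, self-contained sketch of the out-parameter shape (BigInt is a stand-in for APInt and the names are hypothetical, not the actual GlobalISel API):
```
#include <cstdint>
#include <optional>

struct BigInt { uint64_t Words[4]; }; // stand-in for an expensive-to-copy APInt

// Before: each successful lookup materializes a value inside std::optional,
// which may then be copied again at the call site.
std::optional<BigInt> lookupConstantBefore(bool Hit) {
  if (!Hit)
    return std::nullopt;
  return BigInt{{1, 2, 3, 4}};
}

// After: the caller supplies storage; the callee writes in place and only
// reports whether the lookup succeeded.
bool lookupConstantAfter(bool Hit, BigInt &Out) {
  if (!Hit)
    return false;
  Out = BigInt{{1, 2, 3, 4}};
  return true;
}
```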
The testing we have for vector ptradd was a bit lacking. While adding
tests, this patch found a couple of issues, mostly with the way v3 vectors
of ptrs were sometimes legalized via i64, and with non-i64 additions. It
does not attempt to fix the issue with mergevalues from returning vector
ptrs.
Based on discussion from
https://discourse.llvm.org/t/rfc-vectorization-support-for-histogram-count-operations/74788
Current interface is:
llvm.experimental.histogram(<vecty> ptrs, <intty> inc_amount, <vecty> mask)
The integer type used by 'inc_amount' needs to match the type of the buckets in memory.
The intrinsic covers the following operations:
* Gather load
* histogram on the elements of 'ptrs'
* multiply the histogram results by 'inc_amount'
* add the result of the multiply to the values loaded by the gather
* scatter store the results of the add
Supports lowering to histcnt instructions for AArch64 targets, and scalarization for all others at present.
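A scalar sketch of the combined effect in plain C++ (hypothetical helper name; the real intrinsic is vectorized and, on AArch64, can use histcnt), ignoring edge cases where duplicate addresses have differing mask bits:
```
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Each active bucket ends up incremented by inc_amount once per occurrence
// of its address in ptrs, matching the gather + histogram * inc_amount +
// add + scatter sequence described above.
void histogramUpdate(const std::vector<int64_t *> &Ptrs,
                     const std::vector<bool> &Mask, int64_t IncAmount) {
  for (std::size_t I = 0; I < Ptrs.size(); ++I)
    if (Mask[I])
      *Ptrs[I] += IncAmount;
}

int main() {
  int64_t Buckets[4] = {0, 0, 0, 0};
  std::vector<int64_t *> Ptrs = {&Buckets[1], &Buckets[1], &Buckets[3]};
  histogramUpdate(Ptrs, {true, true, true}, 2);
  assert(Buckets[1] == 4 && Buckets[3] == 2);
}
```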
Before this patch, the value of DW_AT_bit_offset, used for bitfields
before DWARF version 4, was always emitted as an unsigned integer using
the form DW_FORM_data<n>. If the value was originally a signed integer,
for instance in the case of negative offsets, it was up to debug
information consumers to re-cast it to a signed integer.
This is problematic since the burden of deciding if the value should be
read as signed or unsigned was put onto the debug info consumers: the
DWARF specification doesn't define DW_AT_bit_offset's underlying type.
If a debugger decided to interpret this attribute in the form data<n> as
unsigned, then negative offsets would be completely broken.
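As a small illustration of the ambiguity (plain C++, not the emitter code): a negative offset stored as raw data<n> bytes reads back wrong if the consumer assumes it is unsigned.
```
#include <cassert>
#include <cstdint>

int main() {
  int8_t BitOffset = -8;
  uint8_t Raw = (uint8_t)BitOffset; // the byte a DW_FORM_data1 value holds
  assert(Raw == 0xF8);
  assert((int)Raw == 248);          // unsigned interpretation: broken offset
  assert((int8_t)Raw == -8);        // signed re-cast: the intended value
}
```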
The DWARF specification version 3 mentions in the Data Representation
section, page 127:
> If one of the DW_FORM_data<n> forms is used to represent a signed or
> unsigned integer, it can be hard for a consumer to discover the context
> necessary to determine which interpretation is intended. Producers are
> therefore strongly encouraged to use DW_FORM_sdata or DW_FORM_udata for
> signed and unsigned integers respectively, rather than DW_FORM_data<n>.
Therefore, the proposal is to use DW_FORM_sdata, which is explicitly
signed. This is an indication to consumers that the offset must be
parsed unambiguously as a signed integer.
Finally, gcc already uses DW_FORM_sdata for negative offsets, fixing the
potential ambiguity altogether.
This patch mimics gcc's behaviour by emitting negative values of
DW_AT_bit_offset using the DW_FORM_sdata form. This eliminates any
potential misinterpretation.
One could argue that all values should use DW_FORM_sdata, but for the
sake of parity with gcc, it is safe to restrict the change to negative
values.
If we are lowering a frem and the divisor is known to be an integer power of 2,
we can use the formula 'frem = x - trunc(x / d) * d'. This avoids the more
expensive call to fmod. The results are identical to fmod so long as d is a
power of 2 (so the mul does not round incorrectly), and the sign of the return
is either always positive or not important for zeroes (nsz).
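A quick numeric check of the identity in plain C++ (not the DAG combine itself), with d chosen as a power of 2:
```
#include <cassert>
#include <cmath>

double fremPow2(double X, double D) {
  return X - std::trunc(X / D) * D; // exact when D is a power of 2
}

int main() {
  const double D = 2.0;
  for (double X : {3.7, -3.7, 10.0, 0.1})
    assert(fremPow2(X, D) == std::fmod(X, D));
}
```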
Unfortunately Alive2 does not handle this well at the moment. I was using
exhaustive checking to test this:
(https://gist.github.com/davemgreen/6078015f30d3bacd1e9572f8db5d4b64).
I found this in CPython's implementation of float_pow. I currently added it as
a DAG combine for frem with power-of-2 fp constants.
reassociationCanBreakAddressingModePattern tries to prevent bad add
reassociations that would break addressing mode patterns. This adds
support for vscale offset addressing modes, making sure we don't break
patterns that already exist. It does not optimize _to_ the correct
addressing modes yet, but prevents us from optimizing _away_ from
them.
In PR #88385 I've added support for auto-vectorisation of some early
exit loops, which requires using the experimental.cttz.elts intrinsic to
calculate final indices in the early exit block. We need a more accurate cost
model for this intrinsic to better reflect the cost of work required in
the early exit block. I've tried to accurately represent the expansion
code for the intrinsic when the target does not have efficient lowering
for it. It's quite tricky to model because you need to first figure out
what types will actually be used in the expansion. The type used can
have a significant effect on the cost if you end up using illegal vector
types.
Tests added here:
Analysis/CostModel/AArch64/cttz_elts.ll
Analysis/CostModel/RISCV/cttz_elts.ll
This is useful when the inner add has multiple uses, and so cannot be
canonicalized by pushing the constants down through the mul. This patch
adds patterns for both `add(mul(add(A, CA), CM), CB)` and, with an extra
add, `add(add(mul(add(A, CA), CM), B), CB)`, as the second can come up
when lowering geps.
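Both folds rely on the usual distributive identities, checked here numerically in plain C++ (integer arithmetic stands in for the DAG constants):
```
#include <cassert>
#include <cstdint>

int main() {
  int64_t A = 7, B = -3, CA = 5, CM = 4, CB = 11;
  // add(mul(add(A, CA), CM), CB) == add(mul(A, CM), CA*CM + CB)
  assert((A + CA) * CM + CB == A * CM + (CA * CM + CB));
  // add(add(mul(add(A, CA), CM), B), CB) == add(add(mul(A, CM), B), CA*CM + CB)
  assert(((A + CA) * CM + B) + CB == ((A * CM) + B) + (CA * CM + CB));
}
```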
Users of stackmaps and patchpoints need to add all pristine registers to
the spill set, even though they don't all need to be preserved.
This fixes the liveness computation for stackmaps to include pristine
registers.
This fixes rdar://21228337.
GAS uses the DW_EH_PE_absptr encoding for PIC, and GNU ld converts it to
DW_EH_PE_sdata4|DW_EH_PE_pcrel.
LLD doesn't have this workaround, and thus complains:
```
relocation R_MIPS_32 cannot be used against local symbol; recompile with -fPIC
relocation R_MIPS_64 cannot be used against local symbol; recompile with -fPIC
```
So, let's generate asm/obj files with the `DW_EH_PE_sdata4|DW_EH_PE_pcrel`
encoding. In fact, GNU ld supports such objects well.
For N64, maybe we should use sdata8, but GNU ld doesn't support it
well, and in fact sdata4 is enough for now. So we just ignore `Large`
in `MCObjectFileInfo::initELFMCObjectFileInfo`. Maybe we should switch
back to sdata8 once GNU ld supports it well.
Fixes: #58377.
This change is an implementation of #87367's investigation on supporting
IEEE math operations as intrinsics, which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
If you want an overarching view of how this will all connect see:
https://github.com/llvm/llvm-project/pull/90088
Changes:
- `llvm/docs/GlobalISel/GenericOpcode.rst` - Document the `G_FTAN`
opcode
- `llvm/include/llvm/IR/Intrinsics.td` - Create the tan intrinsic
- `llvm/include/llvm/Support/TargetOpcodes.def` - Create a `G_FTAN`
Opcode handler
- `llvm/include/llvm/Target/GenericOpcodes.td` - Define the `G_FTAN`
Opcode
- `llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp` Map the tan intrinsic
to `G_FTAN` Opcode
- `llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp` - Map the
`G_FTAN` opcode to the GLSL 4.5 and OpenCL tan instructions.
- `llvm/lib/Target/SPIRV/SPIRVLegalizerInfo.cpp` - Define `G_FTAN` as a
legal SPIR-V target opcode.
Previously we recursively looked through extends and truncates on both
SourceValue and WideVal.
SourceValue is the largest source found for each of the stores we are
combining. WideVal is the source for the current store.
Previously we could look through a (zext (trunc X)) pair and
incorrectly believe X to be a good source.
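A tiny numeric illustration of why zext(trunc X) cannot be treated as X: the truncate drops the high bits.
```
#include <cassert>
#include <cstdint>

int main() {
  uint32_t X = 0x12345678u;
  uint32_t ZextTrunc = (uint32_t)(uint16_t)X; // trunc to i16, then zext to i32
  assert(ZextTrunc == 0x5678u);
  assert(ZextTrunc != X); // so X is not a valid source for the combined store
}
```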
I think we could also look through a zext on one store and a sext on
another store and arbitrarily pick one of the extends as the final
source.
With this patch we only look through one level of extend or truncate.
And we don't look through extends/truncs on both SourceValue and WideVal
at the same time.
This may lose some optimization cases, but keeps everything we had tests
for.
Fixes #90936.
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
53 under llvm/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
This is the first step to eliminating shouldCastAtomicRMWIInIR. This and
the other atomic expand casting hooks should be removed; they add
duplicate legalization machinery and interfaces. This is already what
codegen is supposed to do, and already does for the promotion case.
In the case of atomicrmw xchg, there seems to be some benefit to having
the bitcasts moved outside of the cmpxchg loop on targets with separate
int and FP registers, which we should be able to deal with by directly
checking for the legality of the underlying operation.
The casting path was also losing metadata when it recreated the
instruction.
When set, the compiler will use separate unique sections for global
symbols in named special sections (e.g. symbols that are annotated with
__attribute__((section(...)))). Doing so enables linker GC to collect
unused symbols without having to use a different section per-symbol.
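For example (hypothetical section name), with the option enabled each of these globals gets its own unique section, so --gc-sections can discard whichever one ends up unused:
```
// Hypothetical globals placed in a named special section via the attribute
// mentioned above; with unique sections enabled, each gets its own section.
__attribute__((section(".mydata"))) int TableA[64];
__attribute__((section(".mydata"))) int TableB[64];
```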
Vectorized ADDRSPACECASTs were not supported by the type legalizer.
This patch adds support for:
- splitting the vector result: <2 x ptr> => 2 x <1 x ptr>
- scalarization: <1 x ptr> => ptr
- widening: <3 x ptr> => <4 x ptr>
This is all exercised by the added NVPTX tests.
Reverts llvm/llvm-project#90982
NIter was only declared in !NDEBUG builds and only used for assertions, so
it was correct that it was incremented inside the assertion. (And in fact
the non-asserts build now fails, because the variable is incremented even
though it isn't declared.)
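A minimal sketch of the breakage pattern (hypothetical code, not the reverted change itself):
```
#include <cassert>

void loopBody(bool Done) {
#ifndef NDEBUG
  unsigned NIter = 0; // only exists in asserts builds
#endif
  while (!Done) {
    assert(++NIter < 1000 && "runaway loop"); // fine: compiled out with NDEBUG
    // ++NIter; // would not compile with NDEBUG defined: NIter is not declared
    Done = true;
  }
}

int main() { loopBody(false); }
```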
If the shuffle mask contains no undef elements, then we can move the freeze through a shuffle node.
This requires special case handling to create a new ShuffleVectorSDNode.
Includes VECTOR_SHUFFLE support for isGuaranteedNotToBeUndefOrPoison / canCreateUndefOrPoison.
When looking through a right shift, we need to make sure that all of
the bits we are using from the shift come from the shift input and
not the sign or zero bits that are shifted in.
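A quick numeric illustration in plain C++: the high bits of a right-shift result are the shifted-in zero (or sign) bits, not bits taken from the shift input.
```
#include <cassert>
#include <cstdint>

int main() {
  uint8_t X = 0xA5;                  // 1010'0101
  uint8_t L = X >> 3;                // 0001'0100: the top three bits are zeros
  assert((L & 0xE0) == 0);           // regardless of what X held up there
  int8_t S = (int8_t)X;              // -91; arithmetic shift assumed below
  assert((uint8_t)(S >> 3) == 0xF4); // top bits are copies of the sign bit
}
```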
Fixes #90936.
This is the mirror to the recent atomic load change. The same
bitcast-back-to-integer case is a small code quality regression for the
same reason. This would disappear with a bitcastable legal 128-bit type.
In the case of functions without a stack frame, no "stack" field is
serialized into MIR, which leads to isCalleeSavedInfoValid being false
when reading a MIR file back in. To fix this, we should serialize
MachineFrameInfo::isCalleeSavedInfoValid() into MIR.
If `LLVM_APPEND_VC_REV` is on, add the git revision to the `.file`
string. The revision can be set with `LLVM_FORCE_VC_REVISION`.
Before:
`.file "git_revision.cpp",,"LLVM version 19.0.0git"`
After:
`.file "git_revision.cpp",,"LLVM version 19.0.0git (LLVM_REVISION)"`
If this flag is set, Xor will not be considered AddLike. If an Xor were
treated as an Add, it might wrap. If we can prove there would be no carry
out, and thus no wrap, the Xor would be turned into a disjoint Or by
DAGCombine.
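A quick numeric check of that distinction in plain C++: xor matches add exactly when the operands share no set bits; with overlapping bits, add carries (and can wrap) while xor does not.
```
#include <cassert>
#include <cstdint>

int main() {
  uint8_t A = 0xF0, B = 0x0F;                   // disjoint bits
  assert((uint8_t)(A ^ B) == (uint8_t)(A + B)); // both 0xFF: xor is add-like
  uint8_t C = 0xF0, D = 0x30;                   // overlapping bits
  assert((uint8_t)(C + D) == 0x20);             // carry out of bit 7, wraps
  assert((uint8_t)(C ^ D) == 0xC0);             // no carry, no wrap
}
```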
Use this new flag to fix a bug in X86 where an Xor is incorrectly being treated
as an NUWAdd.
Fixes #90668.