Commit 6c0665e22174d474050e85ca367424f6e02476be
(https://reviews.llvm.org/D45164) enabled certain constant expression
evaluation for `MCObjectStreamer` at parse time (e.g. `.if` directives,
see llvm/test/MC/AsmParser/assembler-expressions.s).
`getUseAssemblerInfoForParsing` was added to make `clang -c` handle
inline assembly similarly to `MCAsmStreamer` (e.g. `llvm-mc -filetype=asm`),
where such expression folding (related to
`AttemptToFoldSymbolOffsetDifference`) is unavailable.
I believe this is overly conservative. We can make some parse-time
expression folding work for `clang -c` even if `clang -S` would still
report an error; that is an MCAsmStreamer issue (we cannot print `.if`
directives) which should not restrict the functionality of
MCObjectStreamer.
```
% cat b.cc
asm(R"(
.pushsection .text,"ax"
.globl _start; _start: ret
.if . -_start == 1
ret
.endif
.popsection
)");
% gcc -S b.cc && gcc -c b.cc
% clang -S -fno-integrated-as b.cc # succeeded
% clang -c b.cc # succeeded with this patch
% clang -S b.cc # still failed
<inline asm>:4:5: error: expected absolute expression
4 | .if . -_start == 1
| ^
1 error generated.
```
Close #62520
Link: https://discourse.llvm.org/t/rfc-clang-assembly-object-equivalence-for-files-with-inline-assembly/78841
Pull Request: https://github.com/llvm/llvm-project/pull/91082
Prior to this, fixed-point multiplication would lead to this assertion
error on AArch64, armv8, and armv7.
```
_Accum f(_Accum x, _Accum y) { return x * y; }
// ./bin/clang++ -ffixed-point /tmp/test2.cc -c -S -o - -target aarch64 -O3
clang++: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:10245: void llvm::TargetLowering::forceExpandWideMUL(SelectionDAG &, const SDLoc &, bool, EVT, const SDValue, const SDValue, const SDValue, const SDValue, SDValue &, SDValue &) const: Assertion `Ret.getOpcode() == ISD::MERGE_VALUES && "Ret value is a collection of constituent nodes holding result."' failed.
```
This path into forceExpandWideMUL should only be taken if we don't
support [US]MUL_LOHI or MULH[US] for the operand size (32 in this case).
But we should also check whether we can just leverage regular wide
multiplication: that is, extend the operands from 32 to 64 bits, do a
regular 64-bit mul, then trunc and shift. These operations are certainly
available on AArch64, just for the wider type.
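As a rough sketch of that idea in plain C++ (not the SelectionDAG code itself): both halves of a 32x32 multiply fall out of one 64-bit multiply followed by a truncate and a shift.
```
// Minimal sketch: compute the low and high halves of a 32x32 multiply by
// extending to 64 bits, multiplying once, then truncating and shifting.
#include <cassert>
#include <cstdint>

void wideMul32(uint32_t A, uint32_t B, uint32_t &Lo, uint32_t &Hi) {
  uint64_t Wide = (uint64_t)A * (uint64_t)B; // regular 64-bit mul
  Lo = (uint32_t)Wide;                       // trunc
  Hi = (uint32_t)(Wide >> 32);               // shift, then trunc
}

int main() {
  uint32_t Lo, Hi;
  wideMul32(0xFFFFFFFFu, 0xFFFFFFFFu, Lo, Hi);
  assert(Lo == 0x00000001u && Hi == 0xFFFFFFFEu);
}
```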
Frame indices are dense and consecutive, so use a vector instead of a
std::map. Due to possibly negative frame indices, use zig-zag encoding.
IndexedMap was not usable, as it attempted to copy the null value, which
is not possible with a std::unique_ptr.
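For reference, a minimal sketch of the zig-zag mapping (hypothetical helper names, not the code in this patch): signed indices ..., -2, -1, 0, 1, 2, ... map to 3, 1, 0, 2, 4, ..., so negative frame indices become valid vector positions.
```
#include <cassert>

unsigned zigZagEncode(int FI) {
  return FI < 0 ? 2u * (unsigned)(-(FI + 1)) + 1u : 2u * (unsigned)FI;
}

int zigZagDecode(unsigned V) {
  return (V & 1) ? -(int)(V / 2) - 1 : (int)(V / 2);
}

int main() {
  for (int FI = -4; FI <= 4; ++FI)
    assert(zigZagDecode(zigZagEncode(FI)) == FI); // round-trips all indices
}
```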
This is just a minor performance improvement, but it is low-hanging fruit.
I was benchmarking the MatchTable when I found that
`getConstantVRegValWithLookThrough` took a non-negligible amount of
time, about 7.5% of all of
`AArch64PreLegalizerCombinerImpl::tryCombineAll`.
I decided to take a closer look to see if I could squeeze some
performance out of it, and I landed on a few changes that:
- Avoid copying APInt unnecessarily: returning std::optional<APInt> can be
  expensive when an out parameter also works.
- Avoid indirect calls by using templated function pointers instead of
  function_ref/std::function.
Both of those changes seem to speed up this function by about 50%, but my
benchmarking (`perf record`) seems inconsistent, so take the measurements
with a grain of salt: I saw as high as 4.5% and as low as 2% for this
function on the exact same input after the changes, but it never got
close to 7% again in a few runs, so this looks like a stable improvement.
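To illustrate the first point, a minimal, self-contained sketch of the out-parameter shape (BigInt is a stand-in for APInt and the names are hypothetical, not the actual GlobalISel API):
```
#include <cstdint>
#include <optional>

struct BigInt { uint64_t Words[4]; }; // stand-in for an expensive-to-copy APInt

// Before: each successful lookup materializes a value inside std::optional,
// which may then be copied again at the call site.
std::optional<BigInt> lookupConstantBefore(bool Hit) {
  if (!Hit)
    return std::nullopt;
  return BigInt{{1, 2, 3, 4}};
}

// After: the caller supplies storage; the callee writes in place and only
// reports whether the lookup succeeded.
bool lookupConstantAfter(bool Hit, BigInt &Out) {
  if (!Hit)
    return false;
  Out = BigInt{{1, 2, 3, 4}};
  return true;
}
```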
The testing we have for vector ptradd was a bit lacking. While adding
tests, this patch found a couple of issues, mostly with the way v3 vectors
of ptrs were sometimes legalized via i64, and with non-i64 additions. It
does not attempt to fix the issue with mergevalues from returning vector
ptrs.
Based on discussion from
https://discourse.llvm.org/t/rfc-vectorization-support-for-histogram-count-operations/74788
Current interface is:
llvm.experimental.histogram(<vecty> ptrs, <intty> inc_amount, <vecty> mask)
The integer type used by 'inc_amount' needs to match the type of the buckets in memory.
The intrinsic covers the following operations:
* Gather load
* histogram on the elements of 'ptrs'
* multiply the histogram results by 'inc_amount'
* add the result of the multiply to the values loaded by the gather
* scatter store the results of the add
Supports lowering to histcnt instructions for AArch64 targets, and scalarization for all others at present.
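A scalar sketch of the combined effect in plain C++ (hypothetical helper name; the real intrinsic is vectorized and, on AArch64, can use histcnt), ignoring edge cases where duplicate addresses have differing mask bits:
```
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Each active bucket ends up incremented by inc_amount once per occurrence
// of its address in ptrs, matching the gather + histogram * inc_amount +
// add + scatter sequence described above.
void histogramUpdate(const std::vector<int64_t *> &Ptrs,
                     const std::vector<bool> &Mask, int64_t IncAmount) {
  for (std::size_t I = 0; I < Ptrs.size(); ++I)
    if (Mask[I])
      *Ptrs[I] += IncAmount;
}

int main() {
  int64_t Buckets[4] = {0, 0, 0, 0};
  std::vector<int64_t *> Ptrs = {&Buckets[1], &Buckets[1], &Buckets[3]};
  histogramUpdate(Ptrs, {true, true, true}, 2);
  assert(Buckets[1] == 4 && Buckets[3] == 2);
}
```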
Before this patch, the value of DW_AT_bit_offset, used for bitfields
before DWARF version 4, was always emitted as an unsigned integer using
the form DW_FORM_data<n>. If the value was originally a signed integer,
for instance in the case of negative offsets, it was up to debug
information consumers to re-cast it to a signed integer.
This is problematic since the burden of deciding if the value should be
read as signed or unsigned was put onto the debug info consumers: the
DWARF specification doesn't define DW_AT_bit_offset's underlying type.
If a debugger decided to interpret this attribute in the form data<n> as
unsigned, then negative offsets would be completely broken.
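As a small illustration of the ambiguity (plain C++, not the emitter code): a negative offset stored as raw data<n> bytes reads back wrong if the consumer assumes it is unsigned.
```
#include <cassert>
#include <cstdint>

int main() {
  int8_t BitOffset = -8;
  uint8_t Raw = (uint8_t)BitOffset; // the byte a DW_FORM_data1 value holds
  assert(Raw == 0xF8);
  assert((int)Raw == 248);          // unsigned interpretation: broken offset
  assert((int8_t)Raw == -8);        // signed re-cast: the intended value
}
```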
The DWARF specification version 3 mentions in the Data Representation
section, page 127:
> If one of the DW_FORM_data<n> forms is used to represent a signed or
> unsigned integer, it can be hard for a consumer to discover the context
> necessary to determine which interpretation is intended. Producers are
> therefore strongly encouraged to use DW_FORM_sdata or DW_FORM_udata for
> signed and unsigned integers respectively, rather than DW_FORM_data<n>.
Therefore, the proposal is to use DW_FORM_sdata, which is explicitly
signed. This is an indication to consumers that the offset must be
parsed unambiguously as a signed integer.
Finally, gcc already uses DW_FORM_sdata for negative offsets, fixing the
potential ambiguity altogether.
This patch mimics gcc's behaviour by emitting negative values of
DW_AT_bit_offset using the DW_FORM_sdata form. This eliminates any
potential misinterpretation.
One could argue that all values should use DW_FORM_sdata, but for the
sake of parity with gcc, it is safe to restrict the change to negative
values.
If we are lowering a frem and the divisor is known to be an integer power of 2,
we can use the formula 'frem = x - trunc(x / d) * d'. This avoids the more
expensive call to fmod. The results are identical to fmod so long as d is a
power of 2 (so the mul does not round incorrectly), and the sign of the return
is either always positive or not important for zeroes (nsz).
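A quick numeric check of the identity in plain C++ (not the DAG combine itself), with d chosen as a power of 2:
```
#include <cassert>
#include <cmath>

double fremPow2(double X, double D) {
  return X - std::trunc(X / D) * D; // exact when D is a power of 2
}

int main() {
  const double D = 2.0;
  for (double X : {3.7, -3.7, 10.0, 0.1})
    assert(fremPow2(X, D) == std::fmod(X, D));
}
```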
Unfortunately Alive2 does not handle this well at the moment. I was using
exhaustive checking to test this:
(https://gist.github.com/davemgreen/6078015f30d3bacd1e9572f8db5d4b64).
I found this in CPython's implementation of float_pow. I currently added it as
a DAG combine for frem with power-of-2 fp constants.
reassociationCanBreakAddressingModePattern tries to prevent bad add
reassociations that would break addressing mode patterns. This adds
support for vscale offset addressing modes, making sure we don't break
patterns that already exist. It does not optimize _to_ the correct
addressing modes yet, but prevents us from optimizing _away_ from
them.
In PR #88385 I've added support for auto-vectorisation of some early
exit loops, which requires using the experimental.cttz.elts intrinsic to
calculate final indices in the early exit block. We need a more accurate cost
model for this intrinsic to better reflect the cost of work required in
the early exit block. I've tried to accurately represent the expansion
code for the intrinsic when the target does not have efficient lowering
for it. It's quite tricky to model because you need to first figure out
what types will actually be used in the expansion. The type used can
have a significant effect on the cost if you end up using illegal vector
types.
Tests added here:
Analysis/CostModel/AArch64/cttz_elts.ll
Analysis/CostModel/RISCV/cttz_elts.ll
This is useful when the inner add has multiple uses, and so cannot be
canonicalized by pushing the constants down through the mul. This patch
adds patterns for both `add(mul(add(A, CA), CM), CB)` and, with an extra
add, `add(add(mul(add(A, CA), CM), B), CB)`, as the second can come up
when lowering geps.
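Both folds rely on the usual distributive identities, checked here numerically in plain C++ (integer arithmetic stands in for the DAG constants):
```
#include <cassert>
#include <cstdint>

int main() {
  int64_t A = 7, B = -3, CA = 5, CM = 4, CB = 11;
  // add(mul(add(A, CA), CM), CB) == add(mul(A, CM), CA*CM + CB)
  assert((A + CA) * CM + CB == A * CM + (CA * CM + CB));
  // add(add(mul(add(A, CA), CM), B), CB) == add(add(mul(A, CM), B), CA*CM + CB)
  assert(((A + CA) * CM + B) + CB == ((A * CM) + B) + (CA * CM + CB));
}
```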
Users of stackmaps and patchpoints need to add all pristine registers to
the spill set, even though they don't all need to be preserved.
This fixes the liveness computation for stackmaps to include pristine
registers.
This fixes rdar://21228337.
GAS uses the DW_EH_PE_absptr encoding for PIC, and GNU ld converts it to
DW_EH_PE_sdata4|DW_EH_PE_pcrel.
LLD doesn't have this workaround, and thus complains:
```
relocation R_MIPS_32 cannot be used against local symbol; recompile with -fPIC
relocation R_MIPS_64 cannot be used against local symbol; recompile with -fPIC
```
So, let's generate asm/obj files with the `DW_EH_PE_sdata4|DW_EH_PE_pcrel`
encoding. In fact, GNU ld supports such objects well.
For N64, maybe we should use sdata8, but GNU ld doesn't support it
well, and in fact sdata4 is enough for now. So we just ignore `Large`
in `MCObjectFileInfo::initELFMCObjectFileInfo`. Maybe we should switch
back to sdata8 once GNU ld supports it well.
Fixes: #58377.
This change is an implementation of #87367's investigation on supporting
IEEE math operations as intrinsics, which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
If you want an overarching view of how this will all connect see:
https://github.com/llvm/llvm-project/pull/90088
Changes:
- `llvm/docs/GlobalISel/GenericOpcode.rst` - Document the `G_FTAN`
opcode
- `llvm/include/llvm/IR/Intrinsics.td` - Create the tan intrinsic
- `llvm/include/llvm/Support/TargetOpcodes.def` - Create a `G_FTAN`
Opcode handler
- `llvm/include/llvm/Target/GenericOpcodes.td` - Define the `G_FTAN`
Opcode
- `llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp` Map the tan intrinsic
to `G_FTAN` Opcode
- `llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp` - Map the
`G_FTAN` opcode to the GLSL 4.5 and OpenCL tan instructions.
- `llvm/lib/Target/SPIRV/SPIRVLegalizerInfo.cpp` - Define `G_FTAN` as a
legal SPIR-V target opcode.
Previously we recursively looked through extends and truncates on both
SourceValue and WideVal.
SourceValue is the largest source found for each of the stores we are
combining. WideVal is the source for the current store.
Previously we could look through a (zext (trunc X)) pair and
incorrectly believe X to be a good source.
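A tiny numeric illustration of why zext(trunc X) cannot be treated as X: the truncate drops the high bits.
```
#include <cassert>
#include <cstdint>

int main() {
  uint32_t X = 0x12345678u;
  uint32_t ZextTrunc = (uint32_t)(uint16_t)X; // trunc to i16, then zext to i32
  assert(ZextTrunc == 0x5678u);
  assert(ZextTrunc != X); // so X is not a valid source for the combined store
}
```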
I think we could also look through a zext on one store and a sext on
another store and arbitrarily pick one of the extends as the final
source.
With this patch we only look through one level of extend or truncate.
And we don't look through extends/truncs on both SourceValue and WideVal
at the same time.
This may lose some optimization cases, but keeps everything we had tests
for.
Fixes #90936.
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
53 under llvm/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
This is the first step to eliminating shouldCastAtomicRMWIInIR. This and
the other atomic expand casting hooks should be removed; they add
duplicate legalization machinery and interfaces. This is already what
codegen is supposed to do, and already does for the promotion case.
In the case of atomicrmw xchg, there seems to be some benefit to having
the bitcasts moved outside of the cmpxchg loop on targets with separate
int and FP registers, which we should be able to deal with by directly
checking for the legality of the underlying operation.
The casting path was also losing metadata when it recreated the
instruction.
When set, the compiler will use separate unique sections for global
symbols in named special sections (e.g. symbols that are annotated with
__attribute__((section(...)))). Doing so enables linker GC to collect
unused symbols without having to use a different section per-symbol.
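For example (hypothetical section name), with the option enabled each of these globals gets its own unique section, so --gc-sections can discard whichever one ends up unused:
```
// Hypothetical globals placed in a named special section via the attribute
// mentioned above; with unique sections enabled, each gets its own section.
__attribute__((section(".mydata"))) int TableA[64];
__attribute__((section(".mydata"))) int TableB[64];
```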
Vectorized ADDRSPACECASTs were not supported by the type legalizer.
This patch adds support for:
- splitting the vector result: <2 x ptr> => 2 x <1 x ptr>
- scalarization: <1 x ptr> => ptr
- widening: <3 x ptr> => <4 x ptr>
This is all exercised by the added NVPTX tests.
Reverts llvm/llvm-project#90982
NIter was only declared in !NDEBUG builds and only used for assertions, so
it was correct that it was incremented inside the assertion. (And in fact
the non-asserts build now fails, because the variable is incremented even
though it isn't declared.)
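A minimal sketch of the breakage pattern (hypothetical code, not the reverted change itself):
```
#include <cassert>

void loopBody(bool Done) {
#ifndef NDEBUG
  unsigned NIter = 0; // only exists in asserts builds
#endif
  while (!Done) {
    assert(++NIter < 1000 && "runaway loop"); // fine: compiled out with NDEBUG
    // ++NIter; // would not compile with NDEBUG defined: NIter is not declared
    Done = true;
  }
}

int main() { loopBody(false); }
```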
If the shuffle mask contains no undef elements, then we can move the freeze through a shuffle node.
This requires special case handling to create a new ShuffleVectorSDNode.
Includes VECTOR_SHUFFLE support for isGuaranteedNotToBeUndefOrPoison / canCreateUndefOrPoison.
When looking through a right shift, we need to make sure that all of
the bits we are using from the shift come from the shift input and
not the sign or zero bits that are shifted in.
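A quick numeric illustration in plain C++: the high bits of a right-shift result are the shifted-in zero (or sign) bits, not bits taken from the shift input.
```
#include <cassert>
#include <cstdint>

int main() {
  uint8_t X = 0xA5;                  // 1010'0101
  uint8_t L = X >> 3;                // 0001'0100: the top three bits are zeros
  assert((L & 0xE0) == 0);           // regardless of what X held up there
  int8_t S = (int8_t)X;              // -91; arithmetic shift assumed below
  assert((uint8_t)(S >> 3) == 0xF4); // top bits are copies of the sign bit
}
```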
Fixes #90936.
This is the mirror to the recent atomic load change. The same
bitcast-back-to-integer case is a small code quality regression for the
same reason. This would disappear with a bitcastable legal 128-bit type.
In the case of functions without a stack frame, no "stack" field is
serialized into MIR, which leads to isCalleeSavedInfoValid being false
when reading a MIR file back in. To fix this, we should serialize
MachineFrameInfo::isCalleeSavedInfoValid() into MIR.
If `LLVM_APPEND_VC_REV` is on, add the git revision to the `.file`
string. The revision can be set with `LLVM_FORCE_VC_REVISION`.
Before:
`.file "git_revision.cpp",,"LLVM version 19.0.0git"`
After:
`.file "git_revision.cpp",,"LLVM version 19.0.0git (LLVM_REVISION)"`
If this flag is set, Xor will not be considered AddLike. If an Xor were
treated as an Add, it might wrap. If we can prove there would be no carry
out, and thus no wrap, the Xor would be turned into a disjoint Or by
DAGCombine.
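A quick numeric check of that distinction in plain C++: xor matches add exactly when the operands share no set bits; with overlapping bits, add carries (and can wrap) while xor does not.
```
#include <cassert>
#include <cstdint>

int main() {
  uint8_t A = 0xF0, B = 0x0F;                   // disjoint bits
  assert((uint8_t)(A ^ B) == (uint8_t)(A + B)); // both 0xFF: xor is add-like
  uint8_t C = 0xF0, D = 0x30;                   // overlapping bits
  assert((uint8_t)(C + D) == 0x20);             // carry out of bit 7, wraps
  assert((uint8_t)(C ^ D) == 0xC0);             // no carry, no wrap
}
```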
Use this new flag to fix a bug in X86 where an Xor is incorrectly being treated
as an NUWAdd.
Fixes #90668.