stack save/restore emitted for character elemental function result
allocation inside hlfir.elemental in lowering created memory bugs
because result memory is actually still used after the stack restore
when lowering the elemental into a loop where the result element is
copied into the array result storage.
Instead of adding special handling for stack save/restore in lowering,
just avoid emitting those since the stack reclaim pass is able to emit
them in the generated loop. Not having those stack save/restore will
also help optimizations that want to elide the temporary allocation for
the element result when that is possible.
Rename `indent` to `Indent` and `o` to `OS`.
Rename `Indentation` to `Indent`.
Remove unused argument from `emitPredicateMatch`.
Change `Indent` argument to `emitBinaryParser` to by value.
Now that we have the C/C++ `__builtin_elementwise_popcount` intrinsic (#108121) - remove custom target intrinsics that just immediately map to Intrinsic::ctpop and use the generic intrinsic directly.
The current logic assumes that the import file is pulled by object
files, and the loop for import files only needs to handle cases where
the `__imp_` symbol is implicitly pulled by an import thunk. This is
fragile, as the symbol may also be pulled through other means, such as
the -export argument in tests. Additionally, this logic is insufficient
for ARM64EC, which exposes multiple symbols through an import file, and
referencing any one of them causes all of them to be defined.
With this change, import symbols are added to `syms` more often, but we
ensure that output symbols remain unique later in the process
PR #74625 introduced a regression in the code generated for the
following set of intrinsic:
vec_add_u128, vec_addc_u128, vec_adde_u128, vec_addec_u128
vec_sub_u128, vec_subc_u128, vec_sube_u128, vec_subec_u128
vec_sum_u128, vec_msum_u128
vec_gfmsum_128, vec_gfmsum_accum_128
This is because the new code incorrectly assumed that a cast
from "unsigned __int128" to "vector unsigned char" would simply
be a bitcast re-interpretation; instead, this cast actually
truncates the __int128 to char and splats the result.
Fixed by adding an intermediate cast via a single-element
128-bit integer vector.
Fixes: https://github.com/llvm/llvm-project/issues/109113
There is already a comment on the member and documentation in the
InstCombine contributor guide, but also rename it to make add
an additional speed bump.
This patch disables all clang warnings when running include-cleaner, as
users aren't interested in other findings and in-development code might
have them temporarily. This ensures tool can keep working even in
presence of such issues.
This attempts to improve user-experience when LLDB stops on a
verbose_trap. Currently if a `__builtin_verbose_trap` triggers, we
display the first frame above the call to the verbose_trap. So in the
newly added test case, we would've previously stopped here:
```
(lldb) run
Process 28095 launched: '/Users/michaelbuch/a.out' (arm64)
Process 28095 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = Bounds error: out-of-bounds access
frame #1: 0x0000000100003f5c a.out`std::__1::vector<int>::operator[](this=0x000000016fdfebef size=0, (null)=10) at verbose_trap.cpp:6:9
3 template <typename T>
4 struct vector {
5 void operator[](unsigned) {
-> 6 __builtin_verbose_trap("Bounds error", "out-of-bounds access");
7 }
8 };
```
After this patch, we would stop in the first non-`std` frame:
```
(lldb) run
Process 27843 launched: '/Users/michaelbuch/a.out' (arm64)
Process 27843 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = Bounds error: out-of-bounds access
frame #2: 0x0000000100003f44 a.out`g() at verbose_trap.cpp:14:5
11
12 void g() {
13 std::vector<int> v;
-> 14 v[10];
15 }
16
```
rdar://134490328
This patch is split off from PR #88385 and concerns only the code
related to the legality of vectorising early exit loops. It is the first
step in adding support for vectorisation of a simple class of loops that
typically involves searching for something, i.e.
for (int i = 0; i < n; i++) {
if (p[i] == val)
return i;
}
return n;
or
for (int i = 0; i < n; i++) {
if (p1[i] != p2[i])
return i;
}
return n;
In this initial commit LoopVectorizationLegality will only consider
early exit loops legal for vectorising if they follow these criteria:
1. There are no stores in the loop.
2. The loop must have only one early exit like those shown in the above
example. I have referred to such exits as speculative early exits, to
distinguish from existing support for early exits where the
exit-not-taken count is known exactly at compile time.
3. The early exit block dominates the latch block.
4. The latch block must have an exact exit count.
5. There are no loads after the early exit block.
6. The loop must not contain reductions or recurrences. I don't see
anything fundamental blocking vectorisation of such loops, but I just
haven't done the work to support them yet.
7. We must be able to prove at compile-time that loops will not contain
faulting loads.
Tests have been added here:
Transforms/LoopVectorize/AArch64/simple_early_exit.ll
Addition of a check in the MachineVerifier to detect and report illegal
vector registers to SGPR copies in the AMDGPU backend, ensuring correct
code generation.
We can enforce this check only after SIFixSGPRCopies pass.
This is half-fix in the pipeline with the help of isSSA MachineFuction
property, the check is happening for passes after phi-node-elimination.
…ntElimination
ArgumentPromotion and DeadArgumentElimination passes could change
function signatures but the function name remains the same as before the
transformation. This makes it hard for tracing with bpf programs where
user tends to use function signature in the source. See discussion [1]
for details.
This patch added suffix to functions whose signatures are changed. The
suffix lets users know that function signature has changed and they need
to impact the IR or binary to find modified signature before tracing
those functions.
The suffix for ArgumentPromotion is ".argprom" and the suffixes for
DeadArgumentElimination are ".argelim" and ".retelim". The suffix also
gives user hints about what kind of transformation has been done.
With this patch, I built a recent linux kernel with full LTO enabled. I
got 4 functions with only argpromotion like
```
set_track_update.argelim.argprom
pmd_trans_huge_lock.argprom
...
```
I got 1058 functions with only deadargelim like
```
process_bit0.argelim
pci_io_ecs_init.argelim
...
```
I got 3 functions with both argpromotion and deadargelim
```
set_track_update.argelim.argprom
zero_pud_populate.argelim.argprom
zero_pmd_populate.argelim.argprom
```
[1] https://github.com/llvm/llvm-project/issues/104678
Pass speculation target and assumption cache to
isSafeToSpeculativelyExecute() calls.
This allows speculating based on dereferenceable/align assumptions, but
the primary motivation here is to avoid regressions from planned changes
to fix https://github.com/llvm/llvm-project/issues/108854.
…ring.UninitRead
This is a drastic simplification of #106982. If you read that patch,
this is the same thing with all BugReporterVisitors.cpp and
SValBuilder.cpp changes removed! (since all replies came regarding
changed to those files, I felt the new PR was justified)
The patch was inspired by a pretty poor bug report on FFMpeg:

In this bug report, block is uninitialized, hence the bug report that it
should not have been passed to memcpy. The confusing part is in line 93,
where block was passed as a non-const pointer to seq_unpack_rle_block,
which was obviously meant to initialize block. As developers, we know
that clang likely didn't skip this function and found a path of
execution on which this initialization failed, but NoStoreFuncVisitor
failed to attach the usual "returning without writing to block" message.
I fixed this by instead of tracking the entire array, I tracked the
actual element which was found to be uninitialized (Remember, we
heuristically only check if the first and last-to-access element is
initialized, not the entire array). This is how the bug report looks
now, with 'seq_unpack_rle_block' having notes describing the path of
execution and lack of a value change:


Since NoStoreFuncVisitor was a TU-local class, I moved it back to
BugReporterVisitors.h, and registered it manually in CStringChecker.cpp.
This was done because we don't have a good trackRegionValue() function,
only a trackExpressionValue() function. We have an expression for the
array, but not for its first (or last-to-access) element, so I only had
a MemRegion on hand.
When ASan testing is enabled on SPARC as per PR #107405, the
```
AddressSanitizer-sparc-linux :: TestCases/Linux/odr_c_test.c
```
test `FAIL`s on Linux/sparc64:
```
+ projects/compiler-rt/test/asan/SPARCLinuxConfig/TestCases/Linux/Output/odr_c_test.c.tmp
+ count 0
Expected 0 lines, got 13.
AddressSanitizer:DEADLYSIGNAL
=================================================================
==4165420==ERROR: AddressSanitizer: BUS on unknown address (pc 0x7012d5b4 bp 0xffa3b938 sp 0xffa3b8d0 T0)
==4165420==The signal is caused by a READ memory access.
==4165420==Hint: this fault was caused by a dereference of a high value address (see register values below). Disassemble the provided pc to learn which register was used.
```
The test relies on an unaligned access, which cannot work on a
strict-alignment target like SPARC.
Thus this patch skips the test.
Tested on `sparc64-unknown-linux-gnu`.
InitUndef should also handle early-clobber / undef conflicts in inline
asm operands. Do this by iterating over all_defs() instead of defs().
The newly added ARM test was generating an "unpredictable STXP instruction,
status is also a source" error prior to this change.
Fixes https://github.com/llvm/llvm-project/issues/106380.
This extends the existing patterns for addp to 64bit outputs with a single
input. Whilst the general pattern is similar to the 128bit patterns
(add(uzp1(extract_lo, extract_hi), uzp2(extract_lo, extract_hi))), at the late
stage other optimzations have happened to turn the first uzp1 into trunc and
the second into extract(uzp2) with undef.
Fixes#109108
The current isHighCostExpansion cost model for addrecs computes the cost
for some kind of polynomial expansion that does not appear to have any
relation to addrec expansion whatsoever.
A literal expansion of an affine addrec is a phi and add (plus the
expansion of start and step). For a non-affine addrec, we get another
phi+add for each additional addrec nested in the step recurrence.
This partially `fixes` https://github.com/llvm/llvm-project/issues/53205
(the runtime unroll test case in this PR).
This replaces some uses of isSafeToSpeculativelyExecute() with
isSafeToSpeculativelyExecuteWithVariableReplaced(), in cases where we
are guarding against operand changes rather plain speculation.
I believe that this is NFC with the current implementation of the
function (as it only does something different from loads), but this
makes us more defensive against future generalizations.
This commit adds the missing support for varargs in the instruction
selection pass for AAPCS. Previously we only implemented this for
Darwin.
The implementation was according to AAPCS and SelectionDAG's
LowerAAPCS_VASTART.
It resolves all VA_START fallbacks in RAJAperf, llvm-test-suite, and
SPEC CPU2017. These benchmarks now compile and pass without fallbacks
due to varargs.
---------
Co-authored-by: Madhur Amilkanthwar <madhura@nvidia.com>
Note that this refers to the return type of SETCC. This operation is not
legal in PTX but was assumed as such because v2i16 is declared a legal
type. We were already expanding v4i8 SETCC.
The DAGCombiner would in certain circumstances try to fold an extension
of an illegal v2i1 SETCC (because v2i1 is illegal) into a "legal" v2i16
SETCC, which we wouldn't have patterns for.
Similar to commit 686cff17cc310884e48ae963bf7507f96950cc90 for SHT_REL (#57693).
CREL hasn't been tested with ICF before.
And avoid a pitfall that eqClass[0] might interfere with ICF.
macOS 10.15 added a "full" x86_64 GPR thread state flavor, equivalent to
the normal one but with DS, ES, SS, and GSbase added. This flavor can
only be used with processes that install a custom LDT (functionality
that was also added in 10.15 and is used by apps like Wine to execute
32-bit code).
Along with allowing DS, ES, SS, and GSbase to be viewed/modified, using
the full flavor is necessary when debugging a thread executing 32-bit
code.
If thread_set_state() is used with the regular thread state flavor, the
kernel resets CS to the 64-bit code segment (see
[set_thread_state64()](94d3b45284/osfmk/i386/pcb.c (L723)),
which makes debugging impossible.
There's no way to detect whether the full flavor is available, try to
use it and fall back to the regular one if it's not available.
A downside is that this patch exposes the DS, ES, SS, and GSbase
registers for all x86_64 processes, even though they are not populated
unless the full thread state is available.
I'm not sure if there's a way to tell LLDB that a register is
unavailable. The classic GDB `g` command [allows returning
`x`](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Packets.html#Packets)
to denote unavailable registers, but it seems like the debug server uses
newer commands like `jThreadsInfo` and I'm not sure if those have the
same support.
Fixes#57591
(also filed as Apple FB11464104)
@jasonmolenda
Change variable name `o` to `OS` to match definition, and
`ClName` to `ClassName` for better clarity.
Cache RegBank reference in the class and do no pass around
class members to functions.
The code was passing a physical register directly to getPressureSets
which expects a register unit.
Fix this by looping over the register units and calling getPressureSets
for each of them.
Found while trying to add a RegisterUnit class to stop storing register
units in `Register`. 0 is a valid register unit but not a valid
Register.
Added tests to the validator and fixed issues stemming from the previous skipping over BBs with single successors - which is incorrect. That would be now picked by added tests where the assertions are expected to be triggered.