This was looking through an addrspacecast, and not finding a later
unfoldable cast to another address space. Fixes improperly deleting
a required alloca + memcpy and introducing an illegal addrspacecast.
This also required fixing some worklist management issues with
addrspacecast, and assuming that only memcpy sources could need
replacement.
Regresses one test function, but this looks like it optimized
before by accident. It never saw the pointer use by the call
to readonly_callee, which should require insertion of a new cast.
Fixes#68120
Even as the NPM has been in use by Polly for a while now, the
majority of the tests continue using the LPM passes. This patch
ports the tests to use the NPM passes (for example, by replacing
a flag such as -polly-detect with -passes=polly-detect following
the NPM syntax for specifying passes) with some exceptions for
some missing features in the new passes. Additionally, the lit
substitution %loadPolly is replaced by the substitution of what
was %loadNPMPolly and %loadNPMPolly is removed.
The middle end will remove the inner vsetvli otherwise, and it's more
typical to set the AVL to the remaining VL.
This also prevents the test from showing up as a regression in #91319
Split off from #70549, this patch moves RISCVInsertVSETVLI to after phi
elimination where we exit SSA and need to move to LiveVariables.
The motivation for splitting this off is to avoid the large scheduling
diffs from moving completely to after regalloc, and instead focus on
converting the pass to work on LiveIntervals.
The two main changes required are updating VSETVLIInfo to store VNInfos
instead of MachineInstrs, which allows us to still check for PHI defs in
needVSETVLIPHI, and fixing up the live intervals of any AVL operands
after inserting new instructions.
On O3 the pass is inserted after the register coalescer, otherwise we
end up with a bunch of COPYs around eliminated PHIs that trip up
needVSETVLIPHI.
Co-authored-by: Piyou Chen <piyou.chen@sifive.com>
The ExceptionDemo example was no longer compiling (since llvm 14 at
least). The PR makes the example work with the current API and also
transition from MCJIT to ORC.
Fixes#63702
Close https://github.com/llvm/llvm-project/issues/91418
Since we load the variable's initializers lazily, it'd be problematic
if the initializers dependent on each other. So here we try to load the
initializers of static variables to make sure they are passed to code
generator by order. If we read any thing interesting, we would consume
that before emitting the current declaration.
When flattening the loop, if the GEP was inbound, it should stay
inbound, because the only thing that changed is how the pointers are
calculated, not the elements being accessed.
Proof: https://alive2.llvm.org/ce/z/dApMpQ
Utility converting a profile coming from `compiler_rt` to bitstream, and
a reader.
`PGOCtxProfileWriter::write` would be used as the `Writer` parameter for
`__llvm_ctx_profile_fetch` API. This is expected to happen in user code,
for example in the RPC hanler tasked with collecting a profile, and
would look like this:
```
// set up an output stream "Out", which could contain other stuff
{
// constructing the Writer will start the section, in Out, containing
// the collected contextual profiles.
PGOCtxProfWriter Writer(Out);
__llvm_ctx_profile_fetch(&Writer, +[](void* W, const ContextNode &N) {
reinterpret_cast<PGOCtxProfWriter*>(W)->write(N);
});
// Writer going out of scope will finish up the section.
}
```
The reader produces a data structure suitable for maintenance during IPO
transformations.
Summary:
One of the downsides of the linker wrapper is that it made debugging
more difficult. It is very powerful in that it can resolve a lot of
input matching and library handling that could not be done before.
However, the old method allowed users to simply copy-paste the script
files to modify the output and test it.
This patch attempts to make it easier to debug changes by letting the
user override all the linker inputs. That is, we provide a user-created
binary that is treated like the final output of the device link step.
The intended use-case is for using `-save-temps` to get some IR, then
modifying the IR and sticking it back in to see if it exhibits the old
failures.
After patch https://github.com/llvm/llvm-project/pull/88805
`I` Ext will be added automatically when we running the command like
`./build/bin/llc -mtriple=riscv32 -mattr=+e -target-abi ilp32e
-verify-machineinstrs llvm/test/CodeGen/RISCV/zcmp-additional-stack.ll`
it will generate
```
.text
.attribute 4, 16
.attribute 5, "rv32i2p1_e2pe"
.file "zcmp-additional-stack.ll"
.globl func # -- Begin function func
.p2align 1
.type func,@function
```
This patch reset the I ext in FeatureBit when `+e` has been specify
Type unit DIE generated by clang contains DW_AT_comp_dir/DW_AT_dwo_name.
This was added to clang to help LLDB to figure out where type unit come
from when accessing an entry in a .debug_names accelerator table and
type units in .dwp file.
When BOLT writes out .dwo files it changes the name of them. User can
also specify directory of where they can be written out. Added support
to BOLT to update those attributes.
As we have debuginfod as symbol locator available in lldb now, we want
to make full use of it.
In case of post mortem debugging, we don't always have the main
executable available.
However, the .note.gnu.build-id of the main executable(some other
modules too), should be available in the core file, as those binaries
are loaded in memory and dumped in the core file.
We try to iterate through the NT_FILE entries, read and store the gnu
build id if possible. This will be very useful as this id is the unique
key which is needed for querying the debuginfod server.
Test:
Build and run lldb. Breakpoint set to
https://github.com/llvm/llvm-project/blob/main/lldb/source/Plugins/SymbolLocator/Debuginfod/SymbolLocatorDebuginfod.cpp#L147
Verified after this commit, module_uuid is the correct gnu build id of
the main executable which caused the crash(first in the NT_FILE entry)
SubtargetEmitter::GenSchedClassTables takes a CodeGenProcModel, but
calls hasReadOfWrite which loops over all ProcModels. We move
hasReadOfWrite to CodeGenProcModel and remove the loop over all
ProcModels. This leads to a 144% speedup on the RISC-V backend of our
downstream.
This mirrors the LLDB_DEBUGSERVER_PATH environment variable and allows
you to have lldb-argdumper in a non-standard location and still use it
at runtime.
The existing code is checking for the presence of the +sve subtarget
feature
when deciding to use a base pointer for the function, but this check
doesn't
work when only +sme is used.
rdar://126878490
Summary:
The offload library supports basic JIT functionality, however we
currently link against every single target even though only AMDGPU and
NVPTX are supported. This somewhat bloats the dynamic library list, so
we should constrain it to what's actually used.
Applying the loop guards to the distance may prevent
isSafeDependenceDistance from determining NoDep, unless loop guards are
also applied to the backedge-taken-count.
Instead of applying the guards to both Dist and the
backedge-taken-count, just apply them after handling
isSafeDependenceDistance and constant distances; there is no benefit to
applying the guards before then.
This fixes a regression flagged by @bjope due to
ecae3ed958481cba7d60868cf3504292f7f4fdf5.
This change allows us to pass creduce options to creduce-clang-crash.py
script. With this, `--n` is no longer needed to specify the number of
cores, so removed the flag.
The motivation is
https://github.com/llvm/llvm-project/pull/87933#issuecomment-2109463497
suggests that disabling creduce renaming passes helps people to further
reduce crash manually.
This PR is in preparation to some extensions to the
`InferIntRangeInterface` around the `nsw` and `nuw` flags supported in
the `arith` dialect and LLVM.
We provide some common inference logic for `index` and `arith` in
`InferIntRangeCommon.h` but our Test Ops are currently fixed to `Index`
Types. As we test the range inference for arith Ops, especially around
the overflow behaviour, it's handy to have native support for the
typical integer types in the test Ops.
This patch
1. Changes the Attributes of `test.with_bounds` ops from `Index` to
`APInt` which matches the internal representation in
`ConstantIntRanges`.
2. Allows the use of `AnyInteger` in addition to `Index` for the
operands and results of the test Ops. This now requires explicit
specification of the type in the IR, where before `Index` was implicit.
3. Requires bounds Attrs to be specified in the precision of the SSA
value, eliminating any implicit truncation or extension. (*Could this
lead to problems?*)
Prior to this, fixed point multiplication would lead to this assertion
error on AArhc64, armv8, and armv7.
```
_Accum f(_Accum x, _Accum y) { return x * y; }
// ./bin/clang++ -ffixed-point /tmp/test2.cc -c -S -o - -target aarch64 -O3
clang++: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:10245: void llvm::TargetLowering::forceExpandWideMUL(SelectionDAG &, const SDLoc &, bool, EVT, const SDValue, const SDValue, const SDValue, const SDValue, SDValue &, SDValue &) const: Assertion `Ret.getOpcode() == ISD::MERGE_VALUES && "Ret value is a collection of constituent nodes holding result."' failed.
```
This path into forceExpandWideMUL should only be taken if we don't
support [US]MUL_LOHI or MULH[US] for the operand size (32 in this case).
But we should also check if we can just leverage regular wide
multiplication. That is, extend the operands from 32 to 64, do a regular
64-bit mul, then trunc and shift. These ops are certainly available on
aarch64 but for wider types.
There are cases where a vector value has some users that demand the
the single scalar value only (NeedsScalar), while other users demand the
vector value (see attached test cases). In those cases, the NeedsScalar
users should only demand the first lane.
Fixes https://github.com/llvm/llvm-project/issues/91883.
This follows the same pattern as 20e62658735a1b03ecadc. Although we
can't reduce the number of instructions used, if we are able to use a
sign-extended 6-bit immediate then the 16-bit c.li instruction can be
selected (thus saving code size). Although this _could_ be gated so it
only happens if C is enabled, I've opted not to because at worst it's
neutral and it doesn't seem helpful to add unnecessary divergence
between the RVC and non-RVC paths.
When expression got errors (missing typedef) and clang-tidy is compiled
with asserts enabled, then we crash in this check on assert because type
with errors is visible as an dependent one. This is issue caused by
invalid input.
But as there is not point to crash in such case and generate additional
confusion, such expressions with errors will be now ignored.
Fixes#89515, #55293