If the user node is a buildvector/gather node and it has no internal
instructions state, we need to check for this state explicitly and use the
type of the node itself, not the types of its operands.
Fixes #129242
The effect of this commit is too broad and may also affect those
variants of Linux systems on which the affected test cases are known to
pass.
An alternative version of this commit will be prepared afresh.
This reverts commit c93dc581d979eb20ded470d2c16e51b3e775f6e7.
In some cases, it is possible for the same underlying object to be
accessed via pointers to different address spaces. This could lead to
pointers from different address spaces ending up in the same dependency
set, which isn't allowed (and triggers an assertion).
Update the mapping from underlying object -> last access to also include
the accessing address space.
Fixes https://github.com/llvm/llvm-project/issues/124759.
PR: https://github.com/llvm/llvm-project/pull/129087
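For illustration, a minimal hand-written IR sketch of the pattern (hypothetical
names, not the original reproducer): the same allocation is reachable both
through its original addrspace(1) pointer and through a flat pointer obtained
via `addrspacecast`, so both accesses share one underlying object while using
pointers from different address spaces.
```
define void @same_object_two_spaces(ptr addrspace(1) %base, i64 %n) {
entry:
  ; Flat view of the same underlying object, in a different address space.
  %flat = addrspacecast ptr addrspace(1) %base to ptr
  br label %loop

loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  ; Access through the original addrspace(1) pointer...
  %gep.as1 = getelementptr inbounds i32, ptr addrspace(1) %base, i64 %iv
  store i32 0, ptr addrspace(1) %gep.as1
  ; ...and through the flat (addrspace 0) view of the same object.
  %gep.flat = getelementptr inbounds i32, ptr %flat, i64 %iv
  %v = load i32, ptr %gep.flat
  %iv.next = add nuw i64 %iv, 1
  %cond = icmp ult i64 %iv.next, %n
  br i1 %cond, label %loop, label %exit

exit:
  ret void
}
```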
Add the HLSL `or` intrinsic, codegen in CGBuiltin, and the
corresponding tests in or.hlsl. Additionally, extend
logical-operator-errors to handle both 'and' and 'or' semantic
diagnostics.
The codecvt class is defined in <locale> and the contents of the
<codecvt> header don't work when localization is disabled. Without this
guard, builds with localization disabled that happen to include
<codecvt> could be broken because they would try to include <__locale>,
which ends up trying to include the locale base API and eventually
platform headers like <xlocale.h> that may not exist.
Following on from #122927 and #123031, which added support for masked
vectorization of `tensor.insert_slice`, this PR extends the e2e test for
`tensor.unpack` to leverage the new functionality.
This intrinsic is used when whole-program vtables are used in conjunction
with either CFI or virtual function elimination. `llvm.type.checked.load`
is used unconditionally, but we need to use the relative intrinsic for WPD
and CFI to work correctly.
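As a rough, hand-written sketch (assuming the relative form mirrors the
signature of `llvm.type.checked.load`; not taken from this patch), the
intrinsic returns the loaded pointer together with a flag indicating whether
the type test passed:
```
declare { ptr, i1 } @llvm.type.checked.load.relative(ptr, i32, metadata)

define ptr @load_vfn(ptr %vtable) {
  ; Load the virtual function pointer at offset 0 and check it against
  ; the (hypothetical) type id !"_ZTS1A".
  %pair = call { ptr, i1 } @llvm.type.checked.load.relative(ptr %vtable, i32 0, metadata !"_ZTS1A")
  %fn = extractvalue { ptr, i1 } %pair, 0
  ret ptr %fn
}
```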
The following reproducer demonstrates the issue of invalid definitions
of GEP results during type inference:
```
define spir_kernel void @foo(i1 %fl, i64 %idx, ptr addrspace(1) %dest, ptr addrspace(3) %src) {
%p1 = getelementptr inbounds i8, ptr addrspace(1) %dest, i64 %idx
%res = tail call spir_func target("spirv.Event") @_Z22__spirv_GroupAsyncCopyjPU3AS1iPU3AS3Kimm9ocl_event(i32 2, ptr addrspace(1) %p1, ptr addrspace(3) %src, i64 128, i64 1, target("spirv.Event") zeroinitializer)
ret void
}
declare dso_local spir_func target("spirv.Event") @_Z22__spirv_GroupAsyncCopyjPU3AS1iPU3AS3Kimm9ocl_event(i32, ptr addrspace(1), ptr addrspace(3), i64, i64, target("spirv.Event"))
```
Here `OpGroupAsyncCopy` expects i32* arguments, and type inference fails
to assign the correct type to the GEP result `%p1` because it is an argument
of `OpGroupAsyncCopy`.
This PR fixes the issue by preventing type inference from changing the type
of GEP results.
If the signed node is an operand of a UITOFP node, the bitwidth analysis
should take the minimum of the incoming bitwidth and the bitwidth of the
UITOFP node.
Fixes #129244
This patch adds initial vector extension support to RISC-V's exegesis.
The strategy here is to enumerate all RVV _pseudo_ opcodes, as their MC
opcode counterparts are of little use in this setting. We also
enumerate all possible VTYPE operands in each CodeTemplate
configuration. Several MachineFunction passes are used to post-process
the snippets, for example to insert VSETVLI instructions.
See https://llvm.org/devmtg/2024-10/slides/techtalk/Hsu-RVV-Exegesis.pdf
for more technical details.
Update PadOp's pad_const input to be rank 1.
Fix various lit tests affected by this change, including some conv ops.
Signed-off-by: Jerry Ge <jerry.ge@arm.com>
Signed-off-by: Tai Ly <tai.ly@arm.com>
Co-authored-by: Tai Ly <tai.ly@arm.com>
This patch adds a helper flag for bisection debugging. The flag
force-stops vectorization after the given number of bundles has been
considered for vectorization.
Using -sbvec-stop-bndl=0 disables vectorization entirely.
This reflects the add_new_check.py changes: isLanguageVersionSupported
is now overridden by default by the script.
The changes were introduced in
https://github.com/llvm/llvm-project/pull/100129
Thanks
This adds support for launching lldb-dap in server mode. The extension
will start lldb-dap in server mode on-demand and retain the server until
the VSCode window is closed (when the extension context is disposed).
While running in server mode, launch performance for binaries is greatly
improved thanks to better caching between debug sessions.
For example, on my local M1 Max laptop the first attach to an iOS
Simulator process takes ~5s and each subsequent attach takes ~0.5s.
LLVM IR emitted from C++ may contain `@llvm.global_ctors = appending
global [0 x { i32, ptr, ptr }] zeroinitializer`. Before this PR, if we
tried to roundtrip code like this through the importer, we would end up
with nothing in its place.
Note that `llvm::appendToGlobalCtors` ignores empty lists, so this PR
uses the same approach as `llvm-as`, which doesn't use the utilities
from `llvm/lib/Transforms/Utils/ModuleUtils.cpp` to build this
- it creates the global variable from scratch.
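For context, a populated table looks like the hand-written sketch below (each
element is { priority, constructor, associated data }); `llvm::appendToGlobalCtors`
produces arrays of this shape but skips the zero-element case, which is why the
empty global has to be created directly.
```
@llvm.global_ctors = appending global [1 x { i32, ptr, ptr }]
    [{ i32, ptr, ptr } { i32 65535, ptr @ctor, ptr null }]

declare void @ctor()
```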
`MeasureTokenLength` may return an unsigned 0 to indicate that it failed to
obtain the length of a token. The analysis now gives up in such cases;
otherwise, there could be issues caused by unsigned integer "overflow".
If the ControlFlowHub did not make any change to the control flow,
there is no need to repair SSA, update the loop structure, or verify a
number of things. This is not completely NFC, though: repairSSA used to
introduce single-entry PHI nodes that are now missing.
My code went from 400+ seconds to 1 second, since no loop required its
exits to be unified, but there were many "complex" loops.
Also fix a crash when vd_aux is invalid (#86611).
vd_version, vd_flags, vd_ndx, and vd_cnt in Elf{32,64}_Verdef are
16-bit. Change VerDef to use uint16_t instead.
vda_name specifies a NUL-terminated string. Update getVersionDefinitions
to remove some `.c_str()`.
Pull Request: https://github.com/llvm/llvm-project/pull/128434
Do not hardcode `address_space(5)` (`private`) in the ROCDL interface,
as that breaks SPIRV generation (the latter uses 0). Add test. In the
long run we should stop using ROCDL inline.
Originated from #120687
This PR simply adds the necessary headers for UEFI, which define all the
necessary types. It unlocks the ability to work on other PRs for
UEFI support.
Fixes #116444.
Closed #127700 because I accidentally updated it in the GitHub UI.
### Current vs expected behavior
Previously, the result of a `CXXNewExpr` was not always list initialized
when using an initializer list.
In this example:
```
struct S { int x; };
void F() {
  S *s = new S{1};
  delete s;
}
```
there would be a binding of `s` to `compoundVal{1}`, but this isn't used
during later field binding lookup. After this PR, there is instead a
binding of `s->x` to `1`. This is the cause of #116444 since the field
binding lookup returns undefined in some cases currently.
### Changes
This PR swaps around the handling of typed value regions (seems to be
the usual region type when doing non-CXX-new-expr list initialization)
and symbolic regions (the result of the CXX new expr), so that symbolic
regions also get list initialized. In the below snippet, it swaps the
order of the two conditionals.
8529bd7b96/clang/lib/StaticAnalyzer/Core/RegionStore.cpp (L2426-L2448)
### Followup work
This PR only makes CSA do list init for `CXXNewExpr`s. After this, I
would like to make some changes to `RegionStoreManager::bind` in how it
handles list initialization generally.
I've added some straightforward test cases here for the `new` expr with
a list initializer. I started adding some more before realizing that the
current general (not just `new` expr) list initialization could be
changed to handle more cases like list initialization of unions and
arrays (like https://github.com/llvm/llvm-project/issues/54910). Let me
know if it is preferred to leave those test cases out for now.
Add two additional profile quality stats for CG (call graph) and CFG
(control flow graph) flow conservation, in addition to the CFG
discontinuity stats introduced in #109683. The two new stats quantify how
different "in-flow" is from "out-flow" in the following cases where they
should be equal. The smaller the reported stats, the better the flow
conservation.
CG flow conservation: for each function that is not a program entry, the
number of times the function is called according to CG ("in-flow")
should be equal to the number of times the transition from an entry
basic block of the function to another basic block within the function
is recorded ("out-flow").
CFG flow conservation: for each basic block that is not a function entry
or exit, the number of times the transition into this basic block from
another basic block within the function is recorded ("in-flow") should
be equal to the number of times the transition from this basic block to
another basic block within the function is recorded ("out-flow").
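Stated as equations (just a restatement of the prose above, with w(.)
denoting the recorded profile count; the reported stat measures the deviation
from these equalities):

$$\sum_{g \in \mathrm{callers}(f)} w(g \to f) \;=\; \sum_{e \in \mathrm{entries}(f)} \sum_{b} w(e \to b) \qquad \text{for each non-entry function } f$$

$$\sum_{a} w(a \to b) \;=\; \sum_{c} w(b \to c) \qquad \text{for each non-entry, non-exit basic block } b$$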
Use `-v=1` for more detailed bucketed stats, and use `-v=2` to dump
functions / basic blocks with bad flow conservation.
With a fix for fully undef masks. These can't reach the lowering code, but
can reach the costing code via e.g. SLP.
This change adds the TTI costing corresponding to the recently added
isMaskedSlidePair lowering for vector shuffles. However, since the
existing costing code hadn't covered slideup, slidedown, or the
(now removed) isElementRotate, the impact is larger in scope than just
that new lowering.
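For reference, the fully undef masks mentioned above look like the
hand-written sketch below; a shuffle like this never survives to lowering,
but SLP's cost queries can still ask about it.
```
define <4 x i32> @fully_undef_mask(<4 x i32> %a, <4 x i32> %b) {
  ; Every mask element is poison, so the shuffle has no defined lanes.
  %s = shufflevector <4 x i32> %a, <4 x i32> %b,
       <4 x i32> <i32 poison, i32 poison, i32 poison, i32 poison>
  ret <4 x i32> %s
}
```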
---------
Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
Co-authored-by: Luke Lau <luke_lau@icloud.com>
Fixes #124269
PrepareTrivialCall always had the possibility of failing, but given that
it only wrote to general purpose registers, if it did fail, you had bigger
problems.
When it failed, we did not mark the thread plan valid and when it was
torn down we didn't try to restore the register state. This meant that
if you tried to continue, the program was unlikely to work.
When I added AArch64 GCS support, I needed to handle the situation where
the GCS pointer points to unmapped memory and we fail to write the extra
entry we need. So I added code to restore the gcspr_el0 register
specifically if this happened, and ordered the operations so that we
tried this first.
In this change I've made the teardown of an invalid thread plan restore
the register state if one was saved. There may not be one if
ConstructorSetup fails, but this is ok because that function does not
modify anything.
Now that we're doing that, I don't need the GCS specific code anymore,
and all thread plans are protected from this in the rare event something
does fail.
Testing is done by the existing GCS test case that points the gcspr into
unmapped memory, which causes PrepareTrivialCall to fail. I tried adding
a simulated test using a mock gdb server, but this was not possible because
those tests all use DynamicLoaderStatic, which disables all JIT features.