The class was taking ownership of the SBCommandPluginInterface pointer
it was passed in, by wrapping it in a shared pointer. This causes a
double free in the unit test when the object is destroyed: the same
pointer is freed once when the SBCommandPluginInterface goes away and
then again when the shared pointer hits a zero refcount.
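
A minimal sketch of the ownership problem, using hypothetical `Plugin` and `BorrowingWrapper` types rather than the actual LLDB classes. It shows one possible way to avoid the double free (storing a non-owning pointer); this is not necessarily what the patch does.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the plugin interface; counts destructor runs.
struct Plugin {
  static inline int destroyed = 0;
  ~Plugin() { ++destroyed; }
};

// Non-owning wrapper: it borrows the pointer and never frees it.
class BorrowingWrapper {
public:
  explicit BorrowingWrapper(Plugin *plugin) : m_plugin(plugin) {
    assert(plugin && "expected a valid plugin");
  }
  Plugin *get() const { return m_plugin; }

private:
  Plugin *m_plugin; // owned by the caller, not by this class
};

// Returns how many times the plugin was destroyed over the whole lifetime.
int destroy_count_with_borrowing_wrapper() {
  Plugin::destroyed = 0;
  {
    auto owner = std::make_unique<Plugin>(); // the real owner
    BorrowingWrapper wrapper(owner.get());   // borrows, does not own
    (void)wrapper.get();
    // Wrapping owner.get() in an owning std::shared_ptr<Plugin> here would
    // free the same object a second time once its refcount reached zero.
  }
  return Plugin::destroyed;
}
```

With the borrowing wrapper the destructor runs exactly once, when the real owner goes away.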
This patch upstreams ClangIR's CastOp with the following exceptions:
- No Fixed/FP conversions
- No casts between value categories
- No complex casts
- No array_to_ptrdecay
- No address_space
- No casts involving record types (member pointers, base/derived casts)
- No casts specific to ObjC or OpenCL
---------
Co-authored-by: Morris Hafner <mhafner@nvidia.com>
Co-authored-by: Erich Keane <ekeane@nvidia.com>
Insert the start instruction directly into the map before the uses. This
prevents improperly re-visiting sgpr->vgpr phi inputs multiple times,
which would trigger a use after free.
I don't particularly trust the iteration scheme here. This is also
unnecessarily revisiting transitive users of a phi or reg_sequence for
every input operand, but I will address that separately.
Fixes #130646. I also believe it fixes #130119, although that test fails
less consistently for me.
After https://reviews.llvm.org/D116539, when `m_gdb_client_up` in
`PlatformRemoteGDBServer` is not null, the connection to a server is
expected to exist. However, `PlatformRemoteGDBServer::DisconnectRemote()`
is not the only way to close the connection;
`GDBRemoteCommunication::WaitForPacketNoLock()` can disconnect if the
server stops responding, and in this case `m_gdb_client_up` is not
cleared. The patch removes this assumption and checks the connection
status directly.
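
A minimal sketch of the idea, assuming hypothetical `Client` and `Platform` stand-ins rather than the real LLDB classes: a non-null client pointer is not treated as proof of a live connection, and the connection state is queried instead.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the GDB remote client.
class Client {
public:
  bool IsConnected() const { return m_connected; }
  // Models the connection dropping on its own, e.g. after a timeout.
  void DropConnection() { m_connected = false; }

private:
  bool m_connected = true;
};

// Hypothetical stand-in for the platform that owns the client.
class Platform {
public:
  Platform() : m_client_up(std::make_unique<Client>()) {}

  // Query the live connection state instead of inferring it from the pointer.
  bool IsConnected() const {
    return m_client_up && m_client_up->IsConnected();
  }

  Client &client() {
    assert(m_client_up && "client was never created");
    return *m_client_up;
  }

private:
  std::unique_ptr<Client> m_client_up;
};

// Returns true if the platform notices a connection dropped behind its back.
bool platform_notices_dropped_connection() {
  Platform platform;
  bool before = platform.IsConnected(); // connected initially
  platform.client().DropConnection();   // the pointer stays non-null
  bool after = platform.IsConnected();  // must now report disconnected
  return before && !after;
}
```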
After #128718 lands there will be two ways of performing a reversed
widened memory access, either by performing a consecutive unit-stride
access and a reverse, or a strided access with a negative stride.
Even though both produce a reversed vector, only the former needs
VPReverseVectorPointerRecipe which computes a pointer to the last
element of each part. A strided reverse still needs a pointer to the
first element of each part so it will use VPVectorPointerRecipe.
This renames VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe to
clarify that a reversed access may not necessarily need a pointer to the
last element.
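
The two strategies can be illustrated with a scalar sketch. The names and the width `VF` below are illustrative assumptions, not VPlan APIs. Here `first_elem` is the address accessed by lane 0, and the later lanes access decreasing addresses.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t VF = 4; // illustrative vector width

// Unit-stride load followed by a reverse. The load has to start at the
// last element of the part, VF - 1 elements below first_elem.
std::array<int, VF> load_then_reverse(const int *first_elem) {
  const int *last_elem = first_elem - (VF - 1);
  std::array<int, VF> lanes{};
  std::copy(last_elem, last_elem + VF, lanes.begin()); // consecutive load
  std::reverse(lanes.begin(), lanes.end());            // reverse the lanes
  return lanes;
}

// Strided load with stride -1. It starts at the first element directly,
// so no adjusted end pointer is needed.
std::array<int, VF> strided_reverse(const int *first_elem) {
  std::array<int, VF> lanes{};
  for (std::size_t i = 0; i < VF; ++i)
    lanes[i] = first_elem[-static_cast<std::ptrdiff_t>(i)];
  return lanes;
}

// Checks that both strategies produce the same reversed vector.
bool both_strategies_agree() {
  const int data[] = {10, 11, 12, 13, 14, 15};
  const int *first_elem = &data[5];
  auto a = load_then_reverse(first_elem);
  auto b = strided_reverse(first_elem);
  assert(a[0] == 15 && a[VF - 1] == 12);
  return a == b;
}
```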
The following three string-like constructors for `std::bitset`
- `bitset(const CharT* str, std::size_t n, CharT zero, CharT one)`;
- `bitset(const std::basic_string<CharT, Traits, Alloc>& str, typename
std::basic_string<CharT, Traits, Alloc>::size_type pos, CharT zero,
CharT one)`;
- `bitset(std::basic_string_view<CharT, Traits> str, std::size_t pos,
std::size_t n, CharT zero, CharT one)`
already initialize the underlying storage array to all zeroes via the
default constructor of the base class `__bitset`. Therefore,
re-assigning the storage array to zeroes via `std::fill_n` in the
string-like constructors is redundant.
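
For reference, a small usage sketch of the first constructor with custom `zero` and `one` characters. The helper name `parse_bits_demo` is made up for illustration. Bits not covered by the string are zero.

```cpp
#include <bitset>
#include <cassert>

// Parses "xxyy" with 'x' as zero and 'y' as one, i.e. the bit pattern 0011.
unsigned long parse_bits_demo() {
  std::bitset<8> bits("xxyy", 4, 'x', 'y');
  assert(!bits.test(7)); // upper bits are not covered by the string
  return bits.to_ulong();
}
```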
This extension adds 10 instructions that provide hints to the interface
simulation environment.
The current spec can be found at:
https://github.com/quic/riscv-unified-db/releases/
This patch adds assembler-only support.
This mechanism causes the greedy register allocator to prefer allocating
register classes with higher priority first. This helps to ensure that
high LMUL registers obtain a register without having to go through the
eviction mechanism. In practice, it seems to cause a bunch of code
churn, and some minor improvement around widening and narrowing
operations.
In a few of the widening tests, we have what look like code size
regressions because we end up with two smaller register class copies
instead of one larger one after the instruction. However, in any larger
code sequence, these are likely to be folded into the producing
instructions. (But so were the wider copies after the operation.)
Two observations:
1) We're not setting the greedy-regclass-priority-trumps-globalness flag
on the register class, so this doesn't help long mask ranges. I
thought about doing that, but the benefit is non-obvious, so I
decided it was worth a separate change at minimum.
2) We could arguably set the priority higher for the register classes
that exclude v0. I tried that, and it caused a whole bunch of
further churn. I may return to it in a separate patch.
The virtual method `ProgramPointTag::getTagDescription` had two very
distinct use cases:
- It is printed in the DOT graph visualization of the exploded graph
(that is, a debug printout).
- The checker option handling code used it to query the name of a
checker, which relied on the coincidence that in `CheckerBase` this
method is defined to be equivalent with `getName()`.
This commit switches to using `getName` in the second use case, because
this way we will be able to properly support checkers that have multiple
(separately named) parts.
The method `reportInvalidCheckerOptionName` is extended with an
additional overload that allows specifying the `CheckerPartIdx`. The
methods `getChecker*Option` could be extended analogously in the future,
but they are just convenience wrappers around the variants that directly
take `StringRef CheckerName`, so I'll only do this extension if it's
needed.
Arrays with assumed-length types are represented with a box
without explicit length parameters. This patch fixes the verification
to allow it for `fir.pack_array`.
I keep getting these warnings when building with clang-17:
`warning: 'StaticDescriptor' may not intend to support class template
argument deduction [-Wctad-maybe-unsupported]`
This change should help avoid them.
This patch fixes:
mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp:121:3:
error: default label in switch which covers all enumeration values
[-Werror,-Wcovered-switch-default]
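
A minimal sketch of the pattern that avoids this diagnostic, using a hypothetical `Mode` enum instead of the NVVM code. When every enumerator has its own `case`, the `default` label is dropped and the fallthrough path is handled after the switch.

```cpp
#include <cassert>

enum class Mode { Load, Store }; // hypothetical enum for illustration

int operand_count(Mode mode) {
  switch (mode) {
  case Mode::Load:
    return 1;
  case Mode::Store:
    return 2;
    // No `default:` label here: both enumerators are handled, so a
    // default would trigger -Wcovered-switch-default.
  }
  assert(false && "unexpected Mode value");
  return 0;
}
```

Omitting `default` also lets the compiler warn about an unhandled enumerator if the enum grows later.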
This is an NFC patch to capture the insertion of the prologue and
epilogue by the `shrinkwrap` pass on the PowerPC target for functions
that contain `__builtin_frame_address`.
---------
Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
The current parsing logic for the target string assumes it follows the
format `<kind>-<triple>-<target id>:<feature>`, such as
`hipv4-amdgcn-amd-amdhsa-gfx1030:+xnack`.
Specifically, it assumes that `<target id>` does not contain any `-`,
relying on `rsplit` for parsing.
However, this assumption breaks for AMDGPU's generic targets, which may
contain one or more `-`, such as `gfx10-3-generic` or `gfx12-generic`.
As a result, the existing approach using `rsplit` is no longer reliable.
This patch reworks the parsing logic to handle target strings more
robustly, including support for generic targets.
The bundler now strictly requires a 4-field target triple.
Additionally, a new Python helper function has been added to `config.py`
to normalize the target triple into the 4-field format when it is not
already in that form, ensuring tests pass reliably.
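
A minimal sketch of left-to-right parsing under the 4-field assumption. `ParsedTarget` and `parse_target` are hypothetical names, not the bundler's actual code. The example strings also assume an empty fourth triple field, written as `--`. Once the triple has a fixed field count, everything after the fifth `-` belongs to the target ID, even if it contains further `-` characters.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

struct ParsedTarget {
  std::string kind;
  std::string triple;    // exactly four '-'-separated fields
  std::string target_id; // may contain '-', plus any ":feature" suffix
};

// Splits "<kind>-<f1>-<f2>-<f3>-<f4>-<target id>" from the left.
// Returns std::nullopt if there are fewer than five '-' separators.
std::optional<ParsedTarget> parse_target(std::string_view s) {
  std::size_t dash[5];
  std::size_t from = 0;
  for (std::size_t &d : dash) {
    d = s.find('-', from);
    if (d == std::string_view::npos)
      return std::nullopt; // triple does not have four fields
    from = d + 1;
  }
  ParsedTarget result;
  result.kind = std::string(s.substr(0, dash[0]));
  result.triple = std::string(s.substr(dash[0] + 1, dash[4] - dash[0] - 1));
  result.target_id = std::string(s.substr(dash[4] + 1));
  return result;
}
```

With this scheme, `hipv4-amdgcn-amd-amdhsa--gfx10-3-generic` yields the target ID `gfx10-3-generic`, while the 3-field form `hipv4-amdgcn-amd-amdhsa-gfx1030` is rejected.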
For the NewPM, the merge-const option was assigned to an unused
option field. Assign it to the correct one. The merge-const-aggressive
option was not supported -- and invalid options were silently ignored.
Accept it and error on invalid options.
For the LegacyPM, the corresponding cl::opt options were ignored when
called via opt rather than llc.
Fixes incorrect logic that went unnoticed until the function was tested
with output and input types that have the same underlying floating-point
format.
When determining the install prefix in LLVMConfig.cmake etc., resolve
symlinks in CMAKE_CURRENT_LIST_FILE first. The motivation for this is to
support symlinks like `/usr/lib64/cmake/llvm` to
`/usr/lib64/llvm19/lib/cmake/llvm`. This only works correctly if the
paths are relative to the resolved symlink.
It's worth noting that this *mostly* already works out of the box,
because cmake automatically does the symlink resolution when the library
is found via CMAKE_PREFIX_PATH. It just doesn't happen when it's found
via the default prefix path.
This commit adds the following Dense Math Facility integer calculation
instructions: dmxvi8gerx4, dmxvi8gerx4pp, dmxvi8gerx4spp, pmdmxvi8gerx4,
pmdmxvi8gerx4pp, and pmdmxvi8gerx4spp, along with their corresponding
intrinsics and tests.
ELFExtendedAttrParser lacked a destructor that properly handled errors,
causing `llvm-readobj --arch-specific` to crash when the AArch64 Build
Attributes section was empty.
This commit adds error handling in the destructor and introduces test
files for `--arch-specific` to cover both an empty AArch64 Build
Attributes section and a populated one.
Fixes:
b1ebfac185
A few test files seemed to have been edited after using the
update_test_checks.py script, which can make life hard for
developers when trying to update these tests in future
patches. Also, the tests still had this comment at the top
; NOTE: Assertions have been autogenerated by ...
which could potentially be confusing, since they've not
strictly been auto-generated.
I've attempted to keep the spirit of the original tests by
excluding all CHECK lines after the scalar.ph IR block,
however I've done this by using a new option called
--filter-out-after to the update_test_checks.py script.
We already handle the X86ISD::VPERMV3 node type, but if we can handle
equivalent cases before intrinsic lowering we can simplify the code
further (e.g. #109272), before constant BUILD_VECTOR nodes get lowered to
constant pool loads.
The language reference says about inbounds geps that "if the
getelementptr has any non-zero indices[...] [t]he base pointer has an in
bounds address of the allocated object that it is based on [and]
[d]uring the successive addition of offsets to the address, the
resulting pointer must remain in bounds of the allocated object at each
step."
If (gep inbounds p, (a + 5)) is translated to (gep [inbounds] (gep p,
a), 5) with p pointing to the beginning of an object and a=-4, as the
example in the comments suggests, that's the case for neither of the
resulting geps. Therefore, we need to clear the inbounds flag for both
geps.
We might want to use ValueTracking to check if a is known to be
non-negative to preserve the inbounds flags.
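
The example can be checked with a simplified model in which addresses are byte offsets into a single object. The function names and the object size of 8 are assumptions for illustration, and the model ignores the other conditions in the language reference.

```cpp
#include <cassert>
#include <cstdint>

// An address is in bounds if it lies within the object or one past its end.
bool in_bounds(std::int64_t offset, std::int64_t object_size) {
  return offset >= 0 && offset <= object_size;
}

// (gep inbounds p, (a + c)): both the base and the result must be in bounds.
bool combined_gep_ok(std::int64_t p, std::int64_t a, std::int64_t c,
                     std::int64_t size) {
  return in_bounds(p, size) && in_bounds(p + a + c, size);
}

// Inner gep of the split form, (gep p, a): would inbounds be valid on it?
bool inner_gep_ok(std::int64_t p, std::int64_t a, std::int64_t size) {
  return in_bounds(p, size) && in_bounds(p + a, size);
}

// Outer gep of the split form, (gep (gep p, a), c): its base is p + a.
bool outer_gep_ok(std::int64_t p, std::int64_t a, std::int64_t c,
                  std::int64_t size) {
  return in_bounds(p + a, size) && in_bounds(p + a + c, size);
}
```

With `p = 0`, `a = -4`, `c = 5` and an 8-byte object, the combined gep stays in bounds (offset 1). The inner gep produces offset -4, so its result is out of bounds, and the outer gep starts from that out-of-bounds base. Neither split gep can keep the flag.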
For the AMDGPU tests with scratch instructions, removing the unsound
inbounds flag means that AMDGPUDAGToDAGISel::isFlatScratchBaseLegal sees
no NUW flag at the pointer add, which prevents generation of scratch
instructions with immediate offsets.
For SWDEV-516125.