8b4306ce050bd5 introduced validity checks for every module flag access,
because the auto-upgrader uses named metadata before verifying the
module.
This causes overhead for all other accesses, and the check is, in fact,
only need at that single place. Change the upgrader to be careful when
accessing module flags before the module is verified and remove the
checks on all other occasions.
There are two tangential optimizations included: first, when querying a
specific flag, don't enumerate all other flags into a vector as well.
Second, don't use a Twine for getNamedMetadata(), which has
materialization overhead -- all call sites use simple strings that can
be implicitly converted to a StringRef.
Outlining these patterns has a significant impact on the overall stack
frame size of llvm::UpgradeIntrinsicCall. This is helpful for scenarios
where compilation threads are stack-constrained. The overall impact is
low when using clang as the host compiler, but very pronounced when
using MSVC 2022 with release builds.
Clang: 1,624 -> 824 bytes
MSVC: 23,560 -> 6,120 bytes
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.
This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
Although i32 type is illegal in the backend, LA64 has pretty good
support for i32 types by using W instructions.
By adding n32 to the DataLayout string, middle end optimizations will
consider i32 to be a native type. One known effect of this is enabling
LoopStrengthReduce on loops with i32 induction variables. This can be
beneficial because C/C++ code often has loops with i32 induction
variables due to the use of `int` or `unsigned int`.
If this patch exposes performance issues, those are better addressed by
tuning LSR or other passes.
This addresses an issue where the explicit alignment of 2 (for C++ ABI
reasons) was being propagated to the back end and causing under-aligned
functions (in special sections).
This is an alternate approach suggested by @efriedma-quic in PR #90415.
Fixes#90358
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator== outnumbers StringRef::equals by a factor of 22
under llvm/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
This patch is moving out following intrinsics:
* vector.interleave2/deinterleave2
* vector.reverse
* vector.splice
from the experimental namespace.
All these intrinsics exist in LLVM for more than a year now, and are
widely used, so should not be considered as experimental.
Currently neither the SPIR nor the SPIRV targets specify the AS for
globals in their datalayout strings. This is problematic because
CodeGen/LLVM will default to AS0 in this case, which produces Globals
that end up in the private address space for e.g. OCL, HIPSPV or SYCL.
This patch addresses it by completing the datalayout string.
D33412/D33413 introduced this to support a clang pragma to set section
names for a symbol depending on if it would be placed in
bss/data/rodata/text, which may not be known until the backend. However,
for text we know that only functions will go there, so just directly set
the section in clang instead of going through a completely separate
attribute.
Autoupgrade the "implicit-section-name" attribute to directly setting
the section on a Fuction.
Fixes issue noted at: https://github.com/llvm/llvm-project/pull/86274
When loading bitcode lazily, we may request debug intrinsics be upgraded
to debug records during the module parsing phase; later on we perform
this upgrade when materializing the module functions. If we change the
module's debug info format between parsing and materializing however,
then the requested upgrade is no longer correct and leads to an
assertion. This patch fixes the issue by adding an extra check in the
autoupgrader to see if the upgrade is no longer suitable, and either
exit-out or fall back to the correct intrinsic->intrinsic upgrade if one
is required.
This patch renames DPLabel to DbgLabelRecord, in accordance with the
ongoing DbgRecord rename. This rename was fairly trivial, since DPLabel
isn't as widely used as DPValue and has no real conflicts in either its
full or abbreviated name. As usual, the entire replacement was done
automatically, with `s/DPLabel/DbgLabelRecord/` and `s/DPL/DLR/`.
This is the major rename patch that prior patches have built towards.
The DPValue class is being renamed to DbgVariableRecord, which reflects
the updated terminology for the "final" implementation of the RemoveDI
feature. This is a pure string substitution + clang-format patch. The
only manual component of this patch was determining where to perform
these string substitutions: `DPValue` and `DPV` are almost exclusively
used for DbgRecords, *except* for:
- llvm/lib/target, where 'DP' is used to mean double-precision, and so
appears as part of .td files and in variable names. NB: There is a
single existing use of `DPValue` here that refers to debug info, which
I've manually updated.
- llvm/tools/gold, where 'LDPV' is used as a prefix for symbol
visibility enums.
Outside of these places, I've applied several basic string
substitutions, with the intent that they only affect DbgRecord-related
identifiers; I've checked them as I went through to verify this, with
reasonable confidence that there are no unintended changes that slipped
through the cracks. The substitutions applied are all case-sensitive,
and are applied in the order shown:
```
DPValue -> DbgVariableRecord
DPVal -> DbgVarRec
DPV -> DVR
```
Following the previous rename patches, it should be the case that there
are no instances of any of these strings that are meant to refer to the
general case of DbgRecords, or anything other than the DPValue class.
The idea behind this patch is therefore that pure string substitution is
correct in all cases as long as these assumptions hold.
If --load-bitcode-into-experimental-debuginfo-iterators is true then debug
intrinsics are auto-upgraded to DbgRecords (the new debug info format).
The upgrade is trivial because the two representations are semantically
identical. llvm.dbg.value with 4 operands and llvm.dbg.addr intrinsics are
upgraded in the same way as usual, but converted directly into DbgRecords
instead of debug intrinsics.
The previous name 'amdgpu_code_object_version', was misleading since
this is really a property of the HSA OS. The new spelling also matches
the asm directive I added in bc82cfb.
sign-return-address and similar module attributes should be propagated to
the function level before got merged because module flags may contradict and
this information is not recoverable.
Generated code will match with the normal linking flow.
Refactored version of (#80640).
Run the attribute copy only during IRMove.
`sign-return-address` and similar module attributes should be propagated
to the function level before modules got merged because module flags may
contradict and this information is not recoverable.
Generated code will match with the normal linking flow.
This reverts commit e8512786fedbfa6ddba70ceddc29d7122173ba5e.
This revert is done because llvm::drop_begin over an empty ArrayRef
doesn't return an empty range, and therefore can lead to an invalid
address returned instead.
See discussion in https://github.com/llvm/llvm-project/pull/80737 for
more context.
Ensure intrinsics and auto-upgrades support i16, i32, and i64 for for
`nvvm.{min,max,mulhi,sad}`
- `nvvm.min` and `nvvm.max`: These are auto-upgraded to `select`
instructions but it is still nice to support the 16 bit variants just in
case any generators of IR are still trying to use these intrinsics.
- `nvvm.sad` added both the 16 and 64 bit variants, also marked this
instruction as speculateble. These directly correspond to the PTX
`sad.{u16,s16,u64,s64}` instructions.
- `nvvm.mulhi` added the 16 bit variants. These directly correspond to
the PTX `mul.hi.{s,u}16` instructions.
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
This is an experimental address space for strided buffers. These buffers
can have structs as elements and
a stride > 1.
These pointers allow the indexed access in units of stride, i.e., they
point at `buffer[index * stride]`.
Thus, we can use the `idxen` modifier for buffer loads.
We assign address space 9 to 192-bit buffer pointers which contain a
128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially,
they are fat buffer pointers with an additional 32-bit index.
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
We can use Intrinsic::getDeclaration() here, we just have to pass
the correct arguments. This function accepts only the mangled types,
not all argument types.
With these three intrinsics it's probable faster to check the number of
arguments first and then check the names. We can also handle ctlz and
cttz in the same block.
This is an attempt at rebooting https://reviews.llvm.org/D28990
I've included AutoUpgrade changes to modify the data layout to satisfy the compatible layout check. But this does mean alloca, loads, stores, etc in old IR will automatically get this new alignment.
This should fix PR46320.
Reviewed By: echristo, rnk, tmgross
Differential Revision: https://reviews.llvm.org/D86310
Partial progress towards removing in-tree uses of `getPointerTo()`,
by employing the following options:
* Drop the call entirely if the sole purpose of it is to support a no-op
bitcast (remove the no-op bitcast as well).
* Replace with `PointerType::get()`/`PointerType::getUnqual()`
This is a NFC cleanup effort.
Reviewed By: barannikov88
Differential Revision: https://reviews.llvm.org/D155232
One of the main user of these kind of coroutines is swift. There yield-once (`retcon.once`) coroutines are used to temporary "expose" pointers to internal fields of various objects creating borrow scopes.
However, in some cases it might be useful also to allow these coroutines to produce a normal result, but there is no convenient way to represent this (as compared to switched-resume kind of coroutines where C++ `co_return`
is transformed to a member / callback call on promise object).
The extension is simple: we allow continuation function to have a non-void result and accept optional extra arguments via a special `llvm.coro.end.result` intrinsic that would essentially forward them as normal results.
The NVPTX intrinsics are under 'n'. Use the consume_front API, so fix
that. Refactor the helper function to group matchers on the first
component and check that first. Do similarly with the final set of
intrinsics, which have a lot of commonality in the matching. Finally
reorder the argument/return type checking wrt name checking -- the
former is going to be cheaper, so do that first before checking the
name.#
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D158445