With -fpatchable-function-entry (or the patchable_function_entry
function attribute), we emit records of patchable entry locations to the
__patchable_function_entries section. Add an additional parameter to the
command line option that allows one to specify a different default
section name for the records, and an identical parameter to the function
attribute that allows one to override the section used.
The main use case for this change is the Linux kernel using prefix NOPs
for ftrace, and thus depending on__patchable_function_entries to locate
traceable functions. Functions that are not traceable currently disable
entry NOPs using the function attribute, but this creates a
compatibility issue with -fsanitize=kcfi, which expects all indirectly
callable functions to have a type hash prefix at the same offset from
the function entry.
Adding a section parameter would allow the kernel to distinguish between
traceable and non-traceable functions by adding entry records to
separate sections while maintaining a stable function prefix layout for
all functions. LKML discussion:
https://lore.kernel.org/lkml/Y1QEzk%2FA41PKLEPe@hirez.programming.kicks-ass.net/
The current API doesn't have a way to unset it. The query returns
an optional, but the set doesn't. Alternatively I could switch the
set to also use optional.
During the transition from debug intrinsics to debug records, we used
several different command line options to customise handling: the
printing of debug records to bitcode and textual could be independent of
how the debug-info was represented inside a module, whether the
autoupgrader ran could be customised. This was all valuable during
development, but now that totally removing debug intrinsics is coming
up, this patch removes those options in favour of a single flag
(experimental-debuginfo-iterators), which enables autoupgrade, in-memory
debug records, and debug record printing to bitcode and textual IR.
We need to do this ahead of removing the
experimental-debuginfo-iterators flag, to reduce the amount of
test-juggling that happens at that time.
There are quite a number of weird test behaviours related to this --
some of which I simply delete in this commit. Things like
print-non-instruction-debug-info.ll , the test suite now checks for
debug records in all tests, and we don't want to check we can print as
intrinsics. Or the update_test_checks tests -- these are duplicated with
write-experimental-debuginfo=false to ensure file writing for intrinsics
is correct, but that's something we're imminently going to delete.
A short survey of curious test changes:
* free-intrinsics.ll: we don't need to test that debug-info is a zero
cost intrinsic, because we won't be using intrinsics in the future.
* undef-dbg-val.ll: apparently we pinned this to non-RemoveDIs in-memory
mode while we sorted something out; it works now either way.
* salvage-cast-debug-info.ll: was testing intrinsics-in-memory get
salvaged, isn't necessary now
* localize-constexpr-debuginfo.ll: was producing "dead metadata"
intrinsics for optimised-out variable values, dbg-records takes the
(correct) representation of poison/undef as an operand. Looks like we
didn't update this in the past to avoid spurious test differences.
* Transforms/Scalarizer/dbginfo.ll: this test was explicitly testing
that debug-info affected codegen, and we deferred updating the tests
until now. This is just one of those silent gnochange issues that get
fixed by RemoveDIs.
Finally: I've added a bitcode test, dbg-intrinsics-autoupgrade.ll.bc,
that checks we can autoupgrade debug intrinsics that are in bitcode into
the new debug records.
This adds DWARF generation for fixed-point types. This feature is needed
by Ada.
Note that a pre-existing GNU extension is used in one case. This has
been emitted by GCC for years, and is needed because standard DWARF is
otherwise incapable of representing these types.
We can use *Set::insert_range to collapse:
for (auto Elem : Range)
Set.insert(E.first);
down to:
Set.insert_range(llvm::make_first_range(Range));
In some cases, we can further fold that into the set declaration.
We can use *Set::insert_range to collapse:
for (auto Elem : Range)
Set.insert(E);
down to:
Set.insert_range(Range);
In some cases, we can further fold that into the set declaration.
In Ada, an array can be packed and the elements can take less space than
their natural object size. For example, for this type:
type Packed_Array is array (4 .. 8) of Boolean;
pragma pack (Packed_Array);
... each element of the array occupies a single bit, even though the
"natural" size for a Boolean in memory is a byte.
In DWARF, this is represented by putting a DW_AT_bit_stride onto the
array type itself.
This patch adds a bit stride to DICompositeType so that gnat-llvm can
emit DWARF for these sorts of arrays.
This commit follows up 0191307b by auto-upgrading !"align" metadata on
return values to stackalign. This allows us to remove all logic to check
the metadata from NVPTXUtilities.
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch uses insert_range with
iterator ranges. For each case, I've verified that foos is defined as
make_range(foo_begin(), foo_end()) or in a similar manner.
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch uses insert_range in
conjunction with llvm::{predecessors,successors} and
MachineBasicBlock::{predecessors,successors}.
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch replaces:
Dest.insert(Src.begin(), Src.end());
with:
Dest.insert_range(Src);
This patch does not touch custom begin like succ_begin for now.
After 3c8c2914e067e132af951f70d2b3577fe049e19a the lowering of 64-bit
funnel shifts has been improved to the point where this intrinsic is no
longer needed.
In [1], Nikita Popov suggested that during lowering 'unreachable' insn
should not generate extra code for naked functions, and this applies to
all architectures. Note that for naked functions, 'unreachable' insn is
necessary in IR since the basic block needs a terminator to end.
This patch checked whether a function is naked function or not. If it is
a naked function, 'unreachable' insn will not generate ISD::TRAP.
[1] https://github.com/llvm/llvm-project/pull/131731
Co-authored-by: Yonghong Song <yonghong.song@linux.dev>
Most places that call Intrinsic::getAttributes() are only interested in
the function attributes, so add a separate function for that.
The motivation for this is that I'd like to add the ability to specify
range attributes on intrinsics, which requires knowing the function
type. This avoids needing to know the type for most attribute queries.
I initially thought starting with a more narrow definition and later
expanding would make more sense. But as pointed out in review for PR
#129220, this restriction is generating additional unnecessary work.
This patch alters the intrinsic to accept patterns of any type. Future
patches will update LoopIdiomRecognize and PreISelIntrinsicLowering to
take advantage of this. The verifier will complain if an unsized type is
used. I've additionally taken the opportunity to remove a comment from
the LangRef about some bit widths potentially not being supported by the
target. I don't think this is any more true than it is for arbitrary
width loads/stores which don't carry a similar warning that I can see.
A verifier check ensures that only sized types are used for the pattern.
This commit adds a function for creating fma intrinsic calls to the IRBuilder. If the "IsFPConstrained" flag of the builder is set,
the function creates a call to "experimental.constrained.fma" instead of "llvm.fma" .
To support the creation of the constrained intrinsic, a function "CreateConstrainedFPIntrinsic" is introduced.
This reverts commit 31ebe6647b7f1fc7f6778a5438175b12f82357ae.
The reason for the test failure is likely due to
`Name2PairMap::getTimerGroup(...)` not holding a lock.
Additionally, remove the behavior for both pass manager's timer manager
classes (`PassTimingInfo` for the old pass manager and
`TimePassesHandler` for the new pass manager) where these classes would
print the values of their timers upon destruction.
Currently, each pass manager manages their own `TimerGroup`s. This is
problematic because of duplicate `TimerGroup`s (both pass managers have
a `TimerGroup` for pass times with identical names and descriptions).
The result is that in Clang, `-ftime-report` has two "Pass execution
timing report" sections (one for the new pass manager which manages
optimization passes, and one for the old pass manager which manages the
backend). The result of this change is that Clang's `-ftime-report` now
prints both optimization and backend pass timing info in a unified "Pass
execution timing report" section.
Moving the ownership of the `TimerGroups` to globals also makes it
easier to implement JSON-formatted `-ftime-report`. This was not
possible with the old structure because the two pass managers were
created and destroyed in far parts of the codebase and outputting JSON
requires the printing logic to be at the same place because of
formatting.
Previous discourse discussion:
https://discourse.llvm.org/t/difficulties-with-implementing-json-formatted-ftime-report/84353
llvm.is.constant (and therefore Clang's __builtin_constant_p()) need to
report undefined values as non-constant or future DCE choices end up
making no sense. This was encountered while building the Linux kernel
which uses __builtin_constant_p() while trying to evaluate if it is safe
to use a compile-time constant resolution for string lengths or if it
must kick over to a full runtime call to strlen(). Obviously an
undefined variable cannot be known at compile-time, so
__builtin_constant_p() needs to return false. This change will also mean
that Clang will match GCC's behavior under the same conditions.
Fixes#130649
Fixes#129900
If `operator delete` was called after an unsuccessful constructor call
after `operator new`, we ran into undefined behaviour.
This was discovered by our malfunction tests while preparing an upgrade
to LLVM 20, that explicitly check for such kind of bugs.
Inspired by PR #127944, this patch adds an option to print profile metadata inline with respect to the instruction (or function) it annotates - this saves one time from having to search up and down large textual modules to find this info.
…argetMachine
RISC-V's data layout is determined by the ABI, not just the target
triple. However, the TargetMachine is created using the data layout from
the target triple, which is not always correct. This patch uses the
target ABI from the module and passes it to the TargetMachine, ensuring
that the data layout is set correctly according to the ABI.
The same problem will happen with other targets like MIPS, but
unfortunately, MIPS didn't emit the target-abi into the module flags, so
this patch only fixes the issue for RISC-V.
NOTE: MIPS with -mabi=n32 can trigger the same issue.
Another possible solution is add new parameter to the TargetMachine
constructor, but that would require changes in all the targets.
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.
For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibilty purses, and instead
write out needed Triple()s or str()s explicitly. This is because I think
a decent number of them should be changed to work on Triple as well, to
avoid unnecessary conversions back and forth.
The only interesting part in this patch is that the default triple is
Triple("") instead of Triple() to preserve existing behavior. The former
defaults to using the ELF object format instead of unknown object
format. We should fix that as well.
There was an assert in GetConstant checked if Bound is constant.
However, GetConstant was only called when IsConstant==true.
This refactor attempts to get rid of the assert by combining GetConstant
and IsContstant.
This patch adds a function attribute `riscv_vls_cc` for RISCV VLS
calling
convention which takes 0 or 1 argument, the argument is the `ABI_VLEN`
which is the `VLEN` for passing the fixed-vector arguments, it wraps the
argument as a scalable vector(VLA) using the `ABI_VLEN` and uses the
corresponding mechanism to handle it. The range of `ABI_VLEN` is [32,
65536],
if not specified, the default value is 128.
Here is an example of VLS argument passing:
Non-VLS call:
```
void original_call(__attribute__((vector_size(16))) int arg) {}
=>
define void @original_call(i128 noundef %arg) {
entry:
...
ret void
}
```
VLS call:
```
void __attribute__((riscv_vls_cc(256))) vls_call(__attribute__((vector_size(16))) int arg) {}
=>
define riscv_vls_cc void @vls_call(<vscale x 1 x i32> %arg) {
entry:
...
ret void
}
}
```
The first Non-VLS call passes generic vector argument of 16 bytes by
flattened integer.
On the contrary, the VLS call uses `ABI_VLEN=256` which wraps the
vector to <vscale x 1 x i32> where the number of scalable vector
elements
is calaulated by: `ORIG_ELTS * RVV_BITS_PER_BLOCK / ABI_VLEN`.
Note: ORIG_ELTS = Vector Size / Type Size = 128 / 32 = 4.
PsABI PR: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/418
C-API PR: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/68
Relative to the previous attempt this includes two fixes:
* Adjust callCapturesBefore() to not skip captures(ret: address,
provenance) arguments, as these will not count as a capture
at the call-site.
* When visiting uses during stack slot optimization, don't skip
the ModRef check for passthru captures. Calls can both modref
and be passthru for captures.
------
This extends CaptureTracking to support inferring non-trivial
CaptureInfos. The focus of this patch is to only support FunctionAttrs,
other users of CaptureTracking will be updated in followups.
The key API changes here are:
* DetermineUseCaptureKind() now returns a UseCaptureInfo where the UseCC
component specifies what is captured at that Use and the ResultCC
component specifies what may be captured via the return value of the
User. Usually only one or the other will be used (corresponding to
previous MAY_CAPTURE or PASSTHROUGH results), but both may be set for
call captures.
* The CaptureTracking::captures() extension point is passed this
UseCaptureInfo as well and then can decide what to do with it by
returning an Action, which is one of: Stop: stop traversal.
ContinueIgnoringReturn: continue traversal but don't follow the
instruction return value. Continue: continue traversal and follow the
instruction return value if it has additional CaptureComponents.
For now, this patch retains the (unsound) special logic for comparison
of null with a dereferenceable pointer. I'd like to switch key code to
take advantage of address/address_is_null before dropping it.
This PR mainly intends to introduce necessary API changes and basic
inference support, there are various possible improvements marked with
TODOs.
Replace some more nvvm.annotations with function attributes,
auto-upgrading the annotations as needed. These new attributes will be
more idiomatic and compile-time efficient than the annotations.
- !"maxntid[xyz]" -> "nvvm.maxntid"
- !"reqntid[xyz]" -> "nvvm.reqntid"
- !"cluster_dim_[xyz]" -> "nvvm.cluster_dim"
`llvm.wasm.throw` intrinsic can throw but it was not invokable. Not sure
what the rationale was when it was first written that way, but I think
at least in Emscripten's C++ exception support with the Wasm port of
libunwind, `__builtin_wasm_throw`, which is lowered down to
`llvm.wasm.rethrow`, is used only within `_Unwind_RaiseException`, which
is an one-liner and thus does not need an `invoke`:
720e97f76d/system/lib/libunwind/src/Unwind-wasm.c (L69)
(`_Unwind_RaiseException` is called by `__cxa_throw`, which is generated
by the `throw` C++ keyword)
But this does not address other direct uses of the builtin in C++, whose
use I'm not sure about but is not prohibited. Also other language
frontends may need to use the builtin in different functions, which has
`try`-`catch`es or destructors.
This makes `llvm.wasm.throw` invokable in the backend. To do that, this
adds a custom lowering routine to `SelectionDAGBuilder::visitInvoke`,
like we did for `llvm.wasm.rethrow`.
This does not generate `invoke`s for `__builtin_wasm_throw` yet, which
will be done by a follow-up PR.
Addresses #124710.
The llvm.fake.use intrinsic is used to prevent certain values from being
optimized out for the benefit of debug info; it is not, however, a debug
or pseudo instruction itself and necessarily must not be treated as one,
since its purpose is to act like a normal instruction. In the original
commit that added them, the IR intrinsic however was treated as one in
`getPrevNonDebugInstruction` (but _not_ in `getNextNonDebugInstruction`,
or in the MIR equivalents). This patch correctly treats it as a
non-debug instruction.
An Ada program can have types that are subranges of other types. This
patch adds a new DIType node, DISubrangeType, to represent this concept.
I considered extending the existing DISubrange to do this, but as
DISubrange does not derive from DIType, that approach seemed more
disruptive.
A DISubrangeType can be used both as an ordinary type, but also as the
type of an array index. This is also important for Ada.
Ada subrange types can also be stored using a bias. Representing this in
the DWARF required the use of an extension. GCC has been emitting this
extension for years, so I've reused it here.
Check that the input register is an inreg argument to the parent
function. (See the comment in `IntrinsicsAMDGPU.td`.)
This LLVM defect was identified via the AMD Fuzzing project.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>