Placing the class id at offset 0 should make `isa` and `dyn_cast` faster
by eliminating the field offset (previously 0x10) from the memory
operand, saving encoding space on x86, and, in theory, an add micro-op.
You can see the load encodes one byte smaller here:
https://godbolt.org/z/Whvz4can9
The compile time tracker shows some modestly positive results in the
on the `cycle` metric and in the final clang binary size metric:
https://llvm-compile-time-tracker.com/compare.php?from=33b54f01fe32030ff60d661a7a951e33360f82ee&to=2530347a57401744293c54f92f9781fbdae3d8c2&stat=cycles
Clicking through to the per-library size breakdown shows that
instcombine size reduces by 0.68%, which is meaningful, and I believe
instcombine is known to be a hotspot.
It is, however, potentially noise. I still think we should do this,
because notionally, the class id really acts as the vptr of the Value,
and conventionally the vptr is always at offset 0.
- [DebugMetadata][DwarfDebug] Support function-local types in lexical
block scopes (4/7)
- [CloneFunction][DebugInfo] Avoid cloning DILocalVariables of inlined
functions
This is a follow-up for https://reviews.llvm.org/D144006, fixing a crash
reported
in Chromium (https://reviews.llvm.org/D144006#4651955).
The first commit is added for convenience, as it has already been
accepted.
If DISubpogram was not cloned (e.g. we are cloning a function that has
other
functions inlined into it, and subprograms of the inlined functions are
not supposed to be cloned), it doesn't make sense to clone its
DILocalVariables as well.
Otherwise get duplicated DILocalVariables not tracked in their
subprogram's retainedNodes, that crash LTO with Chromium.
This is meant to be committed along with
https://reviews.llvm.org/D144006.
This patch adds an intrinsic for setmaxnreg PTX instruction.
* PTX Doc link for this instruction:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#miscellaneous-instructions-setmaxnreg
* The i32 argument, an immediate value, specifies the actual
absolute register count for the instruction.
* The `setmaxnreg` instruction is available in SM90a.
So, this patch adds 'hasSM90a' predicate to use in
the NVPTX backend.
* lit tests are added to verify the lowering of the intrinsic.
* Verifier logic (and tests) are added to test the register
count range and divisibility-by-8 requirements.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
Currently, the AsmWriter can print DPValues, but does not consider them
when creating slots for metadata, which can result in erroneous output
where metadata is numbered incorrectly. This patch modifies the
ModuleSlotTracker to correctly track slots for metadata that appears in
DPValues.
Vectors are always bit-packed and don't respect the elements' alignment
requirements. This is different from arrays. This means offsets of
vector GEPs need to be computed differently than offsets of array GEPs.
This PR fixes many places that rely on an incorrect pattern
that always relies on `DL.getTypeAllocSize(GTI.getIndexedType())`.
We replace these by usages of `GTI.getSequentialElementStride(DL)`,
which is a new helper function added in this PR.
This changes behavior for GEPs into vectors with element types for which
the (bit) size and alloc size is different. This includes two cases:
* Types with a bit size that is not a multiple of a byte, e.g. i1.
GEPs into such vectors are questionable to begin with, as some elements
are not even addressable.
* Overaligned types, e.g. i16 with 32-bit alignment.
Existing tests are unaffected, but a miscompilation of a new test is fixed.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
Temporarily fix for issue #76545
Hwasan does not attach tags to @llvm.dbg.assign. It's not clear if we
can attach tags to @llvm.dbg.assign.
For now we just disable the path replacing llvm.dbg.declare with
llvm.dbg.assign.
It may reduce the quality of interactive debugging with HWASAN, but
usually it's
a smaller priority for sanitizers than the quality if reports.
In LLVMContext::diagnose, set `HasErrors` for `DS_Error` so that all
derived `DiagnosticHandler` have correct `HasErrors` information.
An alternative is to set `HasErrors` in
`DiagnosticHandler::handleDiagnostics`, but all derived
`handleDiagnostics` would have to call the base function.
Simplify the compiler-rt test to make it more general for different
platforms, and use `*DAG` matchers for lines that may be emitted
out-of-order.
- The compiler-rt test passed on a Windows machine. Previously name
matchers don't work for MSVC mangling
(https://lab.llvm.org/buildbot/#/builders/127/builds/59907)
- `*DAG` matchers fixed the error in
https://lab.llvm.org/buildbot/#/builders/94/builds/17924
This is the second reland and fixed errors caught in first reland
(https://github.com/llvm/llvm-project/pull/75860)
**Original commit message**
Commit fe05193 (phab D156569), IRPGO names uses format
`[<filepath>;]<linkage-name>` while prior format is
`[<filepath>:<mangled-name>`. The format change would break the use case
demonstrated in (updated)
`llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
This patch changes `GlobalValues::getGlobalIdentifer` to use the
semicolon.
To elaborate on the scenario how things break without this PR
1. IRPGO raw profiles stores (compressed) IRPGO names of functions in
one section, and per-function profile data in another section. The
[NameRef](fc715e4cd9/compiler-rt/include/profile/InstrProfData.inc (L72))
field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the
profiled address is
[mapped](fc715e4cd9/llvm/lib/ProfileData/InstrProf.cpp (L876-L885))
to the MD5 hash of the callee.
3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names
will be
[annotated](fc715e4cd9/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (L1707))
as value profiles, and used to import indirect-call-prom candidates. If
the annotated MD5 hash is computed from the new format while import uses
the prior format, the callee cannot be imported.
*
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
is added to have an end-to-end test.
* `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll`
is updated to have better test coverage from another aspect (as runtime
tests are more sensitive to the environment and may be skipped by some
contributors)
Fixed build-bot failures caught by post-submit tests
1) Add the list of command line tools needed by new compiler-rt test into dependency.
2) Use `starts_with` to replace deprecated `startswith`.
**Original commit message**
Commit fe05193 (phab D156569), IRPGO names uses format
`[<filepath>;]<linkage-name>` while prior format is
`[<filepath>:<mangled-name>`. The format change would break the use case
demonstrated in (updated)
`llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
This patch changes `GlobalValues::getGlobalIdentifer` to use the
semicolon.
To elaborate on the scenario how things break without this PR
1. IRPGO raw profiles stores (compressed) IRPGO names of functions in
one section, and per-function profile data in another section. The
[NameRef](fc715e4cd9/compiler-rt/include/profile/InstrProfData.inc (L72))
field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the
profiled address is
[mapped](fc715e4cd9/llvm/lib/ProfileData/InstrProf.cpp (L876-L885))
to the MD5 hash of the callee.
3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names
will be
[annotated](fc715e4cd9/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (L1707))
as value profiles, and used to import indirect-call-prom candidates. If
the annotated MD5 hash is computed from the new format while import uses
the prior format, the callee cannot be imported.
*
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
is added to have an end-to-end test.
* `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll`
is updated to have better test coverage from another aspect (as runtime
tests are more sensitive to the environment and may be skipped by some
contributors)
Commit fe05193 (phab D156569), IRPGO names uses format
`[<filepath>;]<linkage-name>` while prior format is
`[<filepath>:<mangled-name>`. The format change would break the use case
demonstrated in (updated)
`llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
This patch changes `GlobalValues::getGlobalIdentifer` to use the
semicolon.
To elaborate on the scenario how things break without this PR
1. IRPGO raw profiles stores (compressed) IRPGO names of functions in
one section, and per-function profile data in another section. The
[NameRef](fc715e4cd9/compiler-rt/include/profile/InstrProfData.inc (L72))
field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the
profiled address is
[mapped](fc715e4cd9/llvm/lib/ProfileData/InstrProf.cpp (L876-L885))
to the MD5 hash of the callee.
3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names
will be
[annotated](fc715e4cd9/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (L1707))
as value profiles, and used to import indirect-call-prom candidates. If
the annotated MD5 hash is computed from the new format while import uses
the prior format, the callee cannot be imported.
*`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
is added to have an end-to-end test.
* `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll`
is updated to have better test coverage from another aspect (as runtime
tests are more sensitive to the environment and may be skipped by some
contributors)
The specialisation will not be valid when ConstantInt gains native
support for vector types.
This is largely a mechanical change but with extra attention paid to constant
folding, InstCombineVectorOps.cpp, LoopFlatten.cpp and Verifier.cpp to
remove the need to call `getIntegerType()`.
Co-authored-by: Nikita Popov <github@npopov.com>
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
This Op<2> usage was missed in 1ee6ec2bf3, which replaced the third
shuffle operand with a vector of integer mask constants.
I noticed this when attempting to make changes to the layout of
llvm::Value.
This is an experimental address space for strided buffers. These buffers
can have structs as elements and
a stride > 1.
These pointers allow the indexed access in units of stride, i.e., they
point at `buffer[index * stride]`.
Thus, we can use the `idxen` modifier for buffer loads.
We assign address space 9 to 192-bit buffer pointers which contain a
128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially,
they are fat buffer pointers with an additional 32-bit index.
... by lowering them as lazy resolve-on-first-use symbol resolvers. Note that this is subtly different timing than on ELF platforms, where ifunc resolution happens at load time.
Since ld64 and ld-prime don't support all the cases we need for these, we lower them manually in the AsmPrinter.
Add the `dead_on_unwind` attribute, which states that the caller will
not read from this argument if the call unwinds. This allows eliding
stores that could otherwise be visible on the unwind path, for example:
```
declare void @may_unwind()
define void @src(ptr noalias dead_on_unwind %out) {
store i32 0, ptr %out
call void @may_unwind()
store i32 1, ptr %out
ret void
}
define void @tgt(ptr noalias dead_on_unwind %out) {
call void @may_unwind()
store i32 1, ptr %out
ret void
}
```
The optimization is not valid without `dead_on_unwind`, because the `i32
0` value might be read if `@may_unwind` unwinds.
This attribute is primarily intended to be used on sret arguments. In
fact, I previously wanted to change the semantics of sret to include
this "no read after unwind" property (see D116998), but based on the
feedback there it is better to keep these attributes orthogonal (sret is
an ABI attribute, dead_on_unwind is an optimization attribute). This is
a reboot of that change with a separate attribute.
As part of the RemoveDIs project, transitioning to non-instruction debug
info, all debug intrinsic handling code needs to be duplicated to handle
DPValues.
--try-experimental-debuginfo-iterators enables the new debug mode in
tests if the CMake option has been enabled.
`getInsertPtAfterFramePtr` now returns an iterator so we don't lose
debug-info-communicating bits.
---
Depends on #73500, #74090, #74091.
Note that all the patches that implement support for declare-style
DPValues have tests that are "rotten green" test without this
patch (i.e., they pass at the moment without testing what we
want them to test). See the Pull Request for more detail on this.
These flags are usable on floating point arithmetic, as well as call,
select, and phi instructions whose resulting type is floating point, or
a vector of, or an array of, a valid type. Whether or not the flags are
valid for a given instruction can be checked with the new
LLVMCanValueUseFastMathFlags function.
These are exposed using a new LLVMFastMathFlags type, which is an alias
for unsigned. An anonymous enum defines the bit values for it.
Tests are added in echo.ll for select/phil/call, and the floating point
types in the new float_ops.ll bindings test.
Select and the floating point arithmetic instructions were not
implemented in llvm-c-test/echo.cpp, so they were added as well.
This simplifies an upcoming patch to support the RemoveDIs project (tracking
variable locations without using intrinsics).
Next in this series is #73500.
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
Added the following functions for manipulating operand bundles, as well as
building ``call`` and ``invoke`` instructions that use operand bundles:
* LLVMBuildCallWithOperandBundles
* LLVMBuildInvokeWithOperandBundles
* LLVMCreateOperandBundle
* LLVMDisposeOperandBundle
* LLVMGetNumOperandBundles
* LLVMGetOperandBundleAtIndex
* LLVMGetNumOperandBundleArgs
* LLVMGetOperandBundleArgAtIndex
* LLVMGetOperandBundleTag
Fixes#71873.
With opaque pointers, the address spaces will only be the same if
the types are the same, in which case this would have been handled
at the start of the method already.
On platforms where char is signed, the ">> 4" shift will produce
incorrect results. We were already working on unsigned char for
most characters, but not for the first one.
Fixes https://github.com/llvm/llvm-project/issues/74732.
This adds support for a HasTailCall flag on function call edges in the
ThinLTO summary. It is intended for use in aiding discovery of missing
frames from tail calls in profiled call stacks for MemProf of profiled
binaries that did not disable tail call elimination. A follow on change
will add the use of this new flag during MemProf context disambiguation.
The new flag is encoded in the bitcode along with either the hotness
flag from the profile, or the relative block frequency under the
-write-relbf-to-summary flag when there is no profile data.
Because we now will always have some additional call edge information, I
have removed the non-profile function summary record format, and we
simply encode the tail call flag along with a hotness type of none when
there is no profile information or relative block frequency. The change
of record format and name caused most of the test case changes.
I have added explicit testing of generation of the new tail call flag
into the bitcode and IR assembly format as part of the changes to
llvm/test/Bitcode/thinlto-function-summary-refgraph.ll. I have also
added round trip testing through assembly and bitcode to
llvm/test/Assembler/thinlto-summary.ll.
Some final errors have turned up when doing stage2clang builds:
* We can insert before end(), which won't have an attached DPMarker,
thus we can have a nullptr DPMarker in Instruction::insertBefore. Add a
null check there.
* We need to use the iterator-returning form of getFirstNonPHI to ensure
we don't insert any PHIs behind debug-info at the start of a block.
In the debug-info-splice implementation, we need to be careful to delete
trailing DPMarkers from blocks when we splice their contents out. This
is equivalent to removing the terminator from a block, then splicing the
rest of it's contents to another block: any DPValues trailing at the end
of the block get moved and we need to clean up afterwards.
We occasionally move instructions to their own positions, which is not
an error, and has no effect on the program. However, if dbg.value
intrinsics are present then they can effectively be moved without any
other effect on the program.
This is a problem if we're using non-instruction debug-info: a moveAfter
that appears to be a no-op in RemoveDIs mode can jump in front of
intrinsics in dbg.value mode. Thus: if an instruction is moved to itself
and the head bit is set, force attached debug-info to shift down one
instruction, which replicates the dbg.value behaviour.
The order of dbg.value intrinsics appearing in the output can affect the
order of tables in DWARF sections. This means that DPValues, our
dbg.value replacement, needs to obey the same ordering rules. For
dbg.values returned by findDbgUsers it's reverse order of creation (due
to how they're put on use-lists). Produce that order from findDbgUsers
for DPValues.
I've got a few IR files where the order of dbg.values flips, but it's a
fragile test -- ultimately it needs the number of times a DPValue is
handled by findDbgValues to be odd.
There is support for intrinsics in Instruction::isCommunative, but there
is no equivalent implementation for isAssociative. This patch builds
support for associative intrinsics with TRE as an application. TRE can
now have associative intrinsics as an accumulator. For example:
```
struct Node {
Node *next;
unsigned val;
}
unsigned maxval(struct Node *n) {
if (!n) return 0;
return std::max(n->val, maxval(n->next));
}
```
Can be transformed into:
```
unsigned maxval(struct Node *n) {
struct Node *head = n;
unsigned max = 0; // Identity of unsigned std::max
while (true) {
if (!head) return max;
max = std::max(max, head->val);
head = head->next;
}
return max;
}
```
This example results in about 5x speedup in local runs.
We conservatively only consider min/max and as associative for this
patch to limit testing scope. There are probably other intrinsics that
could be considered associative. There are a few consumers of
isAssociative() that could be impacted. Testing has only required to
Reassociate pass be updated.
We can use Intrinsic::getDeclaration() here, we just have to pass
the correct arguments. This function accepts only the mangled types,
not all argument types.