This work feeds part of PR
https://github.com/llvm/llvm-project/pull/88385, and adds support for
vectorising
loops with uncountable early exits and outside users of loop-defined
variables. When calculating the final value from an uncountable early
exit we need to calculate the vector lane that triggered the exit,
and hence determine the value at the point we exited.
All code for calculating the last value when exiting the loop early
now lives in a new vector.early.exit block, which sits between the
middle.split block and the original exit block. Doing this required
two fixes:
1. The vplan verifier incorrectly assumed that the block containing
a definition always dominates the block of the user. That's not true
if you can arrive at the use block from multiple incoming blocks.
This is possible for early exit loops where both the early exit and
the latch jump to the same block.
2. We were adding the new vector.early.exit to the wrong parent loop.
It needs to have the same parent as the actual early exit block from
the original loop.
I've added a new ExtractFirstActive VPInstruction that extracts the
first active lane of a vector, i.e. the lane of the vector predicate
that triggered the exit.
NOTE: The IR generated for dealing with live-outs from early exit
loops is unoptimised, as opposed to normal loops. This inevitably
leads to poor quality code, but this can be fixed up later.
This patch implements an LLVM IR pass, named kernel-info, that reports
various statistics for codes compiled for GPUs. The ultimate goal of
these statistics to help identify bad code patterns and ways to mitigate
them. The pass operates at the LLVM IR level so that it can, in theory,
support any LLVM-based compiler for programming languages supporting
GPUs. It has been tested so far with LLVM IR generated by Clang for
OpenMP offload codes targeting NVIDIA GPUs and AMD GPUs.
By default, the pass runs at the end of LTO, and options like
``-Rpass=kernel-info`` enable its remarks. Example `opt` and `clang`
command lines appear in `llvm/docs/KernelInfo.rst`. Remarks include
summary statistics (e.g., total size of static allocas) and individual
occurrences (e.g., source location of each alloca). Examples of its
output appear in tests in `llvm/test/Analysis/KernelInfo`.
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.
Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
make it easier to use old IR files and somewhat reduce the test churn in
this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
attribute. The representation in the LLVM IR dialect should be updated
separately.
[PassBuilder] Add RelLookupTableConverterPass to LTO
This patch adds RelLookupTableConverterPass into the LTO
post-link optimization pass pipeline. This optimization
converts lookup tables to relative lookup tables to make
them PIC-friendly, which is already included in the non-LTO
pass pipeline. This patch adds this optimization to the
post-link optimization pipeline to discover more
opportunities in the LTO context.
Replace the "what I need to do" section of the RemoveDIs docs with a
paragraph about preserving start-of-block iterators. Hopefully this is
concise enough to remain in peoples heads going forwards!
This makes it more clear what you the author must do, and what reviewers
can expect you to do, before an approved PR can be merged. Spliting out
the email bit into a section also means we can link directly to it in
discussions.
This relies on one of those parties actually reading this, but I plan to
tackle the case where they don't with some new automation.
As someone on Discord was understandably confused because the build
directory does contain folder structures that look remarkably like the
source directory.
I used this page to explain it but realised that this must be from when
llvm was a separate repository. So `<user home>/llvm` probably was a
common path.
Now it's in llvm-project. So make that obvious in the instructions.
This fixes "unused-local-typedef" warnings in 9324e6a7a5.
This adds an option `--remove-note=[name/]type` to selectively delete
notes in ELF files, where `type` is the numeric value of the note type
and `name` is the name of the originator. The name can be omitted, in
which case all notes of the specified type will be removed. For now,
only `SHT_NOTE` sections that are not associated with segments are
handled. The implementation can be extended later as needed.
RFC: https://discourse.llvm.org/t/rfc-llvm-objcopy-feature-for-editing-notes/83491
This adds an option `--remove-note=[name/]type` to selectively delete
notes in ELF files, where `type` is the numeric value of the note type
and `name` is the name of the originator. The name can be omitted, in
which case all notes of the specified type will be removed. For now,
only `SHT_NOTE` sections that are not associated with segments are
handled. The implementation can be extended later as needed.
RFC: https://discourse.llvm.org/t/rfc-llvm-objcopy-feature-for-editing-notes/83491
By noting where the files are to be found, and adding some
whitespace to break up large blocks.
(the merge on behalf bit needs a refresh but this will go
into review later after this)
.. by changing the signal stop reason format 🤦
The reason this did not work is because the code in
`StopInfo::GetCrashingDereference` was looking for the string "address="
to extract the address of the crash. Macos stop reason strings have the
form
```
EXC_BAD_ACCESS (code=1, address=0xdead)
```
while on linux they look like:
```
signal SIGSEGV: address not mapped to object (fault address: 0xdead)
```
Extracting the address from a string sounds like a bad idea, but I
suppose there's some value in using a consistent format across
platforms, so this patch changes the signal format to use the equals
sign as well. All of the diagnose tests pass except one, which appears
to fail due to something similar #115453 (disassembler reports
unrelocated call targets).
I've left the tests disabled on windows, as the stop reason reporting
code works very differently there, and I suspect it won't work out of
the box. If I'm wrong -- the XFAIL will let us know.
This patch adds NVVM intrinsics and NVPTX codegen for:
- cp.async.bulk.prefetch.L2.* variants
- These intrinsics optionally support cache_hints as indicated by the
boolean flag argument.
- Lit tests are added for all combinations of these intrinsics in
cp-async-bulk.ll.
- The generated PTX is verified with a 12.3 ptxas executable.
- Added docs for these intrinsics in NVPTXUsage.rst file.
PTX Spec reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async-bulk-prefetch
Co-authored-by: abmajumder <abmajumder@nvidia.com>
This extension adds eight 48 bit load store instructions.
The current spec can be found at:
https://github.com/quic/riscv-unified-db/releases/latest
This patch adds assembler only support.
---------
Co-authored-by: Harsh Chandel <hchandel@qti.qualcomm.com>
This statement is somewhat confusing when paired with the later
statement that says "Branching on an undefined value is undefined
behavior". Furthermore, this example does not show any conditional
branches, so this comment seems to be outdated.
See issue #122532 for more details.
This updates the developer policy to align with established community
norms for copyright notices in source code contributed to LLVM.
The updates clearly state that we do not accept code contianing explicit
copyright notices in source except where such code is a pre-existing
part of an external dependency that is being vendored into LLVM.
Explicit copyright notices in source add no value to the project since
copyright ownership is well tracked through git. Our policy already
requires that contributions made not by the original author have
appropriate attribution in git commit messsages or metadata.
Further, explicit copyright notices in code can easily get out of date
as the code is refactored, updated by additional authors or otherwise
changed over time. This leads to misleading out-of-date copyright
notices which do more harm than good.
This change should be viewed as a clarification and statement of
existing established policy, not a change in policy since it represents
the way the project has been operating.
This commit promotes the SPIR-V backend from experimental to official
status. As a result, SPIR-V will be built by default, simplifying
integration and increasing accessibility for downstream projects.
Discussion and RFC on Discourse:
https://discourse.llvm.org/t/rfc-promoting-spir-v-to-an-official-target/83614
This relands f8f8598fd886cddfd374fa43eb6d7d37d301b576
Follow up on #122371:
The problem here is a little subtle: when we dry-run the measurement
phase, we create a LLJIT instance without actually executing the
snippets. The key is, LLJIT has its own TargetMachine which uses triple
designated by LLVM_TARGET_ARCH (which is default to host). On a machine
that does not support Exegesis, the LLJIT would fail to create its
TargetMachine because llvm-exegesis don't even register the host's
target!
Putting this test into any of the target-specific folder won't help,
because it's about the host. And personally I don't really want to use
`exegesis-can-execute-<arch>` for generic tests like this -- it's too
strict as we don't actually need to execute the snippet.
My solution here is creating another test feature which is added only
when LLVM_TARGET_ARCH is supported by llvm-exegesis. This feature is
something in between `<arch>-registered-target` and
`exegesis-can-execute-<arch>`.