This relands f8f8598fd886cddfd374fa43eb6d7d37d301b576
Follow up on #122371:
The problem here is a little subtle: when we dry-run the measurement
phase, we create a LLJIT instance without actually executing the
snippets. The key is, LLJIT has its own TargetMachine which uses triple
designated by LLVM_TARGET_ARCH (which is default to host). On a machine
that does not support Exegesis, the LLJIT would fail to create its
TargetMachine because llvm-exegesis don't even register the host's
target!
Putting this test into any of the target-specific folder won't help,
because it's about the host. And personally I don't really want to use
`exegesis-can-execute-<arch>` for generic tests like this -- it's too
strict as we don't actually need to execute the snippet.
My solution here is creating another test feature which is added only
when LLVM_TARGET_ARCH is supported by llvm-exegesis. This feature is
something in between `<arch>-registered-target` and
`exegesis-can-execute-<arch>`.
This introduces the `captures` attribute as described in:
https://discourse.llvm.org/t/rfc-improvements-to-capture-tracking/81420
This initial patch only introduces the IR/bitcode support for the
attribute and its in-memory representation as `CaptureInfo`. This will
be followed by a patch to upgrade and remove the `nocapture` attribute,
and then by actual inference/analysis support.
Based on the RFC feedback, I've used a syntax similar to the `memory`
attribute, though the only "location" that can be specified is `ret`.
I've added some pretty extensive documentation to LangRef on the
semantics. One non-obvious bit here is that using ptrtoint will not
result in a "return-only" capture, even if the ptrtoint result is only
used in the return value. Without this requirement we wouldn't be able
to continue ordinary capture analysis on the return value.
This extension adds eleven instructions to accelerate interrupt
servicing.
The current spec can be found at:
https://github.com/quic/riscv-unified-db/releases/latest
This patch adds assembler only support.
---------
Co-authored-by: Harsh Chandel <hchandel@qti.qualcomm.com>
Add a prologue to the kernel entry to handle cases where code designed
for kernarg preloading is executed on hardware equipped with
incompatible firmware. If hardware has compatible firmware the 256 bytes
at the start of the kernel entry will be skipped. This skipping is done
automatically by hardware that supports the feature.
A pass is added which is intended to be run at the very end of the
pipeline to avoid any optimizations that would assume the prologue is a
real predecessor block to the actual code start. In reality we have two
possible entry points for the function. 1. The optimized path that
supports kernarg preloading which begins at an offset of 256 bytes. 2.
The backwards compatible entry point which starts at offset 0.
PR #96083 added intrinsics for async copy of 'tensor' data
using TMA. Following a similar design, this PR adds intrinsics
for async copy of bulk data (non-tensor variants) through TMA.
* These intrinsics optionally support multicast and cache_hints,
as indicated by the boolean arguments at the end of the intrinsics.
* The backend looks through these flag arguments and lowers to the
appropriate PTX instructions.
* Lit tests are added for all combinations of these intrinsics in
cp-async-bulk.ll.
* The generated PTX is verified with a 12.3 ptxas executable.
* Added docs for these intrinsics in NVPTXUsage.rst file.
PTX Spec reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async-bulk
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
Fixes issue #122084.
Under "Arguments" in the 'call' instruction section, there was some text
included in the code segment so I edited it out. Also fixed the
numbering issue in that section.
This replaces the existing `-wasm-enable-exnref` with
`-wasm-use-legacy-eh` option, in an effort to make the new standardized
exnref proposal the 'default' state and the legacy proposal needs to be
separately enabled an option. But given that most users haven't switched
to the new proposal and major web browsers haven't turned it on by
default, this `-wasm-use-legacy-eh` is turned on by default, so nothing
will change for now for the functionality perspective.
This also removes the restriction that `-wasm-enable-exnref` be only
used with `-wasm-enable-eh` because this option is enabled by default.
This option does not have any effect when `-wasm-enable-eh` is not used.
…#121991)"
This reverts commit f8f8598fd886cddfd374fa43eb6d7d37d301b576.
This breaks ARMv7 and s390x buildbot with the following message:
```
llvm-exegesis error: No available targets are compatible with triple "armv8l-unknown-linux-gnueabihf"
FileCheck error: '<stdin>' is empty.
FileCheck command line: /home/tcwg-buildbot/worker/clang-armv7-2stage/stage2/bin/FileCheck /home/tcwg-buildbot/worker/clang-armv7-2stage/llvm/llvm/test/tools/llvm-exegesis/dry-run-measurement.test
```
With the new benchmark phase, `dry-run-measurement`, llvm-exegesis can
run everything except the actual snippet execution. It is useful when we
want to test some parts of the code between the `assemble-measured-code`
and `measure` phase without actually running on native platforms.
The LLVM Discord bot now has the ability to scrape the LLVM calendar &
send reminders about upcoming office hours events and sync-ups. Document
that here.
While I'm in the area, add a note about the bot's ability to @mention
people when they're on buildbot blamelists.
Related to llvm/Community.o#19
This introduces `@llvm.dx.resource.load.rawbuffer` and generalizes the
buffer load docs under DirectX/DXILResources.
This resolves the "load" parts of #106188
We'd introduced separate versions of `llvm.dx.resource.load` with a
struct return to handle the CheckAccessFullyMapped case without making
the IR for the common case unnecessarily complicated. However, at this
point the common case is really `resource.getpointer`, so the ergonomics
of a simplified version of `load` don't actually gain us as much as the
cost of having multiple opcodes.
Drop the `dx.resource.loadchecked` functions and have `dx.resource.load`
consistently return `{element_type, i1}`.
Since someone on Discord asked why macOS arm64 was not listed.
https://llvm.org/docs/GettingStarted.html#hardware
Add a few known platforms:
* Linux AArch64
* FreeBSD AArch64
* macOS arm64 (Clang build only, there might be a GCC port but I've not
used it myself)
* Windows on Arm (ARM64 as Microsoft refers to it)
The Qualcomm uC Xqcicm extension adds 13 conditional move instructions.
The current spec can be found at:
https://github.com/quic/riscv-unified-db/releases/latest
This patch adds assembler only support.
---------
Co-authored-by: Harsh Chandel <hchandel@qti.qualcomm.com>
This extension adds 3 instructions that perform load-store address
calculation.
The current spec can be found at:
https://github.com/quic/riscv-unified-db/releases/latest
This patch adds assembler only support.
---------
Co-authored-by: Harsh Chandel <hchandel@qti.qualcomm.com>
Co-authored-by: Sudharsan Veeravalli <quic_svs@quicinc.com>
This patch adds documentation on the benchmark-process-cpu option. I
apparently did not add any documentation when originally implementing
the feature.
This patch adds support of the following llvm-objcopy flags for MachO:
- `--globalize-symbol`, `--globalize-symbols`,
- `--keep-global-symbol`, `-G`, `--keep-global-symbols`,
- `--localize-symbol`, `-L`, `--localize-symbols`,
- `--skip-symbol`, `--skip-symbols`.
Code in `updateAndRemoveSymbols` for MachO
is kept similar to its version for ELF.
Fixes#120894
This reverts commit 2ec6174bef4bc9ef3d5cedbffd7169017c9669c3.
New changes:
- Use explicit overloads of write(<int types>)
- Fix link error due to missing dependency (lib/Support)
- Updated tests and docs
Objective:
- Provide a common framework in LLVM for collecting various usage
metrics
- Characteristics:
- Extensible and configurable by:
- tools in LLVM that want to use it
- vendors in their downstream codebase
- tools users (as allowed by vendor)
Background:
The framework was originally proposed only for LLDB, but there were
quite a few requests to move it to llvm/lib given telemetry
is a common use case in a lot of tools, not just LLDB.
See more details on the design and discussions here on the RFC:
https://discourse.llvm.org/t/rfc-lldb-telemetry-metrics/64588/20?u=oontvoo
---------
Co-authored-by: Alina Sbirlea <alina.g.simion@gmail.com>
Co-authored-by: James Henderson <James.Henderson@sony.com>
Co-authored-by: Pavel Labath <pavel@labath.sk>
```
$ ./bin/clang /tmp/test.c -o /tmp/test.o -target aarch64-linux -march=armv8+memtag -fsanitize=memtag-stack
clang: error: unsupported option '-fsanitize=memtag*' for target 'aarch64-unknown-linux'
```
But this works:
```
$ ./bin/clang /tmp/test.c -o /tmp/test.o --target=aarch64-linux-android -march=armv8+memtag -fsanitize=memtag-stack
```
Due to this check in Clang:
2210da3b82/clang/lib/Driver/ToolChains/CommonArgs.cpp (L1651)
Likely because the required notes and dynamic loader support only exist
for Android.
You can get around this, sort of, by not linking the file. However this
means you have to provide your own way of loading it, so it doesn't
change the statement that this feature is Android only.
https://github.com/llvm/llvm-project/issues/64692 also confirms that the
intent is to only support Android at this time.
And while I'm here, suggest an additive set of flags that can also be
used.
ByVal arguments and Swifterror require special handling in the coroutine
passes. The goal of this section is to provide a description of how
these parameter attributes are handled.
Remove the 6th bullet point "Strive to improve security over time, for
example by adding additional testing, fuzzing and hardening after fixing
issues."
At the security group meeting on 2024-11-19 we discussed the role the
security group was performing in practice. We are in effect acting as a
security response group, dealing with issues raised via the process
given in the LLVM Security group page. We are not proactively adding
additional testing fuzzing and hardening. While this could be considered
an aspirational goal, it may give the implication that the LLVM Security
Group is handling or at worst guaranteeing security for the LLVM project
when in practice it is not.
Meeting notes:
https://discourse.llvm.org/t/llvm-security-group-public-sync-ups/62735/32
Rename LLVM Security Group to LLVM Security Response Group. Take the
opportunity to canonicalise security group and Security Group to LLVM
Security Response Group.
At the 2024-11-19 LLVM Security Group meeting [1] we discussed that in
practice the LLVM Security Group was performing an incident response
role, but it was not proactively adding additional testing, fuzzing and
hardening. We do not want projects that use LLVM to see the LLVM
Security Group as guaranteeing security for LLVM.
We decided that it would be useful to rename the group to LLVM Security
Response Group as that reflects the work that it is doing.
There may be a case for a proactive security group with a different
remit, but this is out of scope of this commit.
[1]
https://discourse.llvm.org/t/llvm-security-group-public-sync-ups/62735/32
There are cases (like in an upcoming patch to MLIR's `Property` class)
where the ? value is a useful null value. However, existing predicates
make ti difficult to test if the value in a record one is operating is ?
or not.
This commit adds the !initialized predicate, which is 1 on concrete,
non-? values and 0 on ?.
---------
Co-authored-by: Akshat Oke <Akshat.Oke@amd.com>
commit a9aff440d9dd ("[libc][docs] reorganize documentation (#118836)")
moved https://libc.llvm.org/math/index.html to
https://libc.llvm.org/headers/math/index.html which makes links from
various slide decks stale.
There's an extension for sphinx that can generate redirects. Add a dependency
on that, then use it to create a redirect so that those older links still work.
I was able to install this sphinx extension via:
$ sudo apt install python3-sphinx-reredirects
We may need to install this on whatever server generates the llvm
documentation.
`--disassemble`/`--cdis` parses input bytes as decimal, 0bbin, 0ooct, or
0xhex. While the hexadecimal digit form is most commonly used, requiring
a 0x prefix for each byte (`0x48 0x29 0xc3`) is cumbersome.
Tools like xxd -p and rz-asm use a plain hex dump form without the 0x
prefix or space separator. This patch adds --hex to disassemble such hex
bytes with optional whitespace.
```
% rz-asm -a x86 -b 64 -d 4829c34829c4
sub rbx, rax
sub rsp, rax
% llvm-mc -triple=x86_64 --cdis --hex --output-asm-variant=1 <<< 4829c34829c4
.text
sub rbx, rax
sub rsp, rax
```
Pull Request: https://github.com/llvm/llvm-project/pull/119992
This PR adds the following features:
* saturation and float rounding mode decorations,
* arithmetic constrained floating-point intrinsics (strict_fadd,
strict_fsub, strict_fmul, strict_fdiv, strict_frem, strict_fma and
strict_fldexp),
* and SPV_INTEL_float_controls2 extension,
* using recent improvements of emit-intrinsics step, this PR also
simplifies pre- and post-legalizer steps and improves instruction
selection.
The P8700 is a high-performance processor from MIPS designed to meet the
demands of modern workloads, offering exceptional scalability and
efficiency. It builds on MIPS's established architectural strengths
while introducing enhancements that set it apart. For more details, you
can check out the official product page here:
https://mips.com/products/hardware/p8700/.
Scheduling model will be added in a separate commit/PR.