This commit moves the sign builtin's implementation to the CLC library.
It simultaneously optimizes it (for vector types) by removing
control-flow from the implementation.
The __CLC_INTERNAL preprocessor definition has been repurposed (without
the leading underscores) to be passed when building the internal CLC
library. It was only used in one other place to guard an extra maths
preprocessor definition, which we can do unconditionally.
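As a rough illustration of what removing control flow can look like, the sketch below computes OpenCL-style `sign` semantics with select-style expressions only. It is plain scalar C++, not the CLC source; `sign_sketch` is a hypothetical name, and the actual implementation may be structured differently.

```cpp
#include <cmath>

// Hypothetical scalar sketch of OpenCL `sign` semantics:
//   x > 0 -> 1.0, x < 0 -> -1.0, +/-0.0 -> +/-0.0, NaN -> 0.0.
// Each step is a select rather than an if/else chain, which is the form
// that maps well onto vector types.
inline float sign_sketch(float x) {
  // Select the magnitude: 1.0 for any non-zero input, 0.0 for zero.
  const float magnitude = (x != 0.0f) ? 1.0f : 0.0f;
  // Transfer the sign bit of x, so -0.0 stays -0.0.
  const float signed_value = std::copysign(magnitude, x);
  // NaN compares unequal to zero above, so override it here.
  return std::isnan(x) ? 0.0f : signed_value;
}
```

Whether a C++ compiler emits branches for the ternaries is up to the optimizer; the point is only that no result depends on an earlier early-return.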
Replace UniformQuantizedType by the more generic QuantizedType in Conv verifiers.
Change-Id: Ie1961af931864f801914a62976bc988881ee075e
Signed-off-by: Tai Ly <tai.ly@arm.com>
Co-authored-by: Thibaut Goetghebuer-Planchon <thibaut.goetghebuer-planchon@arm.com>
This patch makes these functions' tests work in big endian mode:
- `__aeabi_idivmod`.
- `__aeabi_uidivmod`.
- `__aeabi_uldivmod`.
The three functions return a struct containing two fields, quotient and
remainder, via *value in regs* calling convention. They differ in the
integer type of each field.
In the tests of the first two, a 64-bit integer is used as the return
type of the call. As a consequence of the ABI rules for structs
(Composite Types), the quotient resides in `r0` and the remainder in
`r1` regardless of endianness. So, in order to access each component
from the 64-bit integer in the caller code, care must be taken to access
the correct bits as they do depend on endianness in this case.
In the test of the third one, the caller code has inline assembly to
access the components. This assembly code assumed little endian, so it
had to be made flexible for big endian as well.
`_YUGA_BIG_ENDIAN` is defined in `int_endianness.h`. It's a macro
internal to compiler-rt that's in theory compatible with more toolchains
than gcc and clang.
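The endianness dependence described above for the first two tests can be sketched as follows. This is a minimal illustration, assuming that a 64-bit integer returned in `r0`/`r1` keeps its low word in `r0` on little endian and its high word in `r0` on big endian; `DivModResult` and `split_divmod` are hypothetical names, not code from the tests.

```cpp
#include <cstdint>

struct DivModResult {
  std::int32_t quotient;
  std::int32_t remainder;
};

// Hypothetical helper: `packed` is the 64-bit value the caller observes when
// the callee left the quotient in r0 and the remainder in r1.
inline DivModResult split_divmod(std::uint64_t packed, bool big_endian) {
  const std::uint32_t low = static_cast<std::uint32_t>(packed);
  const std::uint32_t high = static_cast<std::uint32_t>(packed >> 32);
  // Assumption: r0 is the low word on little endian, the high word on big
  // endian; r1 is the other half.
  const std::uint32_t r0 = big_endian ? high : low;
  const std::uint32_t r1 = big_endian ? low : high;
  return {static_cast<std::int32_t>(r0), static_cast<std::int32_t>(r1)};
}
```

In a real test the `big_endian` flag would come from a compile-time macro such as `_YUGA_BIG_ENDIAN` rather than a runtime argument.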
Refactors the XeGPU scatter attribute, introducing the following:
- improved docs formatting
- default initialized parameters
- invariant checks in attribute verifier
- removal of additional parsing error
The attribute's getters now provide default values, simplifying their
usage and scattered tensor descriptor handling.
The related descriptor verifier is updated to avoid check duplication.
This adds the `llvm.sincospi` intrinsic, legalization, and lowering
(mostly reusing the lowering for sincos and frexp).
The `llvm.sincospi` intrinsic takes a floating-point value and returns
both the sine and cosine of the value multiplied by pi. It computes the
result more accurately than the naive approach of doing the
multiplication ahead of time, especially for large input values.
```
declare { float, float } @llvm.sincospi.f32(float %Val)
declare { double, double } @llvm.sincospi.f64(double %Val)
declare { x86_fp80, x86_fp80 } @llvm.sincospi.f80(x86_fp80 %Val)
declare { fp128, fp128 } @llvm.sincospi.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 } @llvm.sincospi.ppcf128(ppc_fp128 %Val)
declare { <4 x float>, <4 x float> } @llvm.sincospi.v4f32(<4 x float> %Val)
```
Currently, the default lowering of this intrinsic relies on the
`sincospi[f|l]` functions being available in the target's runtime (e.g.
libc).
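To see why multiplying by pi ahead of time loses accuracy for large inputs, consider the sketch below. It is not the LLVM lowering or a libc implementation; `sincospi_sketch` is a hypothetical helper that only shows the idea of reducing the argument before scaling by pi.

```cpp
#include <cmath>
#include <utility>

// Hypothetical reference helper: returns {sin(pi*x), cos(pi*x)}.
inline std::pair<double, double> sincospi_sketch(double x) {
  constexpr double kPi = 3.14159265358979323846;
  // sin(pi*x) and cos(pi*x) have period 2 in x, and std::fmod is exact, so
  // the reduction introduces no rounding error even for very large x.
  const double reduced = std::fmod(x, 2.0);
  // Only now multiply by pi; the product stays small, so its rounding
  // error stays small too.
  return {std::sin(kPi * reduced), std::cos(kPi * reduced)};
}
```

The naive `std::sin(kPi * x)` rounds the product `kPi * x` first; for large `x` that rounding error can be a sizeable fraction of a period. A production implementation would also take care to return exact results at multiples of 0.5, which this sketch does not attempt.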
[NVPTX] Add Prefetch intrinsics
This PR adds prefetch intrinsics with the relevant eviction priorities.
* Lit tests are added as part of prefetch.ll
* The generated PTX is verified with a 12.3 ptxas executable.
* Added docs for these intrinsics in NVPTXUsage.rst.
For more information, refer to the PTX ISA
`<https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-prefetch-prefetchu>`_.
---------
Co-authored-by: abmajumder <abmajumder@nvidia.com>
Update llvm.call/llvm.invoke pretty printer/parser and the LLVM IR import/export
to deal with the argument and result attributes.
This patch is made on top of PR 123176 that modified the
CallOpInterface and added the argument and result attributes to
llvm.call and llvm.invoke without doing anything with them.
RFC: https://discourse.llvm.org/t/mlir-rfc-adding-argument-and-result-attributes-to-llvm-call/84107
…p file
The `SanitizerCommon.ReportFile` test leaves a temp file behind on every
run. While this is not a problem for manual builds, on buildbots those
files accumulate over time, interfering with other bots on the same
system.
The files in question are named like
`sanitizer_common.reportfile.tmp.XXXXXX.<pid>`. The issue can be seen in
Solaris `truss` output:
```
22633: fstatat64(AT_FDCWD, "/tmp/sanitizer_common.reportfile.tmp.rzVEja", 0xFEFFBAD0, AT_SYMLINK_NOFOLLOW) Err#2 ENOENT
22633: openat64(AT_FDCWD, "/tmp/sanitizer_common.reportfile.tmp.rzVEja", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
22633: openat64(AT_FDCWD, "/tmp/sanitizer_common.reportfile.tmp.rzVEja.22633", O_WRONLY|O_CREAT|O_TRUNC, 0660) = 4
22633: unlinkat(AT_FDCWD, "/tmp/sanitizer_common.reportfile.tmp.rzVEja", 0) = 0
```
The first temp file, created by `temp_file_name`, is removed at the end
of the test; the second one, created in `ReportFile::GetReportPath`
using `OpenFile`, is not.
This patch fixes this by simply removing the file.
Tested on `amd64-pc-solaris2.11` and `x86_64-pc-linux-gnu`.
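The shape of the fix can be sketched with plain C++ standard library calls. This is not the sanitizer test itself: `file_exists` and `write_and_remove_report` are hypothetical helpers, and the file name is made up to mimic the `<base>.<pid>` pattern above.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: true if `path` can be opened for reading.
inline bool file_exists(const std::string &path) {
  if (std::FILE *f = std::fopen(path.c_str(), "r")) {
    std::fclose(f);
    return true;
  }
  return false;
}

// Hypothetical stand-in for the test body: writes a report file named
// "<base>.<pid>" and removes it again before returning.
inline bool write_and_remove_report(const std::string &base, int pid) {
  const std::string report = base + "." + std::to_string(pid);
  std::FILE *out = std::fopen(report.c_str(), "w");
  if (!out)
    return false;
  std::fputs("report\n", out);
  std::fclose(out);
  const bool existed = file_exists(report);
  // The cleanup step: without it, the second file is left behind.
  std::remove(report.c_str());
  return existed && !file_exists(report);
}
```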
This was originally done for testing purposes, but after #126315 we now
do testing through GitHub Actions and should instead be using the
optimization setting chosen by the user.
Close https://github.com/llvm/llvm-project/issues/126373
Although the root problem is that we shouldn't place the friend
declaration in the incorrect module, let's stop the bleeding by no
longer diagnosing declarations that are not in file scope.
* When building on Windows hosts, PDBs aren't installed to the
`CMAKE_INSTALL_PREFIX`.
* This PR addresses it solely for `llvm-*` executables.
* Similar changes are required in `AddClang.cmake`, `AddLLD.cmake`, etc.
I'd be happy to queue those PRs after this one.
Updates the layout struct to be a `struct` with public fields instead of
a `class` with private fields and changes the name prefix to `__cblayout_`.
Also adds Packed attribute to prevent struct padding and filters
additional types from the layout that were missed previously (arrays of
resources and groupshared declarations).
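For readers unfamiliar with the effect of a packed attribute, the sketch below shows it on an ordinary C++ struct using the GCC/Clang `__attribute__((packed))` spelling. It is an analogy only: `PaddedLayout` and `PackedLayout` are hypothetical types, and the real layout struct is generated by the compiler for HLSL constant buffers rather than written by hand.

```cpp
#include <cstddef>

// Without packing, the compiler typically inserts padding after `flag` so
// that `value` is naturally aligned.
struct PaddedLayout {
  char flag;
  int value;
};

// With the packed attribute (GCC/Clang spelling), fields are laid out back
// to back with no padding in between.
struct __attribute__((packed)) PackedLayout {
  char flag;
  int value;
};
```

On common targets `sizeof(PaddedLayout)` is larger than `sizeof(PackedLayout)`, and `offsetof(PackedLayout, value)` is 1.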
We've been improving the tests for vector quite a bit and we are
probably not done improving our container tests. Formatting everything
at once will make subsequent reviews easier.
This PR adds diagnostics for the 3-parameter `std::basic_string(const char*
t, size_type pos, size_type count)` constructor in the
bugprone-string-constructor check:
```cpp
std::string r1("test", 1, 0); // constructor creating an empty string
std::string r2("test", 0, -4); // negative value used as length parameter
// more examples in test file
```
Fixes false-positives reported in https://github.com/llvm/llvm-project/issues/123198.
The llvm.readcyclecounter intrinsic can be implemented via the `rdhwr
$3, $hwr_cc` instruction.
$hwr_cc: High-resolution cycle counter. This register provides read
access to the coprocessor 0 Count Register.
Fix #106318.
Currently `std::hash<Emplaceable>::operator()` relies on an implicit
conversion from `int` to `size_t`, which makes MSVC complain. This PR
switches to using `static_cast`.
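A minimal sketch of the pattern, assuming a simplified stand-in type: `EmplaceableSketch` is hypothetical and much smaller than the real test helper, but the hash specialization shows the explicit conversion.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical, simplified stand-in for the test type.
class EmplaceableSketch {
  int value_;

public:
  explicit EmplaceableSketch(int v) : value_(v) {}
  int get() const { return value_; }
};

namespace std {
template <>
struct hash<EmplaceableSketch> {
  std::size_t operator()(const EmplaceableSketch &e) const {
    // Explicit cast: an implicit int -> size_t conversion here is what
    // triggers sign-conversion warnings on some compilers.
    return static_cast<std::size_t>(e.get());
  }
};
} // namespace std
```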
In `flat.map/flat.map.access/at_transparent.pass.cpp`, there's one
value-discarding use of `at` that wasn't marked `TEST_IGNORE_NODISCARD`.
This PR adds the missing `TEST_IGNORE_NODISCARD`.
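The idea behind marking a value-discarding call can be sketched as below. This assumes the test macro behaves like a cast to `void`; `SKETCH_IGNORE_NODISCARD`, `at_sketch`, and `discard_on_purpose` are hypothetical names used only for illustration.

```cpp
// Hypothetical stand-in for the test-suite macro; assumed to be a void cast.
#define SKETCH_IGNORE_NODISCARD (void)

// Hypothetical nodiscard function standing in for `at`.
[[nodiscard]] inline int at_sketch(int key) { return key * 2; }

inline void discard_on_purpose() {
  // Without the marker, compilers may warn that the result is unused.
  SKETCH_IGNORE_NODISCARD at_sketch(1);
}
```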
One constructor failed to propagate fast-math flags
from an operation to the builder. It is fixed now.
And the builder creation in one opt-bufferization case should take
the rewriter, I think.
A section of ObjectFileMachO is ifdef compiled only when
building to run on iOS etc natively, so this old method
call rename wasn't detected by normal on-mac building.
A DriverKit process is a kernel extension that runs in userland instead
of in the kernel address space/privilege levels. They've been around for
a couple of years. From lldb's perspective a DriverKit process is no
different from any other userland process, but it has a different
Triple, so we need to handle those cases in the lldb codebase. Some of
the DriverKit triple handling had been upstreamed to llvm-project, but I
noticed a few cases that had not been yet. Cleaning that up.
Add a test for the `term-width` and `term-height` settings. I thought I
was hitting a bug because in my statusline test I was getting the default
values when running under PExpect. It turned out that the issue is that
we clear the settings at the start of the test. The Editline tests
aren't affected by this because Editline provides its own functions to
get the terminal dimensions and explicitly does not rely on LLDB's
settings (presumably exactly because of this behavior).
A new callback was added with the type
CommandReturnObjectCallbackResult; this commit namespaces that type to
match the format of other callback functions that have a non-primitive
return type in the lldb namespace.
rdar://144553496