When determining the install prefix in LLVMConfig.cmake etc., resolve
symlinks in CMAKE_CURRENT_LIST_FILE first. The motivation for this is to
support symlinks like `/usr/lib64/cmake/llvm` pointing to
`/usr/lib64/llvm19/lib/cmake/llvm`. This only works correctly if the
paths are computed relative to the resolved symlink.
It's worth noting that this *mostly* already works out of the box,
because CMake automatically resolves the symlink when the library
is found via CMAKE_PREFIX_PATH. It just doesn't happen when the library
is found via the default prefix path.
This commit adds the following Dense Math Facility integer calculation
instructions: dmxvi8gerx4, dmxvi8gerx4pp, dmxvi8gerx4spp, pmdmxvi8gerx4,
pmdmxvi8gerx4pp, and pmdmxvi8gerx4spp, along with their corresponding
intrinsics and tests.
ELFExtendedAttrParser lacked a destructor that properly handled errors,
causing `llvm-readobj --arch-specific` to crash when the AArch64 Build
Attributes section was empty.
This commit adds error handling in the destructor and introduces test
files for `--arch-specific` to cover both an empty AArch64 Build
Attributes section and a populated one.
Fixes: b1ebfac185
A few test files appear to have been edited after running the
update_test_checks.py script, which can make life hard for
developers when trying to update these tests in future
patches. Also, the tests still had this comment at the top:
; NOTE: Assertions have been autogenerated by ...
which could be confusing, since they have not strictly been
auto-generated.
I've attempted to keep the spirit of the original tests by
excluding all CHECK lines after the scalar.ph IR block;
however, I've done this by passing a new option,
--filter-out-after, to the update_test_checks.py script.
We already handle the X86ISD::VPERMV3 node type, but if we can handle equivalent cases before intrinsic lowering we can simplify the code further - e.g. #109272, before constant BUILD_VECTOR nodes get lowered to constant pool loads.
The language reference says about inbounds geps that "if the
getelementptr has any non-zero indices[...] [t]he base pointer has an in
bounds address of the allocated object that it is based on [and]
[d]uring the successive addition of offsets to the address, the
resulting pointer must remain in bounds of the allocated object at each
step."
If (gep inbounds p, (a + 5)) is translated to (gep [inbounds] (gep p,
a), 5) with p pointing to the beginning of an object and a=-4, as the
example in the comments suggests, then neither of the resulting geps
satisfies these requirements. Therefore, we need to clear the inbounds
flag for both geps.
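A small LLVM IR sketch of the problematic case (hypothetical, assuming an
i8 element type and %p pointing to the start of the allocated object):
```llvm
; Original form: with %a = -4, the overall offset is +1, so the inbounds
; requirements hold.
%off = add i64 %a, 5
%gep = getelementptr inbounds i8, ptr %p, i64 %off

; Reassociated form: the intermediate GEP steps to %p - 4, which is outside
; the object, so neither resulting GEP may keep the inbounds flag.
%tmp = getelementptr i8, ptr %p, i64 %a
%res = getelementptr i8, ptr %tmp, i64 5
```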
We might want to use ValueTracking to check if a is known to be
non-negative to preserve the inbounds flags.
For the AMDGPU tests with scratch instructions, removing the unsound
inbounds flag means that AMDGPUDAGToDAGISel::isFlatScratchBaseLegal sees
no NUW flag at the pointer add, which prevents generation of scratch
instructions with immediate offsets.
For SWDEV-516125.
WG14 N3353 added support for 0o and 0O as octal literal prefixes. It
also deprecates use of octal literals without a prefix, except for the
literal 0.
This feature is being exposed as an extension in older C language modes
as well as in all C++ language modes.
This represents a hardware mode supported only for wave32 compute
shaders. When enabled, we set the `.dynamic_vgpr_en` field of
`.compute_registers` to true in the PAL metadata.
This will be changed to use an attribute after downstream consumers
have been migrated.
Reapply "[analyzer] Delay the checker constructions after parsing"
(#128350)
This reverts commit db836edf47f36ed04cab919a7a2c4414f4d0d7e6, as-is.
Depends on #128368
Previously, the value created to represent the uninitialized memory
of the alloca was undef. Use freeze poison instead. This enables some
optimization improvements (which need defeating in the limit tests),
but also introduces a few regressions. It also seems to leave behind
dead code in some cases.
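A minimal sketch of the placeholder change (hypothetical function and
names, not the pass's actual output):
```llvm
define i32 @read_uninit() {
  %p = alloca i32
  ; The value standing in for the alloca's uninitialized contents was
  ; previously plain undef; now a frozen poison value is used instead.
  %init = freeze i32 poison
  ret i32 %init
}
```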
These tests demonstrate the issue in SSAUpdaterBulk when it calculates
incoming values from loop back edges.
The failures are marked with `EXPECT_NONFATAL_FAILURE`, which is the way
to designate an "expected fail" in the Google Test suite.
Adds new MLIR ops to model `do concurrent`. In order to make the `do
concurrent` representation self-contained, a loop is modeled using two
ops, one wrapper and one that contains the actual body of the loop. For
example, a 2D `do concurrent` loop is modeled as follows:
```mlir
fir.do_concurrent {
  %i = fir.alloca i32
  %j = fir.alloca i32
  fir.do_concurrent.loop
      (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st) {
    %0 = fir.convert %i_iv : (index) -> i32
    fir.store %0 to %i : !fir.ref<i32>
    %1 = fir.convert %j_iv : (index) -> i32
    fir.store %1 to %j : !fir.ref<i32>
  }
}
```
The `fir.do_concurrent` wrapper op encapsulates both the actual loop and
the allocations required for the iteration variables. The
`fir.do_concurrent.loop` op is a multi-dimensional op that contains the
loop control and body. See the ops' docs for more info.
This extension adds twelve conditional branch instructions that use an
immediate operand for the source.
The current spec can be found at:
https://github.com/quic/riscv-unified-db/releases/tag/Xqci-0.7.0
This patch adds assembler-only support.
Co-authored-by: Sudharsan Veeravalli <quic_svs@quicinc.com>
Whilst trying to clean up some loop vectoriser IR tests (see
test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll
for example), a reviewer on PR #129047 suggested it would be
nice to have an option to stop generating CHECK lines after a
certain point. Typically, when performing a transformation with
the loop vectoriser we don't care about any CHECK lines
generated for the scalar tail of the loop, since the scalar
loop is kept intact. Previously, if you wanted to eliminate such
unwanted CHECK lines you had to run the update script and then
manually delete all the lines corresponding to the scalar loop.
This can be very time consuming if the tests ever need changing.
What I've tried to do here is add a new --filter-out-after
option alongside the existing --filter* options that stops the
generation of any CHECK lines beyond the line that matches the
filter. With the existing filter options we never generate
CHECK-NEXT lines, but with --filter-out-after we still care
about ordering, so I've amended the code to ensure we treat
this filter differently.
The tests updated by this commit were designed to check features in
clang's driver and index that require clang to be targeting a darwin
platform while running on a darwin host. For that, their execution is
currently gated by the `REQUIRES: system-darwin` annotation.
This approach becomes a problem when trying to run such tests on a
cross-compiling build of clang on a darwin platform. When the default
target is not darwin (e.g. via `LLVM_DEFAULT_TARGET_TRIPLE`), the
tests will still run on a darwin host and fail spuriously because of the
mismatch with the target detection.
To fix this issue, this patch introduces an extra condition to the
tests' REQUIRES annotation, `target={{.*}}-{{darwin|macos}}{{.*}}`,
ensuring they only run when the relevant target is present.
From #106446, this adds a variant of getVectorIdxTy that returns an LLT.
Many uses only look at the width, so a getVectorIdxWidth was added as
the common base.
Add runtime verification for `memref.dim`: check that the index is in
bounds.
Also simplify the pass pipeline for all memref runtime verification
checks.
Why? This option can lead to incorrect IR if used in isolation. For
example, consider the IR below:
```mlir
func.func @loop_with_aliasing(%arg0: tensor<5xf32>, %arg1: index, %arg2: index) -> tensor<5xf32> {
  %c1 = arith.constant 1 : index
  %cst = arith.constant 1.000000e+00 : f32
  %0 = tensor.empty() : tensor<5xf32>
  %1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<5xf32>) -> tensor<5xf32>
  // The BufferizableOpInterface says that %2 may alias with %arg0 or be a
  // newly allocated buffer
  %2 = scf.for %arg3 = %arg1 to %arg2 step %c1 iter_args(%arg4 = %arg0) -> (tensor<5xf32>) {
    scf.yield %1 : tensor<5xf32>
  }
  %cst_0 = arith.constant 1.000000e+00 : f32
  %inserted = tensor.insert %cst_0 into %1[%c1] : tensor<5xf32>
  return %2 : tensor<5xf32>
}
```
If we bufferize with enforce-aliasing-invariants=false, we get:
```mlir
func.func @loop_with_aliasing(%arg0: memref<5xf32, strided<[?], offset: ?>>, %arg1: index, %arg2: index) -> memref<5xf32, strided<[?], offset: ?>> {
  %c1 = arith.constant 1 : index
  %cst = arith.constant 1.000000e+00 : f32
  %alloc = memref.alloc() {alignment = 64 : i64} : memref<5xf32>
  linalg.fill ins(%cst : f32) outs(%alloc : memref<5xf32>)
  %0 = scf.for %arg3 = %arg1 to %arg2 step %c1 iter_args(%arg4 = %arg0) -> (memref<5xf32, strided<[?], offset: ?>>) {
    %cast = memref.cast %alloc : memref<5xf32> to memref<5xf32, strided<[?], offset: ?>>
    scf.yield %cast : memref<5xf32, strided<[?], offset: ?>>
  }
  %cst_0 = arith.constant 1.000000e+00 : f32
  memref.store %cst_0, %alloc[%c1] : memref<5xf32>
  return %0 : memref<5xf32, strided<[?], offset: ?>>
}
```
This is not correct IR, since the loop yields the allocation.
I am using this option. What do I need to do now?
If you are using this option in isolation, you are possibly generating
incorrect IR, so you need to revisit your bufferization strategy. If you
are using it together with `copyBeforeWrite`, you simply need to retire
the `enforceAliasingInvariants` option.
Co-authored-by: Matthias Springer <mspringer@nvidia.com>
This allows specializing the implementation for different targets
without including unnecessary logic, and is similar to #111559, which
did the same for the printf Writer interface.
…os_log functions should be treated as safe in call argument checkers.
Also treat __builtin_* functions and __libcpp_verbose_abort functions as
"trivial" for the purposes of the call argument checkers.
This patch replaces the use of MachineRegisterInfo's liveIn check with
the machine basic block's liveIns, as MRI's liveIns are inconsistent
with the entry MBB liveIns when it comes to the machine verifier checks.
PS: It's an alternative solution with respect to #126926.