Use `mlir_target_link_libraries()` to link dependencies of libraries
that are not included in libMLIR, to ensure that they link to the dylib
when they are used in Flang. Otherwise, they implicitly pull in all
their static dependencies, effectively causing Flang binaries to
simultaneously link to the dylib and to static libraries, which is never
a good idea.
I have only covered the libraries that are used by Flang. If you wish, I
can extend this approach to all non-libMLIR libraries in MLIR, making
MLIR itself also link to the dylib consistently.
Before this change, InstrSet in SPIRVEmitIntrinsics was uninitialized when
runOnFunction started running. This change adds a new function,
getPreferredInstructionSet, to SPIRVSubtarget.
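A hedged sketch of the shape of the fix; apart from
getPreferredInstructionSet itself, the member and helper names here are
illustrative:
```cpp
bool SPIRVEmitIntrinsics::runOnFunction(Function &F) {
  const SPIRVSubtarget &ST = TM->getSubtarget<SPIRVSubtarget>(F);
  // InstrSet is now initialized before any of the pass logic reads it.
  InstrSet = ST.getPreferredInstructionSet();
  // ... the rest of the pass runs as before ...
  return processFunction(F); // hypothetical helper for the remaining work
}
```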
Currently we're using quite different internal names for the
`std::invoke` family of type traits. This adds a layer around the
current implementation that makes it easier to see where it is used
and easier to define multiple implementations of it.
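A minimal sketch of the layering idea, using illustrative names rather than
libc++'s actual internal identifiers: a thin, uniformly named layer over the
current implementation, so uses are easy to spot and the backing
implementation can be swapped without touching call sites.
```cpp
#include <type_traits>

// Uniform facade over whatever implementation currently backs these traits.
template <class _Fn, class... _Args>
using __invoke_result_t = std::invoke_result_t<_Fn, _Args...>;

template <class _Fn, class... _Args>
inline constexpr bool __is_invocable_v = std::is_invocable_v<_Fn, _Args...>;
```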
In gcc-15, explicit includes of `<cstdint>` are required when fixed-size
integers are used. In this file, the include only happened as a side
effect of including SmallVector.h.
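For illustration, the fix amounts to spelling the dependency out instead of
relying on it arriving transitively:
```cpp
#include <cstdint> // for uint64_t, int32_t, and other fixed-size integers
```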
Although LLVM compiles fine, the root-project would benefit from the
explicit include here, so we can backport the patch.
Maybe interesting for @hahnjo and @vgvassilev
These instructions return a 64-bit result and a 1-bit carry, unlike
smul_lohi and umul_lohi which return a pair of 32-bit results.
This does not appear to make any difference in practice because the DAG
types are not used for anything before these nodes are converted to
MachineInstrs.
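For illustration, here is how the two result shapes differ when written as
SelectionDAG value-type lists (a hedged sketch; `DAG` stands in for the
surrounding SelectionDAG):
```cpp
// These instructions: a 64-bit product plus a 1-bit carry.
SDVTList MulCarryVTs = DAG.getVTList(MVT::i64, MVT::i1);
// smul_lohi / umul_lohi: a pair of 32-bit halves.
SDVTList LoHiVTs = DAG.getVTList(MVT::i32, MVT::i32);
```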
All targets build `__clc_mad` -- even SPIR-V targets -- since it
compiles to the optimal `llvm.fmuladd` intrinsic. There is no change to
the bytecode generated for non-SPIR-V targets.
The `mix` builtin, which is implemented as a wrapper around `mad`, is
left as an OpenCL-layer wrapper of `__clc_mad`. I don't know if it's
worth having a specific CLC version of `mix`.
The changes to the other CLC files/functions are moving uses of `mad` to
`__clc_mad`, and reformatting. There is an additional instance of
`trunc` becoming `__clc_trunc`, which was missed before.
The test was failing when a `multilib.yaml` file was present in the
installation, because the presence of a multilib YAML file triggers
diagnostics about the validity of the multilib custom flags.
This patch fixes the test by creating a new YAML file with multilib
custom flags for the test to use.
This PR introduces the following transformations:
- If C0 is not 0:
  `umax(nuw_shl(x, C0), x + 1) -> x == 0 ? 1 : nuw_shl(x, C0)`
- If C0 is not 0 or 1:
  `umax(nuw_mul(x, C0), x + 1) -> x == 0 ? 1 : nuw_mul(x, C0)`
Fixes #122388.
Alive2 proof: https://alive2.llvm.org/ce/z/rkp_8U
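As a quick sanity check of the first transform (not a substitute for the
proof above), here is the identity in plain C++, assuming the `nuw`
precondition that `x << C0` does not wrap:
```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

void check(uint32_t x, unsigned C0) {
  assert(C0 != 0 && x <= (UINT32_MAX >> C0)); // nuw: x << C0 does not wrap
  uint32_t lhs = std::max(x << C0, x + 1);    // umax(nuw_shl(x, C0), x + 1)
  uint32_t rhs = (x == 0) ? 1u : (x << C0);   // x == 0 ? 1 : nuw_shl(x, C0)
  assert(lhs == rhs); // both forms agree for every x meeting the precondition
}
```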
For the same reasons that fixed-width is preferred to scalable for
Neoverse V2, this patch enables the UseFixedOverScalableIfEqualCost
feature when using -mcpu=cortex-x2, x3, x4 and x925, which are similar
to Neoverse V2.
This is a follow-up for https://github.com/llvm/llvm-project/pull/119110
and a fix for https://github.com/llvm/llvm-project/issues/118450
RemoveDeadValues used to delete values and analyze the IR at the same
time; because of that, `isMemoryEffectFree` was given invalid IR
containing a half-deleted linalg.generic operation. This PR separates
the analysis from the cleanup to prevent such situations.
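A minimal sketch of the separation, with hypothetical names (`isDead` stands
for whatever the analysis decides, e.g. via `isMemoryEffectFree`):
```cpp
// Phase 1: analyze while the IR is still fully intact.
llvm::SmallVector<mlir::Operation *> deadOps;
module.walk([&](mlir::Operation *op) {
  if (isDead(op))
    deadOps.push_back(op);
});
// Phase 2: mutate only after all analysis is done.
for (mlir::Operation *op : deadOps)
  op->erase();
```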
Thank you!
---------
Co-authored-by: Renat Idrisov <parsifal-47@users.noreply.github.com>
Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>
The indices of SGPR register pairs need to be 2-aligned and SGPR
quadruplets need to be 4-aligned. With this patch, we report an error
when inline asm register constraints specify a misaligned register
index, instead of silently dropping the specified index.
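For illustration, a hedged sketch of the constraint syntax involved (the
assembly string and function are made up for the example):
```cpp
void example() {
  long ok, bad;
  // SGPR pairs must start at a 2-aligned register index.
  __asm__("s_mov_b64 %0, 0" : "={s[2:3]}"(ok));  // aligned index: accepted
  __asm__("s_mov_b64 %0, 0" : "={s[1:2]}"(bad)); // misaligned: now an error
}
```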
Fixes #123208.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
For the following variable
DenseMap<const Instruction *, std::pair<PartialReductionChain, unsigned>>
    ScaledReductionExitInstrs;
we never actually need the PartialReductionChain when using the map.
I've cleaned this up so that it now becomes
DenseMap<const Instruction *, unsigned> ScaledReductionMap;
Summary:
This is spelled `ompx_aligned_barrier` when used directly, but wasn't
included in the list of known assumptions. Fix that so the test now
works.
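For reference, the direct spelling in question looks like this (the
declaration itself is hypothetical):
```cpp
// "ompx_aligned_barrier" is now on the list of known assumptions.
[[omp::assume("ompx_aligned_barrier")]] void aligned_barrier();
```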
1. Remove `%c0 = arith.constant 0 : index` from the test functions. This
extra op is not needed (the index can be passed as an argument), so it
is just noise.
2. Replace `%cst_0` with `%pad` to communicate what the underlying SSA
value is intended for.
3. Unify some comments.
After a30e50fc, AMDGPUAAResult is being called in more situations where
BasicAA isn't sure. This exposed some regressions where NoAlias is being
incorrectly returned for two identical pointers.
The fix is to check the underlying objects for equality before returning
NoAlias.
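A hedged sketch of the shape of the fix, using `llvm::getUnderlyingObject`
with the surrounding alias-analysis plumbing elided:
```cpp
const Value *ObjA = getUnderlyingObject(LocA.Ptr);
const Value *ObjB = getUnderlyingObject(LocB.Ptr);
if (ObjA == ObjB)
  return AliasResult::MayAlias; // identical objects: never report NoAlias
```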
Summary:
We used this globally scoped `ext_no_call_asm` as a sort of hack around
the compiler that allowed the attributor to optimize out inline assembly
calls to PTX instructions. Quite some time ago I got rid of every inline
assembly call and replaced it with a builtin, so this can just be
deleted.
Furthermore, I use the `[[omp::assume]]` attribute directly for the
aligned barrier usage. This prints an unknown-assumption warning (even
though the assumption isn't unknown), so I'm just silencing that for now
until I fix it later.
---------
Co-authored-by: Michael Kruse <github@meinersbur.de>
The function getPartialReductionCost is already quite large and
is likely to grow in size as we add support for more cases in
future. Therefore, I think it's best to move this into the cpp
file.
In https://reviews.llvm.org/D136765 / https://reviews.llvm.org/D144155,
the asan annotations for `std::vector` were modified to unpoison freed
backing memory on destruction, instead of leaving it poisoned. However,
calling `__clear()` instead of `clear()` skips informing the asan
runtime of this decrease in the accessible container size. That breaks
the invariant that the value of `old_mid` must match the value of
`new_mid` from the previous call to
`__sanitizer_annotate_contiguous_container`, which can trip the sanity
checks for the partial poison between [d1, d2) and the container redzone
between [d2, c), if enabled. To fix this, ensure that `clear()` is
called instead, as is already done by `__vdeallocate()`. Also remove
`__clear()`, since it is no longer called.
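For reference, the runtime interface whose invariant was being broken; this
declaration comes from the public sanitizer interface:
```cpp
// Declared in <sanitizer/common_interface_defs.h>. The runtime expects
// old_mid here to equal the new_mid passed in the previous call for the
// same buffer, which skipping the annotation in __clear() violated.
extern "C" void __sanitizer_annotate_contiguous_container(
    const void *beg, const void *end, const void *old_mid,
    const void *new_mid);
```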
We are not handling 'S' scalar dependencies correctly and have at least
the following miscompiles related to that:
- [LoopInterchange] incorrect handling of scalar dependencies and dependence vectors starting with ">" #54176
- [LoopInterchange] Interchange breaks program correctness #46867
- [LoopInterchange] Loops should not interchanged due to dependencies #47259
- [LoopInterchange] Loops should not interchanged due to control flow #47401
This patch no longer inserts the "S" dependency/direction into the
dependency matrix, so a dependency is never "S". We seem to have
forgotten what the exact meaning of this dependency type is, and don't
see why it should be treated differently.
We prefer correctness over incorrect but more aggressive results: this
prevents the miscompiles at the expense of handling fewer cases, i.e.
making interchange more pessimistic. However, some of the cases that are
now rejected for dependence analysis reasons were rejected before too,
just for other reasons (e.g. profitability), so at least for the llvm
regression tests the number of regressions is very reasonable.
This should be a stopgap: we would like to get interchange enabled by
default, and thus prefer correctness over unsafe transforms; later we
can see whether the regressions can be solved.
Based on feedback from the clastb codegen PR, I'm refactoring the basic codegen for the vector.extract.last.active intrinsic: it now lowers to an ISD node in SelectionDAGBuilder and is expanded in LegalizeVectorOps, instead of everything being done in the builder.
The new ISD node (vector_find_last_active) only covers finding the index of the last active element of the mask; extracting the element and handling passthru are left to existing ISD nodes.
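A hedged sketch of the resulting decomposition in SelectionDAG terms
(passthru handling elided; `DAG`, `DL`, the value types, and the operands
stand in for the surrounding lowering state):
```cpp
// Find the index of the last active lane with the new node, then reuse the
// existing extract-element node to pull out the value.
SDValue Idx = DAG.getNode(ISD::VECTOR_FIND_LAST_ACTIVE, DL, IdxVT, Mask);
SDValue Elt = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, EltVT, Vec, Idx);
```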
This PR allows lowering **unsigned** `tosa.max_pool2d` to linalg.
```
// CHECK-LABEL: @max_pool_ui8
func.func @max_pool_ui8(%arg0: tensor<1x6x34x62xui8>) -> tensor<1x4x32x62xui8> {
// CHECK: builtin.unrealized_conversion_cast {{.*}} : tensor<1x6x34x62xui8> to tensor<1x6x34x62xi8>
// CHECK: arith.constant 0
// CHECK: linalg.pooling_nhwc_max_unsigned {{.*}} : (tensor<1x4x32x62xi8>) -> tensor<1x4x32x62xi8>
// CHECK: builtin.unrealized_conversion_cast {{.*}} : tensor<1x4x32x62xi8> to tensor<1x4x32x62xui8>
%0 = tosa.max_pool2d %arg0 {pad = array<i64: 0, 0, 0, 0>, kernel = array<i64: 3, 3>, stride = array<i64: 1, 1>} : (tensor<1x6x34x62xui8>) -> tensor<1x4x32x62xui8>
return %0 : tensor<1x4x32x62xui8>
}
```
It does this by:
- converting MaxPool2dConverter from an OpRewritePattern to an
OpConversionPattern
- adjusting the padding value to the minimum unsigned value when the
max_pool is unsigned
- lowering to `linalg.pooling_nhwc_max_unsigned` (which uses
`arith.maxui`) when the max_pool is unsigned
When a `RecordType` is converted to the corresponding `DIType`, we cache
the result to avoid doing the conversion again.
Our conversion of `RecordType` looks like this:
`ConvertRecordType(RecordType Ty)`
1. If type `Ty` is already in the cache, return the corresponding item.
2. Create a placeholder `DICompositeTypeAttr` (called `ty_self` below)
for `Ty`.
3. Put `Ty -> ty_self` in the cache.
4. Convert the members of `Ty`. This may cause `ConvertRecordType` to be
called again with other types.
5. Create the final `DICompositeTypeAttr`.
6. Replace `ty_self` in the cache with the attribute created in step 5.
The purpose of creating `ty_self` is to handle cases where a member may
have a reference to the parent type.
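A simplified sketch of the scheme above; all the helper and member names
are hypothetical:
```cpp
mlir::LLVM::DITypeAttr convertRecordType(fir::RecordType ty) {
  if (auto it = cache.find(ty); it != cache.end())
    return it->second;                         // step 1: cache hit
  auto placeholder = createPlaceholder(ty);    // step 2: ty_self
  cache[ty] = placeholder;                     // step 3
  auto members = convertMembers(ty);           // step 4: may recurse
  auto finalTy = createComposite(ty, members); // step 5
  cache[ty] = finalTy;                         // step 6
  return finalTy;
}
```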
Now consider the code below:
```
type t1
type(t2), pointer :: p1
end type
type t2
type(t1), pointer :: p2
end type
```
While processing t1, we could have a structure like `t1 -> t2 ->
t1_self`.
The `t2` created during the handling of `t1` can't be cached on its own,
as it contains a placeholder reference; it will fail an assert in MLIR
if it is processed standalone. To avoid this problem, we have a check in
step 6 above to not cache such types. But this check was not tight
enough: it only checked that a type does not directly hold a placeholder
reference to another type. It missed the following case, where the
placeholder reference can be in a type further down the chain.
```
type t1
type(t2), pointer :: p1
end type
type t2
type(t3), pointer :: p2
end type
type t3
type(t1), pointer :: p3
end type
```
So while processing `t1`, we have to stop the caching not only of `t3`
but also of `t2`. This PR improves the check and moves the logic inside
`convertRecordType`.
Please note that the reason a type can't have a placeholder reference is
how such references are resolved in MLIR. Please see the discussion at
the end of this
[PR](https://github.com/llvm/llvm-project/pull/106571).
I have to change `getDerivedType` so that it also gets the derived type
for things like `type(t2), pointer :: p1`, which are wrapped in
`BoxType`. Happy to move it to a new function or a local helper in case
this change is problematic.
Fixes #122024.
This should not affect the result, unless the getArithmeticInstrCost and
getVectorInstrCost routines learn to produce different costs (with
CostKind = CodeSize, for example). The -1 lanes prevent 0 lanes from
(incorrectly) being marked as free.
If Polly is built with LLVM_POLLY_LINK_INTO_TOOLS=ON (the default for
monorepo builds), then Polly will become a dependency of the
LLVMExtensions component, which is part of LLVMExports. As such, all the
Polly libraries also have to be part of LLVMExports.
However, if Polly is built with LLVM_POLLY_LINK_INTO_TOOLS=OFF, we also
end up adding the Polly libraries to LLVMExports. This is undesirable,
as it adds a hard dependency from LLVM on Polly.
Fix this by only exporting the Polly libraries from LLVMExports if
LLVM_POLLY_LINK_INTO_TOOLS is enabled.
Adding SPIRV to LLVM_ALL_TARGETS
(https://github.com/llvm/llvm-project/pull/119653) revealed a series of
minor compilation problems and sanitizer complaints. This PR addresses
those problems.