The `reassoc` fast-math flag allows a much wider array of algebraic
transformations than strict reassociation alone. In some cases it permits
commutation, distribution, and folding away of redundant inverse
operations...
While it might make sense to fix the flag naming at some point, in the
meantime the docs should at least be accurate to avoid
confusion.
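Floating-point addition is not associative, so even plain reassociation (the narrowest rewrite the flag permits) can change results. The sketch below illustrates this and is not part of the patch. It assumes IEEE-754 `double` evaluated without excess precision (e.g. x86-64 SSE2) and compiled without `-ffast-math`. The function names are hypothetical.

```cpp
#include <cassert>

// Left-to-right grouping: (a + b) + c.
double sumLeft(double a, double b, double c) { return (a + b) + c; }

// Reassociated grouping: a + (b + c). `reassoc` allows rewriting one form
// into the other even though the rounded results may differ.
double sumRight(double a, double b, double c) { return a + (b + c); }

// With a = 1e16, b = -1e16, c = 1.0: sumLeft gives 1.0, but sumRight gives
// 0.0, because doubles near 1e16 are spaced 2 apart and -1e16 + 1.0 rounds
// back to -1e16 (ties-to-even).
```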
We currently do not have masked vectorization support for tensor.pad with
low padding. However, we can allow it in the special case where the
result dimension after padding is a unit dim. The reason is that when we
actually have a low pad on a unit dim, the input size of that dimension
will be (or, for correct IR, must be) dynamically zero, and hence we will
create a zero mask, which is correct. If the low pad is dynamically zero,
the lowering is correct as well.
---------
Signed-off-by: Nirvedh <nirvedh@gmail.com>
Update both VPInterleaveRecipe and VPReplicateRecipe codegen to use
debug location directly from the recipe, not the underlying instruction.
This removes another dependency on underlying instructions.
e2ba1b6ffde4ec607342b1b746d1b57f0f04390a references that it reverts a
commit that's not a parent of e2ba1b6ffde4ec607342b1b746d1b57f0f04390a.
Functionally, this can (and demonstrably does) work(*), but from the
standpoint of the revert checker, it's nonsense. Print a `logging.error`
when it's detected.
Tested by running the revert checker against a commit range that
includes the aforementioned commit; the logging.error was fired
appropriately.
(*) - the specifics here are:
- the _SHA_ that was referenced was on a non-main branch, but
- the commit from the non-main branch was merged into the non-main
branch from main
- ...so the _functional_ commit being reverted was originally landed on
main, but the _SHA_ referenced from main was from a branch that was cut
before the reverted-commit was landed on main
Fixes #133365
## Changes Done
- Changed the signed check to
```cpp
struct is_signed : bool_constant<((is_fixed_point<T> || is_arithmetic_v<T>) && (T(-1) < T(0)))>
```
in ``/libc/src/__support/CPP/type_traits/is_signed.h``, adding a check for
fixed-point types.
- However, this fails for ``unsigned _Fract`` or any other unsigned
fixed-point type, because ``unsigned _Fract`` can't represent -1 in
T(-1), while ``unsigned int`` can handle it via wrapping.
- That's why I explicitly added an ``is_signed`` check for ``unsigned``
fixed-point types.
- Same changes to ``/libc/src/__support/CPP/type_traits/is_unsigned.h``.
- Added tests for ``is_signed`` and ``is_unsigned``.
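The `T(-1) < T(0)` trick can be sketched for standard arithmetic types only. `sketch_is_signed` is a hypothetical name, not the libc trait, and it deliberately leaves out fixed-point types. For an unsigned integer, `T(-1)` wraps to the type's maximum value, so the comparison is false. This is exactly the wrapping that an unsigned fixed-point type cannot do.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the trait, restricted to types where T(-1) is a
// valid expression. Signed types compare -1 < 0 as true; unsigned integers
// wrap -1 to their maximum value, making the comparison false.
template <typename T>
struct sketch_is_signed
    : std::bool_constant<std::is_arithmetic_v<T> && (T(-1) < T(0))> {};
```

Note that `bool` counts as unsigned here: `bool(-1)` is `true`, and `true < false` is false.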
In order to facilitate targets that only support masked loads/stores
on certain address spaces (AMDGPU will support them in an upcoming
patch, but only for address space 7), add an AddressSpace parameter
to isLegalMaskedLoad and isLegalMaskedStore.
OpenACC GitHub PR #499 defines the pqr-list as having at least 1 item. We
already handle that for all clauses but 'wait', so this patch does the
work to add it for 'wait', plus adds tests.
This patch corrects an invalid condition in `getEffectsOnResource` used
to identify relevant "resources":
```cpp
return it.getResource() != resource;
```
The current implementation assumes that only one instance of each
resource will exist, so comparing raw pointers is both safe and
sufficient. This assumption stems from constructs like:
```cpp
static DerivedResource *get() {
  static DerivedResource instance;
  return &instance;
}
```
i.e., resource instances returned via static singleton methods.
However, as discussed in
* https://github.com/llvm/llvm-project/issues/129216,
this assumption breaks in practice — notably on macOS (Apple Silicon)
when built with:
* `-DBUILD_SHARED_LIBS=On`.
In such cases, multiple instances of the same logical resource may exist
across shared library boundaries, leading to incorrect behavior and
causing failures in tests like:
* test/Dialect/Transform/check-use-after-free.mlir
This patch replaces the pointer comparison with a comparison based on
resource identity:
```cpp
return it.getResource()->getResourceID() != resource->getResourceID();
```
This approach aligns better with the intent of `getEffectsOnResource`,
which is to:
```cpp
/// Collect all of the effect instances that operate on the provided
/// resource (...)
```
Fixes #129216
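The sketch below shows why pointer comparison and identity comparison diverge. `Resource`, its string-based ID, and the two global instances are all hypothetical stand-ins. The real resource ID type in MLIR is not a string. The two globals model two shared libraries that each ended up with their own static instance of the same logical resource.

```cpp
#include <cassert>
#include <string>

// Hypothetical resource: its identity is a name rather than its address.
struct Resource {
  std::string id;
  const std::string &getResourceID() const { return id; }
};

// Two distinct objects standing for the *same* logical resource, e.g. one
// static instance per shared library.
inline Resource instanceFromLibA{"ExampleResource"};
inline Resource instanceFromLibB{"ExampleResource"};

// Old-style check: only matches the exact same object.
bool sameByPointer(const Resource *a, const Resource *b) { return a == b; }

// New-style check: matches any instance of the same logical resource.
bool sameByID(const Resource *a, const Resource *b) {
  return a->getResourceID() == b->getResourceID();
}
```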
#118806 fixed an infinite loop in FoldShiftByConstant that could occur
when the shift amount was a ConstantExpr.
However, this meant that FoldShiftByConstant no longer kicked in for
scalable vectors because scalable splats are represented by
ConstantExprs.
This fixes it by allowing scalable splats of non-ConstantExprs in
m_ImmConstant, which also fixes a few other test cases where scalable
splats were being missed.
But I'm also hoping that UseConstantIntForScalableSplat will eventually
remove the need for this.
I noticed this when trying to reverse a combine on RISC-V in #132245,
and saw that the resulting vector and scalar forms were different.
---------
Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
Consider:
```
function foo()
!$omp declare target(foo) ! This `foo` was a function-result symbol
...
end
```
When resolving symbols, for this case use the symbol corresponding to
the function instead of the symbol corresponding to the function result.
Currently, this will result in an error:
```
error: A variable that appears in a DECLARE TARGET directive must be
declared in the scope of a module or have the SAVE attribute, either
explicitly or implicitly
```
This commit adds support for enabling and disabling plugins by name. The
changes are made generically in the `PluginInstances` class, but
currently we only expose this ability for SystemRuntime plugins. Other
plugin types can be added easily.
We had a few design goals for how disabled plugins should work:
1. Plugins that are disabled should still be visible to the system. This
allows us to dynamically enable and disable plugins and report their
state to the user.
2. Plugin order should be stable across disable and enable changes. We
want to avoid changing the order of plugin lookup. When a plugin is
re-enabled it should return to its original slot in the creation order.
3. Disabled plugins should not appear in PluginManager operations.
Clients should be able to assume that only enabled plugins will be
returned from the PluginManager.
For the implementation, we modify the plugin instance to maintain a bool
of its enabled state. Existing clients external to the Instances class
expect to iterate over only enabled instances, so we skip over disabled
instances in the query and snapshot APIs. This way the client does not
have to manually check which instances are enabled.
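The three design goals can be sketched with a per-instance enabled flag. `PluginRegistrySketch` and its methods are hypothetical and are not LLDB's `PluginInstances` API. Disabling only flips the flag, so every plugin stays visible and keeps its slot. Snapshots skip disabled entries, so callers never see them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical plugin entry carrying its own enabled state.
struct PluginEntry {
  std::string name;
  bool enabled = true;
};

class PluginRegistrySketch {
public:
  void registerPlugin(const std::string &name) { entries.push_back({name, true}); }

  // Toggle a plugin by name without moving it; false if the name is unknown.
  bool setEnabled(const std::string &name, bool enabled) {
    for (PluginEntry &e : entries)
      if (e.name == name) {
        e.enabled = enabled;
        return true;
      }
    return false;
  }

  // Goal 3: only enabled plugins, in original registration order (goal 2).
  std::vector<std::string> enabledSnapshot() const {
    std::vector<std::string> result;
    for (const PluginEntry &e : entries)
      if (e.enabled)
        result.push_back(e.name);
    return result;
  }

  // Goal 1: every plugin, enabled or not, so its state can be reported.
  const std::vector<PluginEntry> &all() const { return entries; }

private:
  std::vector<PluginEntry> entries;
};
```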
This allows us to remove the need for `_LIBCPP_TEMPLATE_VIS` and fixes a
bunch of missing annotations for RTTI when used across dylib boundaries.
`_LIBCPP_TEMPLATE_VIS` itself will be removed in a separate patch, since
it touches a lot of code.
This patch is a no-op for Clang. Only GCC is affected.
We haven't implemented 16-bit SGPRs. For now, allow 32-bit SGPRs to be
folded into True16 instructions taking 16-bit values. Also use
sgpr_32 when an Imm is copied to sgpr_lo16 so it can be further folded.
This improves generated code quality.
Add a pattern that bubbles up tensor.extract_slice through
tensor.collapse_shape.
The pattern is registered in a pattern population function that is used
by the transform op
transform.apply_patterns.tensor.bubble_up_extract_slice and by the
transform op transform.structured.fuse as a cleanup pattern.
This pattern enables tiling and fusing op chains which contain
tensor.collapse_shape if added as a cleanup pattern of tile and fuse
utility.
Without this pattern, that would not be possible, as
tensor.collapse_shape does not implement the tiling interface. This is
an addition to the pattern added in PR #126898.
While trying to make progress on #133782, I noticed that
TestDAP_Progress was taking 90 seconds to complete. This patch brings
that down to 10 seconds by making the following changes:
1. Don't call `wait_for_event` with a 15 second timeout. By the time we
call this, all progress events have been emitted, which means that we're
just sitting there until we hit the timeout.
2. Don't use 10 steps (= 10 seconds) for indeterminate progress. We have
two indeterminate progress tests so that's 6 seconds instead of 20.
3. Don't launch the process over and over. Once we have a dap session,
we can clear the progress vector and emit new progress events.
Currently, iterators over EquivalenceClasses iterate over a std::set,
which guarantees the order specified by the comparator. Unfortunately, in
many cases EquivalenceClasses are used with pointers, so iterating over a
std::set of pointers will not be deterministic across runs.
There are multiple places that explicitly try to sort the equivalence
classes before using them to try to get a deterministic order
(LowerTypeTests, SplitModule), but there are others that do not at the
moment, and this can result at least in non-deterministic value naming in
Float2Int.
This patch updates EquivalenceClasses to keep track of all members via an
extra SmallVector and removes the code in LowerTypeTests and SplitModule
that sorted the classes before processing.
Overall it looks like compile-time slightly decreases in most cases, but
close to noise:
https://llvm-compile-time-tracker.com/compare.php?from=7d441d9892295a6eb8aaf481e1715f039f6f224f&to=b0c2ac67a88d3ef86987e2f82115ea0170675a17&stat=instructions
PR: https://github.com/llvm/llvm-project/pull/134075
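The core idea can be sketched as a union-find that also records members in insertion order. Iteration then follows insertion order rather than pointer values. `OrderedEquivalenceClasses` is hypothetical and much simpler than LLVM's `EquivalenceClasses`. It only illustrates the ordering idea.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical union-find over pointer-like keys that remembers the order
// in which members were first inserted.
template <typename T> class OrderedEquivalenceClasses {
public:
  // Add V as a singleton class if it has not been seen before.
  void insert(T V) {
    if (parent.emplace(V, V).second)
      members.push_back(V);
  }

  // Return the leader of V's class, shortening paths as we go.
  T findLeader(T V) {
    insert(V);
    while (parent[V] != V) {
      parent[V] = parent[parent[V]]; // path halving
      V = parent[V];
    }
    return V;
  }

  void unionSets(T A, T B) {
    T LA = findLeader(A), LB = findLeader(B);
    if (LA != LB)
      parent[LB] = LA;
  }

  // Deterministic: insertion order, independent of pointer values.
  const std::vector<T> &membersInInsertionOrder() const { return members; }

private:
  std::unordered_map<T, T> parent;
  std::vector<T> members;
};
```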
We need to update the mapping between gathered values and their matching
entries if the list of entries is updated and only some of them are
selected for final shuffling.
Fixes #134085
This patch introduces the `vmem-to-lds-load-insts` target feature, which
can be used to enable builtins `__builtin_amdgcn_global_load_lds` and
`__builtin_amdgcn_raw_ptr_buffer_load_lds` on platforms which have this
feature.
This feature is only available on gfx9/10.
A limitation of using a common target feature for both builtins is that
`__builtin_amdgcn_raw_ptr_buffer_load_lds` could otherwise have been made
available on gfx6, 7, and 8.
Preserve branch weight metadata when merging instructions if one of the
instructions is missing metadata. This is similar in behaviour to what
we do today for other types of metadata such as mmra, memprof and
callsite metadata.
This was added in OpenACC PR #511 in the 3.4 branch. From an AST/Sema
perspective this is pretty trivial as the infrastructure for 'if'
already exists; however, the atomic construct needed to be taught to take
clauses. This patch does that and adds tests for it.
Previously we only marked fixed length vector extracts as cheap, so this
extends it to any extract at index 0 which should just be a subreg
extract.
This allows extracts of i1 vectors, as well as scalable vectors, to be
considered for DAG combines.
This causes some slight improvements with large legalized fixed-length
vectors, but the underlying motivation for this is to actually prevent
an unprofitable DAG combine on a scalable vector in an upcoming patch.
Fixes #130510.
On RISC-V, modify the folding of (X ^ Y == 0) -> (X == Y) to account for
cases where the (X ^ Y) will be re-used.
If a constant is being used for the XOR before a branch, ensure that it
is small enough to fit within a 12-bit immediate field. Otherwise, the
equality check is more efficient than the check against 0, as the
following before/after listings show:
```
# %bb.0:
lui a1, 5
addiw a1, a1, 1365
xor a0, a0, a1
beqz a0, .LBB0_2
# %bb.1:
ret
.LBB0_2:
```
```
# %bb.0:
lui a1, 5
addiw a1, a1, 1365
beq a0, a1, .LBB0_2
# %bb.1:
xor a0, a0, a1
ret
.LBB0_2:
```
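The range test behind the 12-bit immediate condition can be sketched as follows. `fitsInSImm12` is a hypothetical name. LLVM's `isInt<12>()` from `llvm/Support/MathExtras.h` performs the same check. The constant in the listings is 21845 (0x5555 = (5 << 12) + 1365). It falls outside the range, which is why it has to be materialized with `lui` + `addiw`.

```cpp
#include <cassert>
#include <cstdint>

// RISC-V I-type instructions such as XORI take a signed 12-bit immediate,
// i.e. a value in [-2048, 2047]; anything else must first be materialized
// into a register.
constexpr bool fitsInSImm12(int64_t Imm) { return Imm >= -2048 && Imm <= 2047; }
```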
Similarly, if the XOR is between 1 and a one-bit (i1) integer, we should
still fold away the XOR, since that comparison can be optimized into a
comparison against 0:
```
# %bb.0:
slt a0, a0, a1
xor a0, a0, 1
beqz a0, .LBB0_2
# %bb.1:
ret
.LBB0_2:
```
```
# %bb.0:
slt a0, a0, a1
bnez a0, .LBB0_2
# %bb.1:
xor a0, a0, 1
ret
.LBB0_2:
```
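The i1 case rests on a simple identity, sketched here with `bool` as a hypothetical model of an i1 value. For a one-bit X, `(X ^ 1) == 0` holds exactly when `X != 0`. The branch can therefore test the `slt` result directly with `bnez`, and the XOR moves into the block that still needs its value.

```cpp
#include <cassert>

// Original form: flip the bit, then branch if the result is zero.
constexpr bool xorThenTestZero(bool X) { return (X ^ true) == false; }

// Folded form: branch if the original bit is non-zero.
constexpr bool testNonZero(bool X) { return X != false; }
```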
One question about my code is that I used a hard-coded value for the
width of a RISC-V ALU immediate. Do you know of a way that I can gather
this from the `context`? I was unable to devise one.
This makes it easier to reduce llvm-reduce with llvm-reduce to filter
cases where the input reduced too much.
Not sure if it's possible to test the exit code in lit.
These are the three remaining native builtins not yet ported.
There are elementwise versions of exp10 and tan which correspond to the
intrinsics, which may be preferable to the current versions which route
through other native builtins. Those could be changed in a follow-up if
desired.
This PR does the following:
1. Use the SPIR-V backend to do LLVM-to-SPIR-V translation inside
clang-sycl-linker.
2. Remove the llvm-spirv translator from clang-sycl-linker.
Currently, no SPIR-V extensions are enabled for the SYCL compilation
flow. This will be updated in subsequent commits.
Thanks
Note: This is one of the many PRs being introduced to add SYCL
programming model support to LLVM
([RFC](https://discourse.llvm.org/t/rfc-add-sycl-programming-model-support/50812)).
---------
Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
This fixes the current lowering of `arith.ceildivsi` in the arith-expand
pass, which was previously incorrect. The new version is based on the
lowering of `arith.floordivsi`, and will not introduce new undefined
behavior or poison during the lowering. It also replaces one division
with a multiplication.
The previous lowering of `ceildivsi(n, m)` was the following:
```
x = (m > 0) ? -1 : 1
(n*m>0) ? ((n+x) / m) + 1 : - (-n / m)
```
This caused two problems:
* In the case where `n` is INT_MIN and `m` is positive, the result would
be poison instead of an actual value
* In the case where `n` is INT_MAX and `m` is `-1`, this would trigger
undefined behavior, while the original code wouldn't. This is because
`n+x` would be equal to `INT_MIN` (`INT_MAX + 1`), so the `(n+x) / m`
division would overflow and trigger UB.
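A scalar sketch of a ceil-division expansion in the spirit of the usual floordivsi lowering follows. It mirrors the idea, not the exact MLIR IR the patch emits, and `ceilDivSI` is a hypothetical name. It uses one truncating division and one multiplication and never forms `n + x` or `n * m`. As a result, it cannot overflow for any input where `ceildivsi` itself is defined (`m != 0`, and not `INT_MIN / -1`).

```cpp
#include <cassert>
#include <cstdint>

int32_t ceilDivSI(int32_t n, int32_t m) {
  int32_t q = n / m;                  // truncates toward zero
  bool inexact = q * m != n;          // |q * m| <= |n|, so this cannot overflow
  bool sameSign = (n < 0) == (m < 0); // exact quotient is non-negative
  // Truncation already equals the ceiling for negative quotients; for a
  // positive inexact quotient it rounded down, so add one.
  return (inexact && sameSign) ? q + 1 : q;
}
```

The two problem inputs listed above are handled: `ceilDivSI(INT32_MIN, 2)` is exact, and `ceilDivSI(INT32_MAX, -1)` returns `-INT32_MAX` without any intermediate overflow.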
This allows NOCROSSREFS to be specified in OVERLAY linker script
descriptions. This is a particularly useful part of the OVERLAY syntax,
since it's very rarely possible for one overlay section to sensibly
reference another.
Closes #128790
According to [1], the template parameter must be cv-unqualified and one
of unsigned short, unsigned int, unsigned long, or unsigned long long.
This should fix the following MSVC error:
```
error: static assertion failed due to requirement
'_Is_any_of_v<unsigned char, unsigned short, unsigned int, unsigned
long, unsigned long long>': invalid template argument for
independent_bits_engine: N4659
```
[1] https://en.cppreference.com/w/cpp/numeric/random/independent_bits_engine
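A conforming way to draw 8-bit values is sketched below. Keep the result type as `unsigned int` and request 8 bits, rather than using `unsigned char` as the result type. The base engine `std::mt19937`, the width, and the names `ByteEngine`/`nextByte` are illustrative choices, not taken from the patch.

```cpp
#include <cassert>
#include <random>

// UIntType must be unsigned short/int/long/long long; unsigned char is not
// permitted, so request 8 bits of an unsigned int instead.
using ByteEngine = std::independent_bits_engine<std::mt19937, 8, unsigned int>;

// Narrow a generated value to a byte; the engine guarantees it fits.
unsigned char nextByte(ByteEngine &engine) {
  return static_cast<unsigned char>(engine());
}
```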
The LoopVectorize test suite has a coverage hole for cases where types
mismatch, runtime checks are needed, and there is a conflict redux. Fix
this coverage hole by adding tests.