Following the conclusion of the
[RFC](https://discourse.llvm.org/t/rfc-names-for-flang-rt-libraries/84321),
rename Flang's runtime libraries as follows:
* libFortranRuntime.(a|so) to libflang_rt.runtime.(a|so)
* libFortranFloat128Math.a to libflang_rt.quadmath.a
* libCufRuntime_cuda_${CUDAToolkit_VERSION_MAJOR}.(a|so) to
libflang_rt.cuda_${CUDAToolkit_VERSION_MAJOR}.(a|so)
This follows the same naming scheme as Compiler-RT libraries
(`libclang_rt.${component}.(a|so)`). It provides some consistency
between Flang's runtime libraries for current and potential future
library components.
The TransactionAcceptOrRevert pass is the final pass in the Sandbox
Vectorizer's default pass pipeline. It's job is to check the cost
before/after vectorization and accept or revert the IR to its original
state.
Since we are now starting the transaction in BottomUpVec, tests that run
a custom pipeline need to accept the transaction. This is done with the
help of the TransactionAlwaysAccept pass (tr-accept).
PR #126091 adds intrinsics for tcgen05
wait/fence/commit operations. This patch
adds NVVM Dialect Ops for them.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
Run recipe simplification and dead recipe removal after VPlan-based
unrolling and optimizeForVFAndUF, to clean up any redundant or dead
recipes introduced by them. Currently this is NFC, as it removes the
corresponding removeDeadRecipes run in optimizeForVFAndUF and no
additional simplifications kick in after unrolling yet. That is changing
with https://github.com/llvm/llvm-project/pull/123655.
Note that with this change, pattern-matching is now applied after
EVL-based recipes have been introduced.
Trying to match VPWidenEVLRecipe when not explicitly requested might
apply a pattern with 2 operands to one with 3 due to the extra EVL
operand and VPWidenEVLRecipe being a subclass of VPWidenRecipe.
To prevent this, update Recipe_match::match to only match
VPWidenEVLRecipe if it is in the requested recipe types (RecipeTy).
PR: https://github.com/llvm/llvm-project/pull/125926
Otherwise, the handler "swallows" the signal and the process continues
to execute. While this use case is peculiar, ignoring these signals
entirely seems more odd.
Fix printing the correct file path in the error message when the output
file specified by `--dump-section` cannot be opened
Fixes: #125113 on ELF, MachO, Wasm
Consistently use hasScalarVFOnly instead of using
hasVF(ElementCount::getFixed(1)). Also add an assert to ensure all cases
are covered by hasScalarVFOnly.
This patch consumes the `DW_AT_APPLE_enum_kind` attribute added in
https://github.com/llvm/llvm-project/pull/124752 and turns it into a
Clang attribute in the AST. This will currently be used by the Swift
language plugin when it creates `EnumDecl`s from debug-info and passes
it to Swift compiler, which expects these attributes
getInput1Zp() returns an unsigned value which means in case of negative
zero point value the max intermediate value computation currently goes
wrong. Use getInput1ZpAttr() instead which returns an APInt and allows
easy sign extension to int64_t.
In 377257f063c, elements of structured bindings are copy-initialized.
They should be direct-initialized because the form of the initializer of
the whole structured bindings is a direct-list-initialization.
> [dcl.struct.bind]/1:
> ... and each element is copy-initialized or direct-initialized from
the corresponding element of the assignment-expression as specified by
the form of the initializer. ...
For example,
```cpp
int arr[2]{};
// elements of `[a, b]` should be direct-initialized
auto [a, b]{arr};
```
The use of Cpp11BracedListStyle with BinPackArguments=False avoids bin
packing until reaching a hard-coded limit of 20 items. This is an
arbitrary choice. Introduce a new style option to allow disabling this
limit.
Fix private memref creation bug in affine fusion exposed in the case of
the same memref being loaded from/stored to in producer nest. Make the
private memref replacement sound.
Change affine fusion debug string to affine-fusion - more compact.
Fixes: https://github.com/llvm/llvm-project/issues/48703
The maximum number of load/store watchpoints and fetch instruction
watchpoints is 14 each according to LoongArch Reference Manual [1],
so extend the maximum number of watchpoints from 8 to 14 for ptrace.
A new struct user_watch_state_v2 was added into uapi in the related
kernel commit 531936dee53e ("LoongArch: Extend the maximum number of
watchpoints") [2], but there may be no struct user_watch_state_v2 in
the system header in time.
In order to avoid undefined or redefined error, just add a new struct
loongarch_user_watch_state in LLDB which is same with the uapi struct
user_watch_state_v2, then replace the current user_watch_state with
loongarch_user_watch_state.
As far as I can tell, the only users for this struct in the userspace
are GDB and LLDB, there are no any problems of software compatibility
between the application and kernel according to the analysis.
The compatibility problem has been considered while developing and
testing. When the applications in the userspace get watchpoint state,
the length will be specified which is no bigger than the sizeof struct
user_watch_state or user_watch_state_v2, the actual length is assigned
as the minimal value of the application and kernel in the generic code
of ptrace:
```
kernel/ptrace.c: ptrace_regset():
kiov->iov_len = min(kiov->iov_len,
(__kernel_size_t) (regset->n * regset->size));
if (req == PTRACE_GETREGSET)
return copy_regset_to_user(task, view, regset_no, 0,
kiov->iov_len, kiov->iov_base);
else
return copy_regset_from_user(task, view, regset_no, 0,
kiov->iov_len, kiov->iov_base);
```
For example, there are four kind of combinations, all of them work well.
(1) "older kernel + older app", the actual length is 8+(8+8+4+4)*8=200;
(2) "newer kernel + newer app", the actual length is 8+(8+8+4+4)*14=344;
(3) "older kernel + newer app", the actual length is 8+(8+8+4+4)*8=200;
(4) "newer kernel + older app", the actual length is 8+(8+8+4+4)*8=200.
[1]
https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#control-and-status-registers-related-to-watchpoints
[2]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=531936dee53e
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Similar to D90898 (Linux AArch64), D124765 (SystemZ), and D148499
(RISCV).
In this commit, I enabled two test cases, while zhuqizheng supported
with the source code development.
Co-Authored-By: zhuqizheng <zhuqizheng@loongson.cn>
Co-authored-by: zhuqizheng <zhuqizheng@loongson.cn>
`MinSequenceContainer::size` can be narrowing on 64-bit platforms, and
MSVC complains about such implicit conversion. This PR changes the
implicit conversion to explicit `static_cast`.
`min_allocator::allocate` and `min_allocator::deallocate` have
`ptrdiff_t` as the parameter type, which seems weird, because the
underlying `std::allocator`'s member functions take `size_t`. It seems
better to use `size_t` consistently.
This line was previously removed when
12d47247e5046b959af180e12f648c54e2c5e863 moved it to RISCVInstrInfo.h.
But we probably don't want to have dangling `#define *_DECL`
(RISCVGenSearchableTables.inc will `#undef` these macros) and I think
there is no harm putting declarations of those search table functions in
RISCVISelDAGToDAG.h.
Introduce the CMake switch FLANG_INCLUDE_RUNTIME. When set to off, do
not add build instructions for the runtime.
This is required for Flang-RT (#110217) and the current runtime CMake
code to co-exist. When using `LLVM_ENABLE_RUNTIME=flang-rt`, the in-tree
build instructions are in conflict and must be disabled.
This can occur due to linker ICF and stable sort will ensure the results
are stable.
No explicit/new test coverage, because nondeterminism is non-testable.
It should already be covered by the DWARFDebugLineTest that was failing
some internal testing on an ARM machine which might've been what changed
the sort order. But `llvm::sort` also deliberately randomizes the
contents (under EXPENSIVE_CHECKS) so I'd have expected failures to show
up in any EXPENSIVE_CHECKS Build...
Updated status to 'done' for OpenMP 6.0 features:
- OpenMP directives in concurrent loop regions
- atomics constructs on concurrent loop regions
- Lift nesting restriction on concurrent loop
Removed duplicate OpenMP 6.0 feature per Michael Klemm:
- atomic constructs in loop region
This PR adds:
- `RootSignatureFlags` extraction from DXContainer using `obj2yaml`
This PR is part of: #121493
---------
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
Summary:
Recent changes made a lot of this stuff redundant or unused, clean it up
a bit.
Also snuck in a change to pass the CUDA path since we still use it for
`fatbinary` internally.
Add support for single element vector{load|store} lowering to SPIR-V.
Since, SPIR-V converts single element vector to scalars, it needs
special attention for vector{load|store} lowering to spirv{load|store}.