See https://github.com/llvm/llvm-project/issues/135401 for the full flakiness report.
It fails on stage2/asan_ubsan check with:
```
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation
JIT session error: In graph incr_module_23-jitted-objectbuffer, section .text.startup: relocation target "_ZN1AD2Ev" at address 0x79618c48d040 is out of range of Delta32 fixup at 0x75618b40d02d (<anonymous block> @ 0x75618b40d010 + 0x1d)
error: Failed to materialize symbols: { (main, { $.incr_module_23.__inits.0, __orc_init_func.incr_module_23, a2 }) }
error: Failed to materialize symbols: { (main, { __orc_init_func.incr_module_23 }) }
The error message ("out of range of Delta32") appears similar to #102858, another Interpreter test that is flaky with ASan.
```
Recent test history on the x86_64-linux-fast bot:
- https://lab.llvm.org/buildbot/#/builders/169/builds/10339: fail
- 10340: buildbot logistical problem
- https://lab.llvm.org/buildbot/#/builders/169/builds/10341: fail
- https://lab.llvm.org/buildbot/#/builders/169/builds/10342: fail
- 10343: pass
- 10344: pass
- https://lab.llvm.org/buildbot/#/builders/169/builds/10345: fail
- 10346: pass
...
The OpenMP runtime needs the base address of the array section to
identify the dependency.
If we just put the vector subscript through the usual HLFIR expression
lowering, that would generate a new contiguous array representing the
values of the elements in the array which was sectioned. We cannot use
addresses from this array because these addresses would not match
dependencies on the original array. For example
```
integer :: array(1024)
integer :: indices(2)
indices(1) = 1
indices(2) = 100
!$omp task depend(out: array(1:512))
!$omp end task
!$omp task depend(in: array(indices))
!$omp end task
```
This requires taking the lowering path previously only used for ordered
assignments to get the address of the elements in the original array
which were indexed. This is done using `hlfir.elemental_addr`. e.g.
```
array(indices) = 2
```
`hlfir.elemental_addr` is awkward to use because it (by design) doesn't
return something like `!hlfir.expr<>` (like `hlfir.elemental`) and so it
can't have a generic lowering: each place it is used has to carefully
inline the contents of the operation and extract the needed address.
For this reason, `hlfir.elemental_addr` is not allowed outside of these
ordered assignments. In this commit I ignore this restriction so that I
can use `hlfir.elemental_addr` to lower the OpenMP DEPEND clause (this
works because the operation is inlined and removed before the verifier
runs).
One alternative solution would have been to provide my own more limited
re-implementation of `HlfirDesignatorBuilder` which skipped
`hlfir::elemental_addr`, instead inlining its body directly at the
current insertion point applying indices only for the first element.
This would have been difficult to maintain because designation in
Fortran is complex.
OpenACC declare statements are restricted from having having clauses
that reference assumed size arrays. It should be the case that we can
implement `deviceptr` and `present` clauses for assumed-size arrays.
This is a first step towards relaxing this restriction.
Note running flang on the following example results in an error in
lowering.
```
$ cat t.f90
subroutine vadd (a, b, c, n)
real(8) :: a(*), b(*), c(*)
!$acc declare deviceptr(a, b, c)
!$acc parallel loop
do i = 1,n
c(i) = a(i) + b(i)
enddo
end subroutine
$ flang -fopenacc -c t.f90
error: loc("/home/akuhlenschmi/work/p4/ta/tests/openacc/src/t.f90":3:7): expect declare attribute on variable in declare operation
error: Lowering to LLVM IR failed
error: loc("/home/akuhlenschmi/work/p4/ta/tests/openacc/src/t.f90":4:7): unsupported OpenACC operation: acc.private.recipe
error: loc("/home/akuhlenschmi/work/p4/ta/tests/openacc/src/t.f90":4:7): LLVM Translation failed for operation: acc.private.recipe
error: failed to create the LLVM module
```
I would like to to share this code, because others are currently working
on the implementation of `deviceptr`, but it is obviously not running
end-to-end. I think the cleanest approach to this would be to put this
exception to the rule behind some feature flag, but I am not certain
what the precedence for that is.
Fixes issue #132739.
CaptureAnalysis only considers captures through the def-use chain of the
provided pointer, explicitly excluding captures of underlying values or
implicit captures like those involving external globals.
The previous comment for `PointerMayBeCaptured` did not clearly state
this limitation, leading to its incorrect usage in files such as
ThreadSanitizer.cpp and SanitizerMetadata.cpp.
This PR addresses this by refining the comments for the relevant APIs
within `PointerMayBeCaptured` to explicitly document this behavior.
- Change various Inst/Asm Printer functions to use a StringRef for the
Modifier parameter (instead of a const char *).
- This simplifies various string comparisons used within these
functions.
- Remove these params for print functions that do not use them.
Nearly, but not all, other compilers have a blanket prohibition against
the use of an INTENT(OUT) dummy argument in a specification expression.
Some compilers, however, permit an INTENT(OUT) dummy argument to appear
in a specification expression in a BLOCK construct or inner procedure
via host association.
The argument some have put forth to accept this usage comes from a
reading of 10.1.11 (specification expressions) in Fortran 2023 that, if
followed consistently, would also require host-associated OPTIONAL dummy
argument to be allowed. That would be dangerous for reasons that should
be obvious.
However, I can agree that a non-OPTIONAL dummy argument can't be assumed
to remain undefined on entry to a BLOCK construct or inner procedure, so
we can accept host-associated INTENT(OUT) in specification expressions
with a portability warning.
The logic for fixed form compiler directive line continuation has a hole
that can apply continuation for !$ even if the next line does not begin
with a fixed form comment character. Rearrange the nested if statements
to enforce that requirement for all compiler directives.
This patch extends the canonicalization printing policy to cover
expressions
and template names, and wires that up to the template argument printer,
covering expressions, and to the expression within a dependent decltype.
This is helpful for debugging, or if these expressions somehow end up
in diagnostics, as without this patch they can print as completely
unrelated
expressions, which can be quite confusing.
This is because expressions are not uniqued, unlike types, and
when a template specialization containing an expression is the first to
be
canonicalized, the expression ends up appearing in the canonical type of
subsequent equivalent specializations.
Fixes https://github.com/llvm/llvm-project/issues/92292
Recent work to better handle macro replacement in literal constant kind
suffixes isn't handling fixed form well, leading to a crash in Fujitsu
test 0113/0113_0073.F. The look-ahead needs to be done with the
higher-level prescanner functions that skip over fixed form comment
fields after column 72. Rework.
The current module file documentation antedates the current
implementation of module files and contains many aspirational and
conditional statements, all of which can now be resolved with
descriptions of how things actually work.
In HIP, the Clang driver already sets `force-import-all` when ThinLTO is
enabled. As a result, all imported functions get the
`available_externally`
linkage. However, these functions are later removed by the
`EliminateAvailableExternallyPass`, effectively undoing the forced
import and
eventually leading to link errors.
The `EliminateAvailableExternallyPass` provides an option to convert
`available_externally` functions into local functions, renaming them to
avoid
conflicts. This behavior is exactly what we need for HIP. This PR
enables that
option (`avail-extern-to-local`) alongside `force-import-all` when
ThinLTO is
used.
With this change, ThinLTO almost works correctly on AMDGPU. The only
remaining
issue is an undefined reference to `__assert_fail`, but that falls
outside the
scope of this PR.
Adding support for cancelling requests.
There are two forms of request cancellation.
* Preemptively cancelling a request that is in the queue.
* Actively cancelling the in progress request as a best effort attempt
using `SBDebugger.RequestInterrupt()`.
Currently we don't check for the presence of descriptor/BoxTypes before
emitting stores which lower to memcpys, the issue with this is that
users can have optional arguments, where they don't provide an input,
making the argument effectively null. This can still be mapped and this
causes issues at the moment as we'll emit a memcpy for function
arguments to store to a local variable for certain edge cases, when we
perform this memcpy on a null input, we cause a segfault at runtime.
The fix to this is to simply create a branch around the store that
checks if the data we're copying from is actually present. If it is, we
proceed with the store, if it isn't we skip it.
Considering that "or disjoint" is the canonical for certain add
operations, then I think we want to support such "add like" operations
when doing ADD+GEP->GEP+GEP rewrites to make things more consistent.
Problem was found when improving ValueTracking, which turned an ADD into
OR, and then suddenly optimizations got worse due to these rewrites no
longer triggering.
Fix Flang invocation in `Driver/do_concurrent_to_omp_cli.f90` test to
run compilation step only, to fix testing when building with
`-DFLANG_INCLUDE_RUNTIME=OFF`. The test is only concerned with warning
being emitted by the compiler, so there is no need to link the resulting
executable.
This is a follow-up to #132748, where we deferred the flag removal in
order to ease transition for external users.
The plan is to merge this in the nearish future, in two weeks or so is
my best guess.
BuildAttributeSubSection::Name must be a std::string instead of
StringRef because it may be assigned from non-persistent memory.
StringRef is non-owning and unsafe in this context. This change ensures
the subsection name owns its memory, preventing use-after-free or
dangling references.
Context: Work in progress in PR #131990.
`weightCalcHelper` is responsible for adding hints to MRI. Prior to this
PR, we fell back on register ID as the last tie breaker for sorting
hints. However, there is an opportunity to add an additional sorting
characteristic: whether or not a register is a callee-saved-register.
I thought of this idea because I saw that `AllocationOrder::create`
calls `RegisterClassInfo::getOrder`, which returns a list of registers
such that the registers which alias callee-saved-registers come last.
From this, I conclude that the register allocator prefers an order such
that callee-saved-registers are allocated after
non-callee-saved-registers to avoid having to spill the CSR.
This sorting characteristic occurs only as a tie breaker to the Weight
calculation. This is a good idea since the weight calculation is pretty
complex and I'm sure it is a pretty stable metric. I think its pretty
reasonable to agree that whether a register is callee-saved or not is a
better tie breaker than register ID. I think this is evident by the test
diff, since the changes all seem to have no impact or improve the
register allocation.
This patch changes how ZA array is modelled at LLVM-IR level. Currently
accesses to ZA are represented at LLVM-IR level as memory reads and
writes and at instruction level as unmodeled side-effects. This patch
changes that and models them as purely Inaccessible memory accesses
without any unmodeled side-effects.
In libclc, we observe that compiling OpenCL source files to bitcode is
executed sequentially on Windows, which increases debug build time by
about an hour.
add_custom_command may introduce additional implicit dependencies, see
https://gitlab.kitware.com/cmake/cmake/-/issues/17097
This PR adds a target for each command, enabling parallel builds of
OpenCL source files.
CMake 3.27 has fixed above issue with DEPENDS_EXPLICIT_ONLY. When LLVM
upgrades cmake vertion to 3.7, we can switch to DEPENDS_EXPLICIT_ONLY.
Finding operator delete[] is still problematic, without it the extension
is a security hazard, so reverting until the problem with operator
delete[] is figured out.
This reverts the following PRs:
Reland [MS][clang] Add support for vector deleting destructors (llvm#133451)
[MS][clang] Make sure vector deleting dtor calls correct operator delete (llvm#133950)
[MS][clang] Fix crash on deletion of array of pointers (llvm#134088)
[clang] Do not diagnose unused deleted operator delete[] (llvm#134357)
[MS][clang] Error about ambiguous operator delete[] only when required (llvm#135041)
After implementing CFI instructions in the function prologue, LLDB
testing for RISC-V started failing due to insufficient relocations
(e.g., R_RISCV_SET8, R_RISCV_SET16).
This patch adds support for the necessary RISC-V relocations in MCJIT.
We use the term "interchangeable instructions" to refer to different
operators that have the same meaning (e.g., `add x, 0` is equivalent to
`mul x, 1`).
Non-constant values are not supported, as they may incur high costs with
little benefit.
---------
Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
The propagateStoredValuesToLoads() transform currently bails out if
there is a lifetime intrinsic spanning the whole alloca, but the
individual loads/stores operate on some smaller part, because the slice
/ partition size does not match.
Fix this by ignoring assume-like slices early, regardless of which range
they cover.
I've changed the overall code structure here a bit because I was getting
confused by the different iterators.
omp.cancel and omp.cancellationpoint contain an attribute describing the
type of parent construct which should be cancelled. e.g.
```
!$omp cancel do
```
Must be inside of a wsloop. Previously the verifer required the
immediate parent to be this operation. This is not quite right because
something like the following is valid:
```
!$omp parallel do
do i = 1, N
if (cond) then
!$omp cancel do
endif
enddo
```
This patch relaxes the verifier to only require that some parent
operation matches (not necessarily the immediate parent).
llvm-diff shows there is no change to amdgcn--amdhsa.bc.
Similar to how cl_khr_fp64 and cl_khr_fp16 implementations are put in a
same file for math built-ins, this PR do the same to atom_* built-ins.
The main motivation is to prevent that two files with same base name
implementats different built-ins. In a follow-up PR, I'd like to relax
libclc_configure_lib_source to only compare filename instead of path for
overriding, since in our downstream the same category of built-ins, e.g.
math, are organized in several different folders.