I am having problems building Fortran runtime for CUDA
after #134164. I need more time to investigate it,
but in the meantime including variant.h (or any header
that eventually includes a libcudacxx header) resolves
the issue.
In `OutputSegment.cpp`, we need to ensure a specific order for certain
sections. The current sorting logic incorrectly prioritizes code
sections over explicitly defined section orders. This is problematic
because the `__objc_stubs` section is both a code section *and* has a
specific ordering requirement. The current logic would incorrectly
prioritize its code section status, causing it to be sorted *before* the
`__stubs` section. This incorrect ordering breaks the branch extension
algorithm, ultimately leading to linker failures due to relocation
errors.
We also modify the `lld/test/MachO/arm64-objc-stubs.s` test to ensure
correct section ordering.
Add ObjectFile::GetObjectName and SymbolFile::GetObjectName to retrieve
the name of the object file, including the `.a` for static libraries.
We currently do something similar in CommandObjectTarget, but the code
for dumping this is a lot more involved than what's being offered by the
new method. We have options to print he full path, the base name, and
the directoy of the path and trim it to a specific width.
This is motivated by #133211, where Greg pointed out that the old code
would print the static archive (the .a file) rather than the actual
object file inside of it.
Fixes#102053
The check was added in 8decdc472f308b13d7fb7fd50c3919db086c0417, and at
the time iOS 5 was the latest iOS version, before that commit tail calls
were disabled for all ARMv7 targets. Testing a build of wasm3 with the
patch on a device running iOS 3.0 shows a noticeable performance
improvement and no issues.
This adds ClangIR support for break and continue statements in loops.
Because only loops are currently implemented upstream in CIR, only
breaks in loops are supported here, but this same code will work (with
minor changes to the verification and cfg flattening) when switch
statements are upstreamed.
https://github.com/llvm/llvm-project/pull/132883 added support for cuda
surfaces but reached into clang/test/Headers/ from clang/test/CodeGen/
to grab the minimal cuda.h. Duplicate that file instead based on
comments in the review, to fix remote test runs.
Signed-off-by: Austin Schuh <austin.linux@gmail.com>
Almost all of the rotate idioms that are valid for an 'or' are also
valid when the halves are combined with an 'add'. Further, many of these
cases are not handled by common bits tracking meaning that the 'add' is
not converted to a 'disjoint or'.
Allocatable or pointer module variables with the CUDA managed attribute
are defined with a double descriptor. One on the host and one on the
device. Only the data pointed to by the descriptor will be allocated in
managed memory.
Allow the registration of any allocatable or pointer module variables
like device or constant.
- when developing the RootSignatureLexer library, we are creating new
files so we should set the standard to adhere to the coding conventions
for function naming
- this was missed in the initial review but caught in the review of the
parser pr
[here](https://github.com/llvm/llvm-project/pull/133302#discussion_r2017632092)
Co-authored-by: Finn Plummer <finnplummer@microsoft.com>
Implement extended intrinsic PUTENV, both function and subroutine forms.
Add PUTENV documentation to flang/docs/Intrinsics.md. Add functional and
semantic unit tests.
The main issue was that the kernel expected `suseconds_t` to be 64 bits
but ours was 32. This caused inconsistent failures since all valid
`suseconds_t` values are less than 1000000 (1 million), and some
configurations caused `struct timeval` to be padded to 128 bits.
Also: forgot to use TEST_FILE instead of FILE_PATH in some places.
The `lldbassert` macro in LLDB behaves like a regular `assert` when
assertions are enabled, and otherwise prints a pretty backtrace and
prompts the user to file a bug. By default, this is emitted as a
diagnostic event, but vendors can provide their own behavior, for
example pre-populating a bug report.
Recently, we ran into an issue where an `lldbassert` (in the Swift
language plugin) would fire excessively, to the point that it was
interfering with the usability of the debugger.
Once an `lldbasser` has fired, there's no point in bothering the user
over and over again for the same problem. This PR solves the problem by
introducing a static `std::once_flag` in the macro. This way, every
`lldbasser` will fire at most once per lldb instance.
rdar://148520448
Spimm in the spec refers to the 2-bit encoded value. All of the code
uses the 0, 16, 32, or 48 adjustment value.
Also remove the decodeZcmpSpimm as its identical to the default
behavior for no custom DecoderMethod.
OpenACC 3.3-NEXT has changed the way tags for copy, copyin, copyout, and
create clauses are specified, and end up adding a few extras, and
permits them as a list. This patch encodes these as bitmask enum so
they can be stored succinctly, but still diagnose reasonably.
The code sanitizer is failing with this error: `Execution cannot reach
this statement.`
The execution code path would early exit at line 928 if `(Lil && Ril) =
true`.
There are some system libraries such as sqlite3 which forward declare a
struct then use a pointer to that forward declared type in various APIs.
Ignore these types ForwardDeclChecker like other pointer types.
We were previously telling the user how many arguments were passed to
the attribute rather than saying how many arguments were expected to be
passed to the callback function. This rewords the diagnostic to
hopefully be a bit more clear.
Fixes#47451
When I introduced the various `_LIBCPP_INTRODUCED_IN_LLVM_XY_ATTRIBUTE`
macros in 182f5e9b2f03, I tried to correlate them to the right OS
versions, but it seems that I made a few mistakes. This wasn't caught in
the CI because we don't test back-deployment that far.
rdar://148405946
TLS relocations may not have a valid BOLT symbol associated with them.
While symbolizing the operand, we were checking for the symbol value,
and since there was no symbol the check resulted in a crash.
Handle TLS case while performing operand symbolization on AArch64.
InstCombine will combine this zext of an icmp where the source has a
single bit set to a lshr plus trunc
(`InstCombinerImpl::transformZExtICmp`):
```llvm
define <vscale x 1 x i8> @f(<vscale x 1 x i64> %x) {
%1 = and <vscale x 1 x i64> %x, splat (i64 8)
%2 = icmp ne <vscale x 1 x i64> %1, splat (i64 0)
%3 = zext <vscale x 1 x i1> %2 to <vscale x 1 x i8>
ret <vscale x 1 x i8> %3
}
```
```llvm
define <vscale x 1 x i8> @reverse_zexticmp_i64(<vscale x 1 x i64> %x) {
%1 = trunc <vscale x 1 x i64> %x to <vscale x 1 x i8>
%2 = lshr <vscale x 1 x i8> %1, splat (i8 2)
%3 = and <vscale x 1 x i8> %2, splat (i8 1)
ret <vscale x 1 x i8> %3
}
```
In a loop, this ends up being unprofitable for RISC-V because the
codegen now goes from:
```asm
f: # @f
.cfi_startproc
# %bb.0:
vsetvli a0, zero, e64, m1, ta, ma
vand.vi v8, v8, 8
vmsne.vi v0, v8, 0
vsetvli zero, zero, e8, mf8, ta, ma
vmv.v.i v8, 0
vmerge.vim v8, v8, 1, v0
ret
```
To a series of narrowing vnsrl.wis:
```asm
f: # @f
.cfi_startproc
# %bb.0:
vsetvli a0, zero, e64, m1, ta, ma
vand.vi v8, v8, 8
vsetvli zero, zero, e32, mf2, ta, ma
vnsrl.wi v8, v8, 3
vsetvli zero, zero, e16, mf4, ta, ma
vnsrl.wi v8, v8, 0
vsetvli zero, zero, e8, mf8, ta, ma
vnsrl.wi v8, v8, 0
ret
```
In the original form, the vmv.v.i is loop invariant and is hoisted out,
and the vmerge.vim usually gets folded away into a masked instruction,
so you usually just end up with a vsetvli + vmsne.vi.
The truncate requires multiple instructions and introduces a vtype
toggle for each one, and is measurably slower on the BPI-F3.
This reverses the transform in RISCVISelLowering for truncations greater
than twice the bitwidth, i.e. it keeps single vnsrl.wis.
Fixes#132245
This is crucial when recovering from fatal loader errors. Without it,
the `Lexer` keeps yielding more tokens and the compiler may access
invalid `ASTReader` state.
rdar://133388373
Currently, these breakpoints are being accumulated every time a new
process if created (e.g. through a `run`). Depending on the
circumstances, the old breakpoints are even left enabled, interfering
with subsequent processes. This is addressed by removing the breakpoints
in ProcessGDBRemote::Clear
Note that these breakpoints are more of a PlatformDarwin thing, so in
the future we should look into moving them there.
No longer require -fopenmp or -fopenacc with -E, unless specific version
number options are also required for predefined macros. This means that
most source can be preprocessed with -E and then later compiled with
-fopenmp, -fopenacc, or neither.
This means that OpenMP conditional compilation lines (!$) are also
passed through to -E output. The tricky part of this patch was dealing
with the fact that those conditional lines can also contain regular
Fortran line continuation, and that now has to be deferred when !$ lines
are interspersed.