In device context, managed memory is not available, so it makes no sense
to allocate the descriptor in it. Fall back to fir.alloca, which is
handled well in device code.
cuf.free is just dropped.
Flang treats arrays larger than 32 bytes in the main program as having
the SAVE attribute and lowers them as globals. In CUDA Fortran, device
variables are not allowed to have the SAVE attribute and should be
allocated dynamically in the main program scope.
This patch updates lowering so that CUDA Fortran device variables are not
treated as having the SAVE attribute.
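The decision described above can be modeled as a small predicate. This is a hypothetical sketch: the function and parameter names are illustrative and are not flang's API. Only the 32-byte threshold and the device-variable exception come from the text.

```python
# Hypothetical model of the implicit-SAVE rule for main-program arrays.
# Names are illustrative; only the 32-byte threshold comes from the text.
IMPLICIT_SAVE_THRESHOLD_BYTES = 32


def implicitly_saved(size_in_bytes: int, in_main_program: bool,
                     is_cuda_device: bool) -> bool:
    """Return True if an array would be treated as SAVE and lowered as a global."""
    if not in_main_program:
        return False  # this sketch only models the main-program rule
    if is_cuda_device:
        # Device variables may not have SAVE; they are allocated
        # dynamically in the main program scope instead.
        return False
    return size_in_bytes > IMPLICIT_SAVE_THRESHOLD_BYTES


# A 100-element REAL(4) array (400 bytes) in the main program:
print(implicitly_saved(400, in_main_program=True, is_cuda_device=False))  # True
print(implicitly_saved(400, in_main_program=True, is_cuda_device=True))   # False
```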
When a local character variable with non-constant length has an
initializer, it's an error in a couple of ways (SAVE variable with
unknown size, static initializer that isn't constant due to conversion
to an unknown length). The error that f18 reports is the latter, but the
message contains a formatted representation of the initialization
expression that exposes a non-Fortran %SET_LENGTH() operation. Print the
original expression in the message instead.
The check for a structure constructor to a forward-referenced derived
type wasn't tripping for constructors in the type definition itself. Set
the forward reference flag unconditionally at the beginning of name
resolution for the type definition.
FindPolymorphicAllocatableUltimateComponent needs to be
FindPolymorphicAllocatablePotentialComponent. The current search is
missing cases where a derived type has an allocatable component whose
type has a polymorphic allocatable component.
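The difference between the two searches can be sketched with a toy type model. The sketch follows the Fortran standard's definitions: ultimate components stop at ALLOCATABLE and POINTER components, while potential subobject components go through every nonpointer component. All class and function names are hypothetical; flang's real component iterators are C++ and differ in detail.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Component:
    name: str
    type: Optional["DerivedType"] = None  # None: intrinsic type or CLASS(*)
    allocatable: bool = False
    pointer: bool = False
    polymorphic: bool = False


@dataclass
class DerivedType:
    name: str
    components: List[Component] = field(default_factory=list)


def is_polymorphic_allocatable(c: Component) -> bool:
    return c.allocatable and c.polymorphic


def find_in_ultimate_components(t: DerivedType) -> Optional[Component]:
    # Ultimate components stop at ALLOCATABLE/POINTER components, so the
    # search only descends into plain (nonallocatable, nonpointer) ones.
    for c in t.components:
        if is_polymorphic_allocatable(c):
            return c
        if c.type is not None and not (c.allocatable or c.pointer):
            found = find_in_ultimate_components(c.type)
            if found is not None:
                return found
    return None


def find_in_potential_components(t: DerivedType, seen=None) -> Optional[Component]:
    # Potential subobject components include every nonpointer component,
    # so the search also descends through ALLOCATABLE components.
    seen = set() if seen is None else seen
    if id(t) in seen:  # guard against recursive types
        return None
    seen.add(id(t))
    for c in t.components:
        if is_polymorphic_allocatable(c):
            return c
        if c.type is not None and not c.pointer:
            found = find_in_potential_components(c.type, seen)
            if found is not None:
                return found
    return None


# type :: u; class(*), allocatable :: x; end type
u = DerivedType("u", [Component("x", allocatable=True, polymorphic=True)])
# type :: t; type(u), allocatable :: a; end type
t = DerivedType("t", [Component("a", type=u, allocatable=True)])
print(find_in_ultimate_components(t))        # None: the case that was missed
print(find_in_potential_components(t).name)  # x
```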
There's a numbered constraint that prohibits calls to some IEEE
arithmetic and exception procedures within the body of a DO CONCURRENT
construct. Clean up the implementation to catch missing cases.
The start, end, and stride expressions of a concurrent-header in a DO
CONCURRENT or FORALL statement can contain calls to impure functions...
unless they appear in a statement that's nested in an enclosing DO
CONCURRENT or FORALL construct. Ensure that we catch this nested case.
…DATA
We allow automatic data objects in the specification part of the main
program; add an optional portability warning and documentation. Don't
allow them in BLOCK DATA. They're already disallowed as module
variables.
The derived type compatibility checking for ALLOCATE statements with
SOURCE= or MOLD= was only checking for the same derived type name. That
is a necessary but not sufficient check, and it can produce bogus errors
as well as miss valid errors.
Fixes https://github.com/llvm/llvm-project/issues/101909.
External procedures about which no characteristics are known -- from
EXTERNAL and PROCEDURE() statements of entities that are never called --
are marked as subroutines. This shouldn't be done for procedure
pointers, however.
Fixes https://github.com/llvm/llvm-project/issues/101908.
Fortran optional arguments are effectively null references. To deal with
this possibility, flang lowering of OpenACC data clauses creates three
if-else regions when preparing the data pointer for the data clause:
1) Load box value from box reference
2) Load box addr from box value
3) Load box dims from box value
However, this pattern makes it harder to find the original box
reference. The first if-else region, which loads the box value, is not
actually needed, since the value can be loaded right before the
corresponding `fir.box_addr` and `fir.box_dims` operations. Thus, reduce
the number of if-else regions by deferring the box load to the use sites.
For non-optional cases, the old behavior, which preloads the box value,
is left unchanged.
Currently, `%17 = fir.box_elesize %16 :
(!fir.class<!fir.ptr<!fir.type<_QFTt{a:i32,b:i32}>>>) -> i32`
is translated to
```
%4 = getelementptr { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }, ptr %1, i32 0, i32 1
%5 = load i32, ptr %4, align 4
```
The element size field has type `i64`, so loading only 32 bits of it is
wrong. On big-endian targets the first four bytes hold the high-order
half of the value, so the load yields an incorrect result. The problem
shows up in the `storage_size` intrinsic applied to a polymorphic
variable.
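A 32-bit load from the 64-bit element-size slot misbehaves because of byte order. The sketch below demonstrates this by emulating the memory with Python's `struct` module. The element size of 8 bytes matches the two `i32` components of the type in the example above. The helper name is hypothetical.

```python
import struct


def load_i32_from_i64_slot(elem_size: int, byte_order: str) -> int:
    """Store elem_size as an 8-byte integer, then read back only its first
    4 bytes as a 32-bit integer, mimicking `load i32` from an i64 field."""
    slot = struct.pack(byte_order + "q", elem_size)       # 64-bit field
    (value,) = struct.unpack(byte_order + "i", slot[:4])  # 32-bit load
    return value


print(load_i32_from_i64_slot(8, "<"))  # 8: little endian, low half comes first
print(load_i32_from_i64_slot(8, ">"))  # 0: big endian, high half comes first
```

On little-endian hosts the bug stays hidden for small sizes, which is why it surfaced only in the big-endian environment.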
This patch adds entry points in the runtime to allocate descriptors in
managed memory. These entry points currently only call
`CUFAllocManaged` and `CUFFreeManaged`, but they could become more
complicated in the future.
`cuf.alloc` and `cuf.free` related to local descriptors are converted
into runtime calls.
The allocator can be specified in the descriptor. For a simple local
allocatable, we can convert `cuf.allocate`/`cuf.deallocate` directly to
their corresponding runtime calls in the standard flang runtime. More
specific cases will require dedicated entry points. Global descriptors
will require synchronization between the host and device copies.
This patch adds a pass to perform this conversion.
The generation of 80-bit x87 floating-point infinities was incorrect in
Normalize(). In addition, the comparison for IEEE_NEXT_AFTER needs to
use the most precise type of its arguments, and we don't need to warn
about overflows from +/-HUGE() to infinity. Warnings about NaN arguments
remain in place, and enabled by default, since their usage may or may
not be portable and their appearance in real code most likely signifies
an earlier error.
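The need to compare in the more precise type can be illustrated with a toy IEEE_NEXT_AFTER for a REAL(4) X and a REAL(8) Y. This is a hypothetical sketch using Python floats and `struct`, not flang's folding code. For brevity it handles only positive, finite X and ignores IEEE exceptions. If Y is first rounded to REAL(4), a Y that is slightly larger than X collapses onto X, and no step is taken.

```python
import math
import struct


def to_real4(x: float) -> float:
    """Round a Python float (binary64, like REAL(8)) to binary32 (REAL(4))."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def next_after_real4(x4: float, y8: float, compare_precisely: bool) -> float:
    """Toy IEEE_NEXT_AFTER(X, Y) for a positive, finite REAL(4) X and a REAL(8) Y.

    compare_precisely=True compares X and Y in REAL(8), the more precise type;
    False first rounds Y to REAL(4), which is the mistake being illustrated.
    """
    if math.isnan(y8):
        return math.nan
    y = y8 if compare_precisely else to_real4(y8)
    if y == x4:
        return x4
    bits = struct.unpack("<I", struct.pack("<f", x4))[0]
    # For positive X, incrementing the bit pattern moves up by one REAL(4) ulp.
    bits = bits + 1 if y > x4 else bits - 1
    return struct.unpack("<f", struct.pack("<I", bits))[0]


x, y = 1.0, 1.0 + 1e-10
print(next_after_real4(x, y, compare_precisely=False))  # 1.0: Y rounded onto X
print(next_after_real4(x, y, compare_precisely=True))   # 1 + 2**-23
```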
This patch modifies MLIR to LLVM IR lowering of the OpenMP dialect to take into
consideration the contents of the `omp.target_triples` module attribute while
generating code for `omp.target` operations.
It adds the `OpenMPIRBuilderConfig::TargetTriples` field and initializes it
using the `amendOperation` flow of the `OpenMPToLLVMIRTranslation` pass. Some
changes are introduced into the `OpenMPIRBuilder` so that callers can
pass in, from outside, whether a target region is intended to be
offloaded.
The result of this change is that offloading calls are only generated when the
`--offload-arch` or `-fopenmp-targets` options are given to the compiler.
Otherwise, only the host fallback code is generated. This fixes linker errors
currently triggered by `flang-new` if a source file containing a `target`
construct is compiled without any of the aforementioned options.
Several unit tests impacted by these changes, which are intended to check host
code generated for `omp.target` operations, are updated to contain the new
attribute. Without it, no calls to `__tgt_target_kernel` and associated control
flow operations are generated.
Fixes #100209.
This patch adds support for the `-fopenmp-targets` option to the `bbc`
and `flang -fc1` tools. It adds an `OMPTargetTriples` property to the
`LangOptions` structure, which is filled with the triples represented by
the compiler option.
This is used to initialize the `omp.target_triples` module attribute for
later use by lowering stages.
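A minimal sketch of turning the option value into a list of triples follows. It assumes the value is a comma-separated list, as with clang's option of the same name. The helper is hypothetical; in flang this happens in C++, and the result is stored in the `OMPTargetTriples` property of `LangOptions`.

```python
def parse_openmp_targets(arg: str) -> list:
    """Split an -fopenmp-targets=<triple>[,<triple>...] value into triples."""
    prefix = "-fopenmp-targets="
    if not arg.startswith(prefix):
        raise ValueError(f"not an -fopenmp-targets option: {arg!r}")
    # Drop empty entries such as those produced by a trailing comma.
    return [triple for triple in arg[len(prefix):].split(",") if triple]


triples = parse_openmp_targets("-fopenmp-targets=amdgcn-amd-amdhsa,nvptx64-nvidia-cuda")
print(triples)  # ['amdgcn-amd-amdhsa', 'nvptx64-nvidia-cuda']
```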
Flips the delayed privatization switch to be on by default. After the
recent fixes related to delayed privatization, the gfortran test suite
runs successfully with delayed privatization turned on by default for
`omp parallel`.
#100690 introduces an allocator registry with the ability to store the
allocator index in the descriptor. This patch adds an attribute to
fir.embox and fircg.ext_embox to be able to set the allocator index
while populating the descriptor fields.
This patch enhances the descriptor with the ability to use a specialized
allocator. The allocators are registered in a dedicated registry and the
index of the desired allocator is stored in the descriptor. The default
allocator, std::malloc, is registered at index 0.
In order to have this allocator index in the descriptor, the f18Addendum
field is repurposed to be able to hold the presence flag for the
addendum (lsb) and the allocator index.
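The registry and the repurposed field can be sketched as follows. Two parts come from the text: the least significant bit is the addendum presence flag, and index 0 is the default allocator. Storing the index in the bits directly above the flag is an assumption of this sketch, as are all the names. A Python `bytearray` stands in for `std::malloc`.

```python
from typing import Callable, Dict, Tuple

# Assumed layout of the repurposed f18Addendum field: the lsb is the
# addendum presence flag (from the text); placing the allocator index in
# the bits directly above it is an assumption of this sketch.
ADDENDUM_PRESENT = 0x1
ALLOCATOR_INDEX_SHIFT = 1
DEFAULT_ALLOCATOR_INDEX = 0


def encode_field(has_addendum: bool, allocator_index: int) -> int:
    flag = ADDENDUM_PRESENT if has_addendum else 0
    return (allocator_index << ALLOCATOR_INDEX_SHIFT) | flag


def decode_field(value: int) -> Tuple[bool, int]:
    return bool(value & ADDENDUM_PRESENT), value >> ALLOCATOR_INDEX_SHIFT


class AllocatorRegistry:
    """Toy registry mapping an allocator index to an allocation function."""

    def __init__(self) -> None:
        self._allocators: Dict[int, Callable[[int], bytearray]] = {}
        # Index 0 is the default; bytearray stands in for std::malloc here.
        self.register(DEFAULT_ALLOCATOR_INDEX, bytearray)

    def register(self, index: int, allocate: Callable[[int], bytearray]) -> None:
        self._allocators[index] = allocate

    def allocate(self, descriptor_field: int, nbytes: int) -> bytearray:
        _, index = decode_field(descriptor_field)
        return self._allocators[index](nbytes)


registry = AllocatorRegistry()
field_value = encode_field(has_addendum=True, allocator_index=DEFAULT_ALLOCATOR_INDEX)
print(decode_field(field_value))                # (True, 0)
print(len(registry.allocate(field_value, 16)))  # 16
```

Keeping the presence flag in the lsb means that descriptors with allocator index 0 encode the flag exactly as a plain boolean would.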
Since this changes the semantics and name of the 7th field of the
descriptor, the CFI_VERSION is bumped to the date of the initial change.
This patch only adds this feature to the descriptor; it does not add any
specific allocator yet. CUDA Fortran will be the first user of this
feature, to allocate descriptor data in the different types of device
memory based on the CUDA attribute.
---------
Co-authored-by: Slava Zakharin <szakharin@nvidia.com>
This patch modifies the flang driver to introduce the `-fopenmp-targets`
option to the frontend compiler invocations corresponding to the OpenMP
host device on offloading-enabled compilations.
This option holds the list of offloading triples associated with the
compilation and is used by clang to determine whether offloading calls
should be generated for the host.
Handles variables that are storage associated via `equivalence`. The
problem is that these variables are declared as `fir.ptr`s while their
privatized storage is declared as `fir.ref` which was triggering a
validation error in the OpenMP dialect.
There are some cases in which variables used in OpenMP constructs
are predetermined as private. The semantic checks for copyprivate
were not handling those cases.
Besides that, shared symbols were not being properly represented
in some cases. When there was no previously declared private
(implicit) symbol, no new association symbols, representing
shared ones, were being created.
These symbols must always be inserted in constructs that may
privatize the original symbol: parallel, teams and task
generating constructs.
Fixes #87214 and #86907
This patch sorts the clause lists for the following OpenMP operations:
- omp.taskloop
- omp.taskgroup
- omp.target_data
- omp.target_enter_data
- omp.target_exit_data
- omp.target_update
- omp.target
This change results in the reordering of operation arguments, so
impacted unit tests are updated accordingly.
This patch sorts the clause lists for the following OpenMP operations:
- omp.parallel
- omp.teams
- omp.sections
- omp.wsloop
- omp.distribute
- omp.task
This change results in the reordering of operation arguments, so
impacted unit tests are updated accordingly.
When there's an error in a SUBROUTINE or FUNCTION statement, errors
cascade quickly because the body of the subprogram or interface isn't in
the right context. So, if a SUBROUTINE or FUNCTION statement is
expected, and contains a SUBROUTINE or FUNCTION keyword, it counts as
one -- retain and emit any errors pertaining to the arguments or suffix,
recover to the end of the line if needed, and proceed.
The code that deals with the special case of RANK(assumed-rank) in
intrinsic function folding wasn't handling the even more special case of
assumed-type assumed-rank dummy arguments.
A couple of intrinsic functions have optional arguments. Don't insert
type conversions on those arguments when the actual arguments may not be
present at execution time because they are OPTIONAL dummy arguments,
allocatables, or pointers.
When a declaration construct appears in the execution part of a block or
subprogram body, report it as such rather than as a misleading syntax
error against whichever executable statement it happened to match most
closely.
…tion
See new test. A #line (or #) directive after a line ending with & and
before its continuation shouldn't elicit an error about mismatched
parentheses.
Fixes https://github.com/llvm/llvm-project/issues/100073.
Ensure that type parameters are declared as such before being referenced
within the derived type definition. (Previously, such references would
resolve to symbols in the enclosing scope.)
This change causes the symbols for the type parameters to be created
when the TYPE statement is processed in name resolution. They are
TypeParamDetails symbols with no KIND/LEN attribute set, and they shadow
any symbols of the same name in the enclosing scope.
When the type parameter declarations are processed, the KIND/LEN
attributes are set. Any earlier reference to a type parameter with no
KIND/LEN attribute elicits an error.
Some members of TypeParamDetails have been retyped and/or renamed.
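The two-phase scheme described above can be modeled in a few lines. In the first phase, symbols are created at the TYPE statement without KIND/LEN. In the second, the attribute is set when the declaration is processed. Any reference made in between is flagged. The classes below are hypothetical stand-ins, not flang's `TypeParamDetails`.

```python
from typing import Dict, List, Optional


class TypeParam:
    """Hypothetical stand-in for a type-parameter symbol."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.attr: Optional[str] = None  # "KIND" or "LEN" once declared


class DerivedTypeScope:
    def __init__(self, param_names: List[str], enclosing: Dict[str, str]) -> None:
        self.enclosing = enclosing
        # Created when the TYPE statement is processed, with no KIND/LEN yet;
        # they shadow any same-named symbols in the enclosing scope.
        self.params = {n: TypeParam(n) for n in param_names}
        self.errors: List[str] = []

    def declare(self, name: str, attr: str) -> None:
        # Processing "integer, kind :: n" or "integer, len :: n".
        self.params[name].attr = attr

    def resolve(self, name: str):
        param = self.params.get(name)
        if param is None:
            return self.enclosing.get(name)
        if param.attr is None:
            self.errors.append(f"'{name}' is referenced before its declaration")
        return param


# type t(n)
#   real :: a(n)          ! reference before the declaration below
#   integer, len :: n
# end type
scope = DerivedTypeScope(["n"], enclosing={"n": "host variable n"})
scope.resolve("n")        # resolves to the parameter, but is flagged
scope.declare("n", "LEN")
scope.resolve("n")        # fine now
print(scope.errors)
```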
When the result of a function never appears in a variable definition
context, emit a warning.
If the function has multiple result variables due to alternate ENTRY
statements, any definition will suffice.
The implementation of this check is tied to the general variable
definability checking utility in semantics. Every variable definition
context uses it to ensure that no undefinable variable is being defined.
A set of defined variables is maintained in the SemanticsContext and,
when the warning is enabled and no fatal error has been reported, the
scope tree is traversed and all the function subprograms' results are
tested for membership in that set.
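A minimal sketch of the final traversal, assuming a simplified scope tree. The set of defined variables is treated as already filled in by the definability checks. A function is reported only if none of its result variables, including those added by ENTRY statements, is in that set. All names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set


@dataclass
class Function:
    name: str
    results: List[str]  # more than one if ENTRY statements add results


@dataclass
class Scope:
    function: Optional[Function] = None
    children: List["Scope"] = field(default_factory=list)


def walk(scope: Scope) -> Iterator[Scope]:
    yield scope
    for child in scope.children:
        yield from walk(child)


def undefined_result_warnings(root: Scope, defined: Set[str],
                              enabled: bool, fatal_error: bool) -> List[str]:
    if not enabled or fatal_error:
        return []
    warnings = []
    for scope in walk(root):
        f = scope.function
        # Any definition of any result variable (incl. ENTRY results) suffices.
        if f is not None and not any(r in defined for r in f.results):
            warnings.append(f"result of function '{f.name}' is never defined")
    return warnings


# The definability checker would record every variable it sees being defined.
defined_vars = {"e_result"}
program = Scope(children=[
    Scope(Function("f", ["f"])),
    Scope(Function("g", ["g", "e_result"])),  # ENTRY e RESULT(e_result)
])
print(undefined_result_warnings(program, defined_vars, enabled=True, fatal_error=False))
```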
IEEE_ARITHMETIC intrinsic module procedures IEEE_NEXT_AFTER,
IEEE_NEXT_DOWN, and IEEE_NEXT_UP, and intrinsic NEAREST return larger or
smaller values adjacent to their primary REAL argument. The four
procedures vary in how the direction is chosen, in how special cases are
treated, and in what exceptions are generated. Implement the three
IEEE_ARITHMETIC procedures. Update the NEAREST implementation to support
all six REAL kinds 2,3,4,8,10,16, and fix several bugs.
IEEE_NEXT_AFTER(X,Y) returns a NaN when Y is a NaN as that seems to be
the universal choice of other compilers.
Change the front end compile time implementation of these procedures to
return normal (HUGE) values for infinities when applicable, rather than
always returning the input infinity.
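For REAL(8) only, the direction rules can be sketched with Python's `math.nextafter` (Python 3.9+). It returns a NaN when Y is a NaN, matching the IEEE_NEXT_AFTER choice above. It also steps from an infinity to the largest finite value (HUGE). These wrappers are hypothetical and ignore the exception signaling that the real procedures perform.

```python
import math
import sys


def ieee_next_after(x: float, y: float) -> float:
    return math.nextafter(x, y)  # NaN if y is NaN


def ieee_next_up(x: float) -> float:
    return math.nextafter(x, math.inf)


def ieee_next_down(x: float) -> float:
    return math.nextafter(x, -math.inf)


def nearest(x: float, s: float) -> float:
    # Direction comes from the sign of S, which must be nonzero.
    if s == 0.0:
        raise ValueError("NEAREST: S must not be zero")
    return math.nextafter(x, math.copysign(math.inf, s))


print(ieee_next_up(1.0) == 1.0 + 2.0**-52)             # True
print(ieee_next_down(math.inf) == sys.float_info.max)  # True: HUGE, not +Inf
print(math.isnan(ieee_next_after(1.0, math.nan)))      # True
print(nearest(1.0, -3.0) == 1.0 - 2.0**-53)            # True
```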
Flang-new needs to add `mlink-builtin-bitcode` objects to properly
support offload code generation for AMD GPUs (for example, math
functions).
Both Flang-new and Clang rely on `mlink-builtin-bitcode` flags. These
flags are added by the `AMDGPUOpenMPToolchain::addClangTargetOptions`
function. Now, both compilers reuse the same function.
Flang-new tests for AMDGPU were updated by adding the `-nogpulib` flag.
This flag allows running AMDGPU tests on machines without the ROCm stack.
This PR implements `ComputeRegionOpInterface` to define `getAllocaBlock`
of OpenACC loop and compute constructs (parallel/kernels/serial). The
primary objective here is to accommodate local variables in OpenACC
compute regions. The change in `fir::FirOpBuilder::getAllocaBlock`
allows local variable allocation inside loops and kernels.
Functions internal to a subroutine should have their scope set to the
parent function. This allows a user to evaluate local variables of the
parent function when control is stopped in the child.
Fixes #96314
Config files provide a facility to invoke the compiler with a predefined
set of options. This patch only enables config files in the flang
driver; the underlying functionality was already there.
Functions returning C_PTR were lowered to functions returning intptr (i64
on 64-bit architectures). This caused conflicts when these functions were defined
as returning !fir.ref<none>/llvm.ptr in other compiler generated
contexts (e.g., malloc).
Lower them to return !fir.ref<none>.
This should deal with https://github.com/llvm/llvm-project/issues/97325
and https://github.com/llvm/llvm-project/issues/98644.
Summary:
These tests were removed in a previous patch.
The linker wrapper now just extracts the device inputs and forwards them
directly to the device's link job. This is the job that occurs when you
do `clang --target=amdgcn-amd-amdhsa foo.o` or similar. Because this can
handle LTO we no longer do LTO in the linker wrapper. This has some
fallout, because we now require `ld.lld` to be built with a compatible
version, but I think we always expected that.
I made the decision to remove this `libc-gpu.a` library because it was
unnecessary and complicated things. Now I simply have the link job
implicitly link `-lc` if it exists. Users can also now pass
`-Xoffload-linker=amdgcn-amd-amdhsa -lc` or similar to pass it. Because
of this, these tests need to be removed. I forgot that Fortran also had
these.