When `PINNED=` is used with variables that don't have the `PINNED`
attribute, the logical value must be set to false when host allocation
is performed.
Introduce cuf.sync_descriptor to be used to sync device global
descriptor after pointer association.
Also move CUFCommon so it can be used in FIRBuilder lib as well.
This is a patch in preparation for the support stream ordered memory
allocator in CUDA Fortran.
This patch adds an asynchronous id to the AllocatableAllocate runtime
function and to Descriptor::Allocate so it can be passed down to the
registered allocator. It is up to the allocator to use this value or
not.
A follow up patch will implement that asynchronous allocator for CUDA
Fortran.
ALLOCATE and DEALLOCATE statements can be inlined in device function.
This patch updates the condition that determined to inline these actions
in lowering.
This avoid runtime calls in device function code and can speed up the
execution.
Also move `isCudaDeviceContext` from `Bridge.cpp` so it can be used
elsewhere.
This fixes a bug in OpenMP privatisation. The privatised variables are
created as though they are host associated clones of the original
variables. These privatised variables do not contain the allocatable
attribute themselves and so we need to check if the ultimate symbol is
allocatable. Having or not having this flag influences whether lowering
determines that this is a whole allocatable assignment, which then
causes hlfir.assign not to get the realloc flag, which cases the
allocatable not to be allocated when it is assigned to (leading to a
segfault running the newly added test).
I also did the same for pointer variables because I would imagine they
could experience the same issue.
There is no fallout on tests outside of OpenMP, and the gfortran test
suite still passes, so I think this doesn't break host other kinds of
host associated symbols.
Lower allocatable and pointers specification parts. Nothing special is
required to allocate the descriptor given they are required to be dummy
arguments, however, care must be taken with INTENT(OUT) to use the
runtime to deallocate them (inlined fir.embox + store is not possible).
The number of operations dedicated to CUF grew and where all still in
FIR. In order to have a better organization, the CUF operations,
attributes and code is moved into their specific dialect and files. CUF
dialect is tightly coupled with HLFIR/FIR and their types.
The CUF attributes are bundled into their own library since some
HLFIR/FIR operations depend on them and the CUF dialect depends on the
FIR types. Without having the attributes into a separate library there
would be a dependency cycle.
…ted. (#89998)" (#90250)
This partially reverts commit 7aedd7dc754c74a49fe84ed2640e269c25414087.
This change removes calls to the deprecated member functions. It does
not mark the functions deprecated yet and does not disable the
deprecation warning in TypeSwitch. This seems to cause problems with
MSVC.
Automatic deallocation of allocatable that are cuda device variable must
use the fir.cuda_deallocate operation. This patch update the automatic
deallocation code generation to use this operation when the variable is
a cuda variable.
This patch has also the side effect to correctly call
`attachDeclarePostDeallocAction` for OpenACC declare variable on
automatic deallocation as well. Update the code in
`attachDeclarePostDeallocAction` so we do not attach on fir.result but
on the correct last op.
Automatic deallocation of allocatable that are cuda device variable must
use the fir.cuda_deallocate operation. This patch update the automatic
deallocation code generation to use this operation when the variable is
a cuda variable.
Replace the runtime call to `AllocatableDeallocate` for CUDA device
variable to the newly added `fir.cuda_deallocate` operation.
This is similar with #88980
A third patch will handle the case of automatic dealloctaion of device
allocatable variables
Allocate statement for variable with CUDA attributes need to allocate
memory on the device and not the host. Add a proper TODO so we keep
track of work to be done for it.
Flang supports source allocation to allocatable or pointers with a non
deferred length that do not match the source length. This documented at:
9708d09003/flang/docs/Extensions.md (L312)
The current lowering code was bugged when such explicit length allocate
object appeared after a deferred length object in the source allocation
list:
Since "lenParams" had been computed when generating allocation of the
deferred length object, the call to genSetDeferredLengthParameters was
not a no-op on when lowering the explicit length allocation, and the
explicit length was overridden with the source length.
The output of the program added in test was:
```
ZZheZZ
ZZhelloZZ
ZZhelloZZ
```
Instead of:
```
ZZheZZ
ZZhelloZZ
ZZhello ZZ
```
Skip genSetDeferredLengthParameters when the allocate object has non
deferred length.
A DEALLOCATE statement on a pointer should always use
PointerDeallocate() in the runtime, even if there's no STAT= or
polymorphism or derived types, so that it can be checked to ensure that
it is indeed a whole allocation of a pointer.
The `acc.declate_action` attribute was sometime misplaced as reported in
#79770.
This patch updates the lowering code to place the
postAllocate/postDeallocate actions at the correct place.
The standard requires a compiler to diagnose an incorrect use of a
pointer in a DEALLOCATE statement. The pointer must be associated with
an entire object that was allocated as a pointer (not allocatable) by an
ALLOCATE statement.
Implement by appending a validation footer to pointer allocations. This
is an extra allocated word that encodes the base address of the
allocation. If it is not found after the data payload when the pointer
is deallocated, signal an error. There is a chance of a false positive
result, but that should be vanishingly unlikely.
This change requires all pointer allocations (not allocatables) to take
place in the runtime in PointerAllocate(), which might be slower in
cases that could otherwise be handled with a native memory allocation
operation. I believe that memory allocation of pointers is less common
than with allocatables, which are not affected. If this turns out to
become a performance problem, we can inline the creation and
initialization of the footer word.
Fixes https://github.com/llvm/llvm-project/issues/78391.
There are currently several places that automatically deallocate
allocatble if they are allocated:
- INTENT(OUT) allocatable are deallocated on entry in the callee
- INTENT(OUT) allocatable are also deallocated on the caller side of
BIND(C) function in case the implementation is in C.
- Results of function returning allocatable are deallocated after usage.
- OPENMP privatized allocatable are deallocated at the end of OPENMP
region.
Introduce genDeallocateIfAllocated that centralize all this code, except
for the function return that use genFreememIfAllocated since
finalization is done separately currently.
`fir:🏭:genFinalization` and
`fir:🏭:genInlinedDeallocation` are removed and replaced by
genFreemem since their name were misleading: finalization was not
called.
There is a fallout in the tests because previous generated code did not
check the allocated status when doing inline deallocation. This was OK
since free(null) is guaranteed to be a no-op, but this makes compiler
code more complex, is a bit surprising in the generated IR IMHO, and it
relied on knowing when genDeallocateBox inserts runtime calls or uses
inlined code.
This patches adds the acc.declare_action attrbites on
post allocate operation and pre/post deallocate operations.
Reviewed By: razvanlupusoru
Differential Revision: https://reviews.llvm.org/D157915
Begin upstreaming of CUDA Fortran support in LLVM Flang.
This first patch implements parsing for CUDA Fortran syntax,
including:
- a new LanguageFeature enum value for CUDA Fortran
- driver change to enable that feature for *.cuf and *.CUF source files
- parse tree representation of CUDA Fortran syntax
- dumping and unparsing of the parse tree
- the actual parsers for CUDA Fortran syntax
- prescanning support for !@CUF and !$CUF
- basic sanity testing via unparsing and parse tree dumps
... along with any minimized changes elsewhere to make these
work, mostly no-op cases in common::visitors instances in
semantics and lowering to allow them to compile in the face
of new types in variant<> instances in the parse tree.
Because CUDA Fortran allows the kernel launch chevron syntax
("call foo<<<blocks, threads>>>()") only on CALL statements and
not on function references, the parse tree nodes for CallStmt,
FunctionReference, and their shared Call were rearranged a bit;
this caused a fair amount of one-line changes in many files.
More patches will follow that implement CUDA Fortran in the symbol
table and name resolution, and then semantic checking.
Differential Revision: https://reviews.llvm.org/D150159
Currently, local allocatables and contiguous/scalar pointers (and some
other conditions) are lowered to a set of independent variables in FIR
(one for the address, one for each bound and one for character length).
The intention was to help LLVM get rids of descriptors. But LLVM knows
how to do that anyway in those cases:
```
subroutine foo(x)
real, target :: x(100)
real, pointer, contiguous :: p(:)
p => x
call bar(p(50))
end subroutine
```
The output fir the option on or off is the same after llvm opt -O1,
there is no descriptor anymore, the indirection is removed.
```
define void @foo_(ptr %0) local_unnamed_addr {
%2 = getelementptr [100 x float], ptr %0, i64 0, i64 49
tail call void @bar_(ptr %2)
ret void
}
```
So the benefit of not using a descriptor in lowering is questionable,
and although it is abstracted as much as possible in the so called
MutableBoxValue class that represent allocatable/pointer in lowering
it is still causing bugs from time to time, and will also be a bit
problematic when emitting debug info for the pointer/allocatable.
In HLFIR lowering, the simplification to always use a descriptor in
lowering was already made. This patch allows decorrelating the impact
from this change from the bigger impact HLFIR will have so that it
is easier to get feedback if this causes performance issues.
The lowering tests relying on the previous behavior are only updated
to set back this option to true. The reason is that I think we should
preserve coverage of the code dealing with the "non descriptor" approach
in lowering until we actually get rid of it. The other reason is that
the test will have to be or are already covered by equivalent HLFIR
tests, which use descriptors.
Differential Revision: https://reviews.llvm.org/D148910
Update lowering of allocate statement to use the new
functions defined in D146290.
Depends on D146290
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D146291
We were failing tests where an ALLOCATE statement that allocated an
array had a non-character scalar MOLD argument.
I fixed this by merging the code for ALLOCATE statements with MOLD and
SOURCE arguments.
Differential Revision: https://reviews.llvm.org/D145418
When we allocate a variable using a MOLD argument, the function that
applies the type of the MOLD argument first checks to see if the
variable is already allocated by looking at its descriptor. But in the
case of allocating a scalar, the descriptor was not yet been created and
the associated memory is uninitialized. This change fixes that.
Differential Revision: https://reviews.llvm.org/D144761
Allocation of unlimited polymorphic allocatable with
character intrinsic type is now done through
`PointerNullifyCharacter` or `AllocatableInitCharacter` so the length
is correctly set.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D143580
This pass implements the `-fstack-arrays` flag. See the RFC in
`flang/docs/fstack-arrays.md` for more information.
Differential revision: https://reviews.llvm.org/D140415
The rank from the allocate object might be different from the rank
from the mold expression. Use the rank from the allocate object
when applying to mold so the bounds can be set correctly.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D143078
As Fortran 2018 9.7.1.2(7), the value of each element of allocate object
becomes the value of source when the allocate object is array and the
source is scalar.
Fix#60090.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D142112
Makes use of fir.type_desc in order to delay the type desc address
resolution. The lowering inserts fir.type_desc operation instead of fir.addr_of
operation pointing to the fir.global type descriptor. The fir.type_desc
operation is then lowered in code gen to an address of operation in the LLVM
dialect. At this stage, the type descriptor is generated in all cases.
Reviewed By: vdonaldson
Differential Revision: https://reviews.llvm.org/D142920
The previous patches allowed lowering allocatable/and pointer designator
expressions with HLFIR.
This patch updates the bridge genExprMutableBox to use HLFIR lowering
when HLFIR flag is set. For allocate and deallocate lowering that use
genExprMutableBox, no other change is needed.
For pointer assignments, the code doing the pointer assignments in the
bridge can be reused and is simply moved so that it can be shared, and
the "explicit context" special cases of the previous lowering are
by-passed.
The code doing pointer assignment revealed that convertExprToAddress
did not match the previous genExprAddr behavior (that actually
does not create temps for "x" where x is not contiguous).
Instead of trying to copy the old behavior that is a bit weird (was
dictated by the implementation rather than design). Update
convertExprToAddress to do something sensible and that works with
the current genExprAddr usages (if anything, it should saves bogus
array section temps).
Differential Revision: https://reviews.llvm.org/D142197
Adds support for:
- referencing a whole allocatable/pointer symbol
- passing allocatable/pointer in a call
This required update in HLFIRTools.cpp helpers so that the
raw address, extents, lower bounds, and type parameters of a
fir.box/fir.class can be extracted.
This is required because in hlfir lowering, dereferencing a
pointer/alloc is only doing the fir.load fir.box part, and the
helpers have to be able to reason about that fir.box without the
help of a "fir::FortranVariableOpInterface".
Missing:
- referencing part of allocatable/pointer (will need to update
Designator lowering to dereference the pointer/alloc). Same
for whole allocatable and pointer components.
- allocate/deallocate/pointer assignment statements.
- Whole allocatable assignment.
- Lower inquires.
Differential Revision: https://reviews.llvm.org/D142043
Source allocation is similar to mold allocation + assignment. Use
ApplyMold runtime entry point for polymorphic source allocation.
It could be generalized for other source allocation.
Reviewed By: jeanPerier, PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D141996
Apply the source type spec to the descriptor for
polyrmophic entities.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D141822
Lower allocate statement with MOLD= to calls to the Fortran
runtime. PointerApplyMold and AllocatableApplyMold are called
depending on the object to be allocated.
Reviewed By: jeanPerier, PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D141843
Support allocate statement with source in runtime version. The source
expression is evaluated only once for each allocate statement. When the
source expression has shape-spec, uses it for bounds. Otherwise, get
the bounds from the source expression. Get the length if the source
expression has deferred length parameter.
Reviewed By: clementval, jeanPerier
Differential Revision: https://reviews.llvm.org/D137812
As mentioned in section 7.3.2.3 note 7, The dynamic type of an unallocated
allocatable object or a disassociated pointer is the same as its declared type.
This patch adds two function to the runtime:
- `PointerDeallocatePolymorphic`
- `AllocatableDeallocatePolymorphic`
These two functions take a DerivedTypeDesc pointer of the declared type.
The lowering is updated accordingly to call these functions for polymorphic
and unlimited polyrmophic entities. For unlimited polymorphic entities, the
dynamic type is set to nullptr when the entity is on an unallocated or
disassociated state.
Reviewed By: PeteSteinfeld, klausler
Differential Revision: https://reviews.llvm.org/D141519