A defined assignment generic interface for a given LHS/RHS type & rank
combination may have a specific procedure with an LHS dummy argument that
is neither allocatable nor pointer, or specific procedure(s) whose LHS
dummy arguments are allocatable or pointer. It is possible to have two
specific procedures if one's LHS dummy argument is allocatable and the
other's is pointer.
However, the runtime doesn't work with LHS dummy arguments that are
allocatable, and will crash with a mysterious "invalid descriptor" error
message.
Extend the list of special bindings to include
ScalarAllocatableAssignment and ScalarPointerAssignment, use them when
appropriate in the runtime type information tables, and handle them in
Assign() in the runtime support library.
Specification expressions may contain references to dummy arguments,
host objects, module variables, and variables in COMMON blocks, since
they will have values on entry to the scope. A local variable with an
initializer and the SAVE attribute (which will always be implied by an
explicit initialization) will also always work, and is accepted by at
least one other compiler, so accept it with a warning.
This was a subtle problem. When the shape of a function result is
explicit but not constant, it is characterized with bounds expressions
that use Extremum<SubscriptInteger> operations to force extents to 0
rather than be negative. These Extremum operations are formatted as
"max()" intrinsic functions in the module file. Upon being read from the
module file, they are not folded back into Extremum operations, but
remain as function references; and this then leads to expressions not
comparing equal when the procedure characteristics are compared to those
of a local procedure declared identically.
The real fix here would be for folding to just always change max and min
function references into Extremum<> operations, constant operands or
not, and I tried that, but it led to test failures and crashes in
lowering that I couldn't resolve. So, until those can be fixed, here's a
change that will read max/min operations in module file declarations
back into Extremum operations to solve the compatibility checking
problem, but leave other non-constant max/min operations as function
calls.
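The clamping that these Extremum operations perform can be sketched as follows. This is only an illustration of the arithmetic, not flang's `Extremum<SubscriptInteger>` implementation; the helper name `clampedExtent` is hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: extent of an explicit-shape dimension lb:ub.
// Fortran defines the extent as zero when ub < lb, which is what the
// max(0, ...) wrapper (an Extremum operation in flang) enforces.
std::int64_t clampedExtent(std::int64_t lb, std::int64_t ub) {
  return std::max<std::int64_t>(0, ub - lb + 1);
}
```

For example, `clampedExtent(1, 10)` yields 10, while `clampedExtent(5, 3)` yields 0 rather than -1.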
The standard requires that a generic interface with the same name as a
derived type contain only functions. We generally allow a generic
interface to contain both functions and subroutines, since there's never
any ambiguity at the point of call; this is helpful when the specific
procedures of two generics are combined during USE association. Emit a
warning instead of a hard error when a generic interface with the same
name as a derived type contains a subroutine to improve portability of
code from compilers that don't check for this condition.
When a module file has been compiled with CUDA enabled, don't emit
spurious errors about non-interoperable types when that module is read
by a USE statement in a later non-CUDA compilation.
cuf.data_transfer was wrongly generated when calling the `size`
intrinsic on a device allocatable variable. Since the descriptor is
available on the host, there is no transfer needed.
Add `DescriptorInquiry` in the `CollectCudaSymbolsHelper` to filter out
symbols that are not needed for the transfer decision to be made.
Mostly NFC. I was bothered by the declarations that were always made even
if unused, and I think using LLVM Ops is nicer anyway with regard to
side effects here.
```
func.func private @llvm.stacksave.p0() -> !fir.ref<i8>
func.func private @llvm.stackrestore.p0(!fir.ref<i8>)
```
There are other places in lowering that are using the calls instead of
the LLVM intrinsics, but I will deal with them another time (the issue
there is mostly to get the proper address space for the llvm.ptr type).
This is an extension of CUDA Fortran. The iso_c_binding intrinsic can
accept a `TYPE(c_devptr)` as its first argument. This patch relaxes the
semantic check to accept it and updates the lowering to unwrap the cptr
field from the c_devptr.
Because of the way visibility is implemented in Options.td, options that
are aliases do not inherit the visibility of the option being aliased.
Therefore, explicitly set the visibility of the alias to be the same as
the aliased option.
This partially addresses
https://github.com/llvm/llvm-project/issues/89888
When using `--save-temps`, flang-new emits (among other things) an
`<input>.i` file. These `.i` files are pre-processed Fortran files
containing information about the modules referenced by the input source
(these files are emitted by: `Parsing::EmitPreprocessedSource`).
This diff allows `.i` files emitted by flang-new to be treated as valid
files in the pre-processing phase. This, in turn, allows flang-new to
add pre-processing options (e.g. `-I`) when launching compilation jobs
for these files.
This solves a bug when using `--save-temps` with source files that
include modules from non-standard directories, for example:
```
flang-new -c --save-temps -I/tmp/module_dir -fno-integrated-as \
/tmp/ModuleUser.f90
```
The problem was that `.i` files were treated as "binary" files and
therefore the return value for `types::getPreprocessedType(InputType)`
in `Flang::ConstructJob(...)` was `types::TY_INVALID`.
Add a builtin type for c_devptr since it will need some special handling
for some functions like c_f_pointer.
`c_ptr` is defined as a builtin type and was raising a semantic error if
you tried to use it in an I/O statement. This patch adds a check for c_ptr
and c_devptr to bypass the semantic check and allow variables of
these types to be used in I/O.
This version of the patch keeps the semantic error when -pedantic is
enabled to align with gfortran.
This generates `warning: REAL(KIND=16) is not an enabled type for this
target` if that type is used in a build not correctly configured to
support this type. Uses of `selected_real_kind(30)` return -1.
Relanding #102147 because the test errors turned out to be specific to a
downstream configuration.
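The effect on `selected_real_kind` can be pictured with a much-simplified model. This is a sketch under stated assumptions, not flang's implementation: the function `selectedRealKind` and its `real16Enabled` flag are hypothetical, and only kinds 4, 8 and 16 are modeled, with decimal precisions 6, 15 and 33.

```cpp
// Hypothetical, simplified model of SELECTED_REAL_KIND(P).
// Assumption: only REAL kinds 4, 8 and 16 exist, and kind 16 may be
// disabled for the target.
int selectedRealKind(int precision, bool real16Enabled) {
  if (precision <= 6) {
    return 4; // IEEE binary32
  }
  if (precision <= 15) {
    return 8; // IEEE binary64
  }
  if (precision <= 33 && real16Enabled) {
    return 16; // IEEE binary128
  }
  return -1; // no enabled kind offers the requested precision
}
```

In this model, `selectedRealKind(30, false)` returns -1, matching the behavior described above for a build without REAL(KIND=16) support.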
This patch fixes the mapping and lowering of arrays with dynamic extents
and adds a new test for the same. The fix discards the incomplete
dynamic extent information and replaces it with just the base type.
When lowering to LLVM later, the bounds information is used instead.
Codes using traditional C preprocessors will sometimes put a keyword
macro name in a free form continuation line in order to get macro
replacement of part of an identifier, as in
call subr_&
&N&
&(1.)
where N is a keyword macro. f18 already handles this case, but not when
there is white space between the macro name and the following
continuation marker character '&'. Allow white space to appear.
Fixes https://github.com/llvm/llvm-project/issues/106931.
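The continuation handling described above can be sketched roughly as follows. This is a simplified illustration and not the f18 prescanner: the function names are hypothetical, blanks before a trailing '&' are always dropped, and a macro is replaced only when it is the entire text of a continuation line.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Strip the free-form continuation markers from one line: a leading '&'
// (after optional blanks) and a trailing '&', together with any blanks
// that separate the text from that trailing '&'.
std::string stripContinuation(const std::string &line) {
  auto isBlank = [](char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
  };
  std::size_t begin = 0, end = line.size();
  while (begin < end && isBlank(line[begin])) ++begin;
  if (begin < end && line[begin] == '&') ++begin;
  while (end > begin && isBlank(line[end - 1])) --end;
  if (end > begin && line[end - 1] == '&') {
    --end;
    while (end > begin && isBlank(line[end - 1])) --end;
  }
  return line.substr(begin, end - begin);
}

// Join continued lines, replacing a piece that is exactly a keyword macro
// name with the macro's replacement text.
std::string joinContinuedLines(const std::vector<std::string> &lines,
    const std::map<std::string, std::string> &macros) {
  std::string result;
  for (const std::string &line : lines) {
    std::string piece = stripContinuation(line);
    auto it = macros.find(piece);
    result += (it != macros.end()) ? it->second : piece;
  }
  return result;
}
```

With a macro `N` defined as `42`, both `&N&` and `&N &` as the middle line of the example join to `call subr_42(1.)`.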
When the implementation of one SMP apparently references another in what
might be a specification expression, semantics may need to resolve it as
a forward reference, and to allow for the replacement of a
SubprogramNameDetails place-holding symbol with the final
SubprogramDetails symbol. Otherwise, as in the bug report below,
confusing error messages may result.
(The reference in question isn't really in the specification part of a
subprogram, but due to the syntactic ambiguity between the array element
assignment statement and a statement function definition, it appears to
be so at the time that the reference is processed.)
I needed to make DumpSymbols() available via SemanticsContext to analyze
this bug, and left that new API in place to make things easier next
time.
Fixes https://github.com/llvm/llvm-project/issues/106705.
Accept non-breaking space characters (Latin-1 '\xa0', UTF-8 '\xc2'
'\xa0') in source code, converting them into regular spaces in the
cooked character stream when not in character literals.
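A minimal sketch of this conversion is shown below. It is not the prescanner code: the names `Encoding` and `normalizeNbsp` are hypothetical, the source encoding is passed in explicitly as an assumption, and comments and other contexts are ignored.

```cpp
#include <cstddef>
#include <string>

enum class Encoding { Latin1, Utf8 };

// Replace non-breaking spaces with ordinary spaces, except inside
// character literals delimited by ' or ". A doubled delimiter inside a
// literal closes and immediately reopens it, so it stays protected.
std::string normalizeNbsp(const std::string &in, Encoding encoding) {
  std::string out;
  char quote = '\0'; // active literal delimiter, or '\0' outside a literal
  for (std::size_t i = 0; i < in.size(); ++i) {
    unsigned char c = static_cast<unsigned char>(in[i]);
    if (quote != '\0') {
      if (in[i] == quote) quote = '\0';
      out += in[i];
    } else if (in[i] == '\'' || in[i] == '"') {
      quote = in[i];
      out += in[i];
    } else if (encoding == Encoding::Utf8 && c == 0xC2 &&
        i + 1 < in.size() &&
        static_cast<unsigned char>(in[i + 1]) == 0xA0) {
      out += ' '; // two-byte UTF-8 NBSP becomes one space
      ++i;
    } else if (encoding == Encoding::Latin1 && c == 0xA0) {
      out += ' '; // single-byte Latin-1 NBSP
    } else {
      out += in[i];
    }
  }
  return out;
}
```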
Lowering was crashing when a cuf kernel had an unstructured construct.
Blocks created by PFT need to be re-created inside of the operation, as
is done for OpenACC constructs.
This patch updates the use_device_ptr and use_device_addr clauses to use
the mapInfoOps for lowering. This allows all the types that are handled
by the map clauses, such as derived types, to also be supported by the
use_device clauses.
This is patch 1/2 in a series of patches.
Co-authored-by: Raghu Maddhipatla <raghu.maddhipatla@amd.com>
When an outlined function is generated for an omp target region, a
corresponding DISubprogram was not being generated. This resulted in all
the debug information for the target region being dropped.
This commit adds DISubprogram for the outlined function if there is one
available for the parent function. It also updates the current debug
location so that the right scope is used for the entries in the outlined
function.
There are places in the OpenMPIRBuilder that change the insertion point but
don't update the debug location accordingly. They cause issues when debug info
is enabled. I have fixed a few that I observed to cause issues, but there may be
more and a systematic cleanup may be required.
With this change in place, I can set source line breakpoints in a target
region and run to them in the debugger.
ALLOCATE/DEALLOCATE statements for module allocatable variables with the
pinned attribute can be lowered to the standard runtime call and do not
need further action since these variables will have a unique descriptor
that is on the host.
Previously we tracked data sharing attributes by the symbol itself not
by the ultimate symbol. When the private clause came first, subsequent
uses of the symbol found a host-associated version instead of the
ultimate symbol and so the check didn't consider them to be the same
symbol. Always adding and checking for the ultimate symbol ensures that
we have the same behaviour no matter the order of clauses.
The modified list is only used for this multiple clause check.
Closes #78235
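The idea of keying the check on the ultimate symbol can be sketched as follows. The `Symbol` struct, `getUltimate` and `ClauseSymbolTracker` are hypothetical, much-simplified stand-ins for illustration and do not reflect the actual semantics classes.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for a semantics symbol: a host-associated symbol
// points at the symbol it was associated from.
struct Symbol {
  std::string name;
  const Symbol *host; // nullptr for the ultimate symbol
};

// Follow the association chain to the underlying entity.
const Symbol &getUltimate(const Symbol &symbol) {
  const Symbol *s = &symbol;
  while (s->host != nullptr) {
    s = s->host;
  }
  return *s;
}

// Records symbols named in data-sharing clauses. add() returns false when
// the same underlying entity was already recorded, whichever associated
// symbol was used to name it.
class ClauseSymbolTracker {
public:
  bool add(const Symbol &symbol) {
    return seen_.insert(&getUltimate(symbol)).second;
  }

private:
  std::set<const Symbol *> seen_;
};
```

Because both insertion and lookup go through `getUltimate`, the result does not depend on whether the host-associated symbol or the ultimate symbol is seen first.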
The descriptor for a module variable with a CUDA attribute must be set with
the correct allocator index. This patch updates the embox operation used in
the global to carry the allocator index.
Code lowering always generates fir.if else blocks for source level if
statements, whether needed or not. Change this to only generate else
blocks that are needed.
shortloop is a non-standard OpenACC extension
(https://docs.nvidia.com/hpc-sdk/pgi-compilers/2015/pgirn157.pdf) that
can be found on loop directives.
The f18 parser was choking on it. Since it can be found in existing
apps and is mainly an optimization hint, parse it on loop directives and
ignore it with a warning.
For the record, here is the meaning of shortloop according to the manual linked above:
"If the shortloop clause appears on a loop directive with the vector clause, it tells the compiler that the
loop trip count is less than or equal to the number of vector lanes created for that loop. This means the
value of the vector() clause on the loop directive in a kernels region, or the value of the
vector_length() clause on the parallel directive in a parallel region will be greater than or
equal to the loop trip count. This allows the compiler to generate more efficient code for the loop"
The current pattern was failing OpenACC semantics in acc parse tree
canonicalization:
```
!$acc loop
!dir$ vector aligned
do i=1,n
...
```
Fix it by moving the directive before the OpenACC construct node.
Note that I think it could make sense to propagate the $dir info to the
acc.loop; at least with classic flang, the $dir seems to make a
difference. This is not done here since few directives are supported
anyway.
Allow some interaction between the LLVM and FIR dialects by allowing
conversion between FIR memory types and the llvm.ptr type.
This is meant to help experimentation where the FIR and LLVM dialects
coexist, and is useful to deal with cases where an LLVM type makes it
early into the MLIR produced by flang, like when inserting the LLVM stack
intrinsics here:
0a00d32c5f/flang/lib/Optimizer/Transforms/StackReclaim.cpp (L57)
ALLOCATE and DEALLOCATE statements can be inlined in device functions.
This patch updates the condition that determines whether to inline these
actions in lowering.
This avoids runtime calls in device function code and can speed up
execution.
Also move `isCudaDeviceContext` from `Bridge.cpp` so it can be used
elsewhere.
The behavior deliberately mimics that of clang. Ideally, -print-pipeline-passes
should be a first-class driver option. Notes to this effect have been added in
the appropriate places in both flang and clang.
---------
Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com>
`cuf.data_transfer` will be converted to calls to the CUDA runtime
API, and these are not supported in device code. Assignments in an OpenACC
region will be handled by the OpenACC codegen, so we avoid generating
data transfers for them.
Reverts llvm/llvm-project#102147
It seems some systems which should support F128 are wrongly detected as
not supporting it.
This might be due to checking `LDBL_MANT_DIG` instead of
`__LDBL_MANT_DIG__`. I will investigate.
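The difference between the two macros can be illustrated with a small sketch. It only shows why the choice of macro could matter, assuming that is indeed the cause; the function name `longDoubleIsBinary128` is hypothetical and is not the detection code used in the build.

```cpp
#include <cfloat>

// LDBL_MANT_DIG is provided by <cfloat>/<float.h>. If that header has not
// been included, an `#if LDBL_MANT_DIG == 113` test quietly treats the
// undefined name as 0 and fails. __LDBL_MANT_DIG__ is predefined by GCC
// and Clang, so it needs no header.
constexpr bool longDoubleIsBinary128() {
#if defined(__LDBL_MANT_DIG__)
  return __LDBL_MANT_DIG__ == 113;
#else
  return LDBL_MANT_DIG == 113;
#endif
}
```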
This patch moves the creation of `DataSharingProcessor` instances for
loop constructs out of `genOMPDispatch()` and into their corresponding
codegen functions. This is a necessary first step to enable a proper
handling of privatization on composite constructs.
Some tests are updated due to a change of order between clause
processing and privatization.
#106120 simplifies the data transfer when possible by using the reference
and a shape. This bypasses the declare op. In order to keep the declare op
around, use the second result of the declare op, which achieves the same.