Summary:
The functions internal to subroutine should have the scope set to the
parent function. This allows a user to evaluate local variables of
parent function when control is stopped in the child.
Fixes#96314
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: https://phabricator.intern.facebook.com/D60250527
(cherry picked from commit 626022bfd18f335ef62a461992a05dfed4e6d715)
Config files provide a facility to invoke the compiler with a predefined
set of options. The patch only enables these options in the flang
driver. Functionality was always there.
(cherry picked from commit 8a77961280536b680c404a49002a00b988ca45fc)
Commits c4b66bf and 7647174 add support for AArch64 trampolines. Updated
documentation to reflect the changes.
(cherry picked from commit c6e69b041a7e6d18463f6cf684b10fd46a62c496)
Functions returning C_PTR were lowered to function returning intptr (i64
on 64bit arch). This caused conflicts when these functions were defined
as returning !fir.ref<none>/llvm.ptr in other compiler generated
contexts (e.g., malloc).
Lower them to return !fir.ref<none>.
This should deal with https://github.com/llvm/llvm-project/issues/97325
and https://github.com/llvm/llvm-project/issues/98644.
(cherry picked from commit 1ead51a86c6c746a1b9948ca1ee142df223ffebd)
The `ExternalNameConversion` will add an _ at the end of the external
functions. We extract the real function name to use in the debug info.
The convention is to use the real name of function in the `name` field
and mangled name with extra _ at the end in the `linkageName` field.
Fixes#92391.
Makefile-based builds did not have proper dependencies to built the
FortranRuntime target after Flang new is available. This PR introduces a
dependency to ensure that this is the case. Relates to PR #95388.
---------
Co-authored-by: Michael Kruse <github@meinersbur.de>
Don't use `copyHostAssociateVar` for allocatable variables. It isn't
clear to me whether or not this should be addressed in
`copyHostAssociateVar` instead of inside OpenMP. I opted for OpenMP
to minimise how many things I effected. `copyHostAssociateVar` will
not update the destination variable if the destination variable
was unallocated. This is incorrect because assignment inside of the
openmp block can cause the allocation status of the variable to
change. Furthermore, `copyHostAssociateVar` seems to only copy the
variable address not other metadata like the size of the allocation.
Reallocation by assignment could cause this to change.
There is ongoing work for fir.complex but this looks unlikely to land
before the LLVM branch. Adding a TODO message gives cleaner output to
users. Currently fir.complex leads to a compiler assertion failure.
Some uses of character types lead to invalid fir.convert type
conversions, others make it far enough to hit the same assertion failure
as for fir.complex.
When passing a polymorphic actual array argument to an non polymorphic
explicit or assumed shape argument, copy-in/copy-out may be required and
should be made according to the dummy dynamic type.
The code that was creating the descriptor to drive this copy-in/out was
not handling properly the case where the dummy and actual rank do not
match (possible according to sequence association rules), it tried to
make the copy-in/out according to the dummy argument shape (which we may
not even know if the dummy is assumed-size). Fix this by using the
actual shape when creating this new descriptor with the dummy argument
dynamic type.
cg-rewrite runs regionDCE to get rid of the unused fir.shape/shift/slice
before codegen since those operations have no codegen.
I came across an issue where unreachable code would cause the pass to
fail with `error: loc(...): null operand found`.
It turns out `mlir::RegionDCE` does not work properly in presence of
unreachable code because it delete operations in reachable code that are
unused in reachable code, but still used in unreachable code (like the
constant in the added test case). It seems `mlir::RegionDCE` is always
run after `mlir::eraseUnreachableBlock` outside of this pass.
A solution could be to run `mlir::eraseUnreachableBlock` here or to try
modifying `mlir::RegionDCE`. But the current behavior may be
intentional, and both of these calls are actually quite expensive. For
instance, RegionDCE will does liveness analysis, and removes unused
block arguments, which is way more than what is needed here. I am not
very found of having this rather heavy transformation inside this pass
(they should be run after or before if they matter in the overall
pipeline).
Do a naïve backward deletion of the trivially dead operations instead.
It is cheaper, and works with unreachable code.
This fixes a bug in OpenMP privatisation. The privatised variables are
created as though they are host associated clones of the original
variables. These privatised variables do not contain the allocatable
attribute themselves and so we need to check if the ultimate symbol is
allocatable. Having or not having this flag influences whether lowering
determines that this is a whole allocatable assignment, which then
causes hlfir.assign not to get the realloc flag, which cases the
allocatable not to be allocated when it is assigned to (leading to a
segfault running the newly added test).
I also did the same for pointer variables because I would imagine they
could experience the same issue.
There is no fallout on tests outside of OpenMP, and the gfortran test
suite still passes, so I think this doesn't break host other kinds of
host associated symbols.
Moves definitions of the kind arrays into a Fortran MODULE to not only
emit the MOD file, but also compile that MODULE file into an object
file. This file is then linked into libFortranRuntime.so.
Eventually this workaround PR shoud be redone and a proper runtime build
should be setup that will then also compile Fortran MODULE files.
Fixes#89403
---------
Co-authored-by: Valentin Clement (バレンタイン クレメン) <clementval@gmail.com>
Co-authored-by: Michael Kruse <github@meinersbur.de>
This patch enables the lastprivate clause to be used in the presence of
the collapse clause.
Note: the way we currently implement lastprivate means that this adds a
large number of compare instructions to the end of every iteration of
the loop. This is a clearly non-optimal thing to do, but lastprivate in
general will need re-implementing to prevent this. This is planned as
part of the delayed privatization work. This current implementation is
just a stop-gap measure as generating sub-optimal but working code is
better than crashing out.
When a DEC legacy STRUCTURE definition appears within another, its
STRUCTURE statement must also declare some components of the enclosing
structure.
Fixes https://github.com/llvm/llvm-project/issues/99288.
Fix what seems to be a regression in semantics in definability checking:
the construct entities of ASSOCIATE and SELECT TYPE constructs are never
pointers or allocatables, even when their selectors are so. SELECT RANK
construct entities, however, can be pointers or allocatables.
The Fortran standard committees passed an "interp" request at their June
2024 meetings that is distinct from nearly every other Fortran compiler
that I tried (6) in an an ambiguous case (parent component naming when
the base type has been renamed via USE association). Document this case
in flang/docs/Extensions.md as an intentional instance of
non-conformance chosen for portability and better usability.
Derived type assignment checking needs to account for the possibility of
derived assignment. The implementation was checking compile-time
conformance errors only on the path for assignments of intrinsic types.
Add a static array conformance check in the derived type flow once it
has been established that no defined assignment exists.
Fixes https://github.com/llvm/llvm-project/issues/98981.
ExternalFileUnit::FinishReadingRecord() is called at the end of a READ
statement, unless the read is non-advancing and there was no error. In
the event of an erroneous non-advancing read, however,
FinishReadingRecord() leaves ConnectionState::leftTabLimit inhabited by
a value, leaving the impression that the *next* record is undergoing a
non-advancing operation, and this cause the next operation to fail. In
the test case in the reported bug, the next operation is a BACKSPACE,
and it ends up doing nothing. The fix is to always reset leftTabLimit in
FinishReadingRecord.
Fixes https://github.com/llvm/llvm-project/issues/98783.
The prescanner checks lines that begin with a keyword macro name to see
whether they should be treated as a comment or compiler directive
instead of a source line. This fails when the potential keyword macro
name is extended with identifier characters via Fortran line
continuation. Disable line continuation during this check.
The RENAME implementation in the Fortran runtime had a few glitches that
had to be addressed:
- Wrong usage of RTDECL (fixed)
- Issue fatal error when trying to use RENAME on a target device (fixed)
Until genSecond, all intrinsic `genXXX` returning scalar intrinsic
(except NULL) were returning them as value.
The code calling genIntrinsicCall is using that assumption when
generation the asExprOp because hflir.expr<> of scalar are badly
supported in tools (I should likely just forbid them all together), the
type is meant for "non trivial" values: arrays, character, and derived
type. For instance, the added tests crashed with error: `'arith.subf' op
operand #0 must be floating-point-like, but got '!hlfir.expr<f32>'`
Load the result in genSecond and add an assert after genIntrinsicCall to
better enforce this.
This implementation relies on arithmetic conversion, let's see what
happens when we do
std::size_t length{std::strlen(string)};
if (length <= std::numeric_limits<std::int64_t>::max())
return static_cast<std::int64_t>(length);
1) if size_t == uint32_t (or lower), then the comparison operator
invokes integral promotion to uint64_t, the comparison happens, it's
fine.
2) if size_t == uint64_t, then the comparison is done between unsigned
types, which implies a conversion of
std::numeric_limits<std::int64_t>::max() to uint64_t, which happens
without accuracy loss, fine
3) if size_t == uint128_t (or higher), then we invoke integral promotion
of std::int64_t, it's also fine.
So this snippet has the same behavior as the existing one, while being
easier to read.
Extends the stack-arrays pass to support `fir.declare` ops. Before that,
we did not recognize malloc-free pairs for which `fir.declare` is used
to declare the allocated entity. This is because the `free` op was
invoked on the result of the `fir.declare` op and did not directly use
the allocated memory SSA value.
This also extends the pass to collect the analysis results within OpenMP
regions.
This patch allows the -rtlib flag with flang-new to select between the
libgcc_s and compiler-rt runtimes. The behaviour is identical to the
same flag with clang.
Local descriptor for cuda allocatable need to be handled on host and
device. One solution is to duplicate the descriptor (one on the host and
one on the device) and keep them in sync or have the descriptor in
managed/unified memory so we don't to take care of any sync.
The second solution is probably the one we will implement. In order to
have more flexibility on how descriptor representing cuda allocatable
are allocated, this patch updates the lowering to use the cuf operations
alloc and free to managed them.
This PR fixes 2 similar issues.
1. As reported in #97476, flang generated executable has inconsistent
behavior regarding values of the local array variables.
2. Variable with save attribute would not show up in debugger.
The reason behind is same for both cases. If a local variable has
storage which extends beyond function lifetime, the way to represent it
in the debug info is through a global variable whose scope is limited to
the function. This is what is used for static local variable in C.
Previously local array worked in cases they were on stack. But will not
show up if they had a global storage.
To fix this, if we can get a corresponding `GlobalOp` for a variable
while processing `DeclareOp`, we treat it the variable as global with
scope set appropriately. A new FIR test is added. A previous Integration
test has been adjusted as to not expect local variables for local
arrays.
With this fix in place, all the issues described in #97476 go away. It
also fixes a lot of fails in GDB's fortran testsuite.
Fixes#97476.
The native architecture is AArch64 here so the pentium name won't
work even if you've got the x86 backend enabled.
https://lab.llvm.org/buildbot/#/builders/17/builds/898
Pass an explicit target for each run line to fix this.
Test added in f1d3fe7aae7867b5de96b84d6d26b5c9f02f209a / #98517
This patch generalizes the MemoryAllocation pass (alloca -> heap) to
handle fir.alloca regardless of their postion in the IR. Currently, it
only dealt with fir.alloca in function entry blocks. The logic is placed
in a utility that can be used to replace alloca in an operation on
demand to whatever kind of allocation the utility user wants via
callbacks (allocmem, or custom runtime calls to instrument the code...).
To do so, a concept of ownership, that was already implied a bit and
used in passes like stack-reclaim, is formalized. Any operation with the
LoopLikeInterface, AutomaticAllocationScope, or IsolatedFromAbove owns
the alloca directly nested inside its regions, and they must not be used
after the operation.
The pass then looks for the exit points of region with such interface,
and use that to insert deallocation. If dominance is not proved, the
pass fallbacks to storing the new address into a C pointer variable
created in the entry of the owning region which allows inserting
deallocation as needed, included near the alloca itself to avoid leaks
when the alloca is executed multiple times due to block CFGs loops.
This should fix https://github.com/llvm/llvm-project/issues/88344.
In a next step, I will try to refactor lowering a bit to introduce
lifetime operation for alloca so that the deallocation points can be
inserted as soon as possible.
Extends the OmpCycleAndExitChecker to check that associated loops of a
loop construct are not DO WHILE or DO without control.
OpenMP 5.0 standard clearly mentions this restriction. Later standards
enforce this through the definition of associated loops and canonical
loop forms.
https://www.openmp.org/spec-html/5.0/openmpsu41.htmlFixes#81949
The SECOND intrinsic is a gnu extension providing an alias for CPU_TIME:
https://gcc.gnu.org/onlinedocs/gfortran/SECOND.html
This cannot be implemented as a straightforward alias because there is
both a function and a subroutine form.
This change is in preparation of #97903, which adds extra checks for
materializations: it is now enforced that they produce an SSA value of
the correct type, so the current workaround no longer works.
The original workaround avoided target materializations by directly
returning the to-be-converted SSA value from the materialization
callback. This can be avoided by initializing the lowering patterns that
insert the materializations without a type converter. For
`cg::XEmboxOp`, the existing workaround that skips
`unrealized_conversion_cast` ops is still in place.
Also remove the lowering pattern for `unrealized_conversion_cast`. This
pattern has no effect because `unrealized_conversion_cast` ops that are
inserted by the dialect conversion framework are never matched by the
pattern driver.
Pull request https://github.com/llvm/llvm-project/pull/97337 was
reverted by https://github.com/llvm/llvm-project/pull/98612 due
to two failing tests in llvm-test-suite -- which I ran, as always,
but must have bungled or misinterpreted (mea culpa).
The failing tests were llvm-test-suite/Fortran/gfortran/regression/
char_length_{20,21}.f90. They have array constructors with
explicit character types whose dynamic length values are negative
at runtime, which must be interpreted as zero.
This patch extends the original to cover those cases.
PR adds changes to the flang frontend to create the `MaskedOp` when
`masked` directive is used in the input program. Omp masked is
introduced in 5.2 standard and allows a parallel region to be executed
by threads specified by a programmer. This is achieved with the help of
filter clause which helps to specify thread id expected to execute the
region.
Other related PRs:
- [Fortran Parsing and Semantic
Support](https://github.com/llvm/llvm-project/pull/91432) - Merged
- [MLIR Support](https://github.com/llvm/llvm-project/pull/96022/files)
- Merged
- [Lowering Support](https://github.com/llvm/llvm-project/pull/98401) -
Under Review
When the `add_flang_library` was first added, it was apparently copied
over from `add_clang_library`, including its logic to determine the
library type. It includes a workaround: If `BUILD_SHARED_LIBS` is
enabled, it should build all libraries as shared, including those that
are explicitly marked as `STATIC`[^1], because `add_clang_library`
always passes at least one of `STATIC`/`SHARED` to `llvm_add_library`,
and `llvm_add_library` could not distinguish the two cases.
Then, the two implementations diverged. For its runtime libraries, Flang
requires some libraries to always be static libraries, so if a library
is explicitly marked as `STATIC`, `BUILD_SHARED_LIBS` is ignored[^2].
I noticed the two implementations of the same functionality, modified
only the `add_clang_library`, and copied over the result to
`add_flang_library`[^3], without noticing that they are slightly
different. As a result, Flang runtime libraries would be built as shared
libraries with `-DBUILD_SHARED_LIBS=ON`, which may break some build
configurations[^4].
This PR fixes the problem and at the same time simplifies the library
type algorithm by just passing SHARED/STATIC verbatim to
`llvm_add_library`. This is effectively what [^2] should have done
instead adding more code to undo the workaround of [^1]. Ideally, one
would use
```
llvm_add_library(${name} ${ARG_STATIC} ${ARG_SHARED} [...])
```
but `ARG_STATIC`/`ARG_SHARED` as set by `cmake_parse_arguments` contain
`TRUE`/`FALSE` instead of the keywords themselves. I could imagine a
utility function akin to `pythonize_bool` that does this.
This simplification adds two more changes:
1. Object libraries are not explicitly requested anymore.
`llvm_add_library` itself should determine whether an object library is
necessary. As the comment notes, using an object library is not without
problems and seem of no use here since it works fine without object
library when in `XCODE`/`MSVC_IDE` mode.
2. The property `CLANG_STATIC_LIBS` was removed. It was
`FLANG_STATIC_LIBS` before to copy&paste error of #93519 [^3] which not
used anywhere. In clang, `CLANG_STATIC_LIBS` is used for `clang-shlib`
to include all component libraries in a single large library. There is
no equivalent `flang-shlib`.
[^1]: dbc2a12c7311ff4cc2cd7887d128b506bd35b579
[^2]: 3d2e05d542e646891745c5278a09950d3c4fb4a5
[^3]: #93519
[^4]:
https://github.com/llvm/llvm-project/pull/93519#issuecomment-2192359002
Reverts llvm/llvm-project#97337
This has caused llvm test suite failures on our bots, for example:
https://lab.llvm.org/buildbot/#/builders/17/builds/709
```
FAIL: test-suite::gfortran-regression-execute-regression__char_length_21_f90.test
FAIL: test-suite::gfortran-regression-execute-regression__char_length_20_f90.test
```
The tricky bit here is that we need to generate the reduction symbol
mapping inside each of the nested SECTION constructs. This is a bit
similar to omp.canonical_loop inside of omp.wsloop, except the SECTION
constructs come from the PFT.
To make this work I moved the lowering of the SECTION constructs inside
of the lowering SECTIONS (where reduction information is still
available). This subverts the normal control flow for OpenMP lowering a
bit.
One alternative option I investigated would be to generate the SECTION
CONSTRUCTS as normal as though there were no reduction, and then to fix
them up after control returns back to genSectionsOp. The problem here is
that the code generated for the section body has the wrong symbol
mapping for the reduction variable, so all of the nested code has to be
patched up. In my prototype version this was even more hacky than what
the solution I settled upon.
Module files emitted by this Fortran compiler are valid Fortran source
files. Symbols that are USE-associated into modules are represented in
their module files with USE statements and special comments with hash
codes in them to ensure that those USE statements resolve to the same
modules that were used to build the module when its module file was
generated.
This scheme prevents unchecked module file growth in large applications
by not emitting USE-associated symbols redundantly. This problem can be
especially bad when derived type definitions must be repeated in the
module files of their clients, and the clients of those modules, and so
on. However, this scheme has the disadvantage that clients of modules
must be compiled with dependent modules in the module search path.
This new -fhermetic-module-files option causes module file output to be
free of dependences on any non-intrinsic module files; dependent modules
are instead emitted as part of the module file, rather than being
USE-associated. It is intended for top level library module files that
are shipped with binary libraries when it is not convenient to collect
and ship their dependent module files as well.
Fixes https://github.com/llvm/llvm-project/issues/97398.
Data designators like "a(j:k)" are parsed into array section references,
but once rank and type information is in hand, some of them turn out to
actually be substring references. The code that recognizes these cases
was suffering from a "false positive" in the case of a construct entity
in a SELECT RANK construct due to the use of a predicate member function
(Symbol::IsObjectArray) that only works on ObjectEntityDetails symbols.
Fix the test to use the more general Symbol::Rank() member function.