mirror of
https://github.com/llvm/llvm-project.git
synced 2025-04-30 00:26:06 +00:00
356 lines
15 KiB
Markdown
356 lines
15 KiB
Markdown
![]() |
<!--===- docs/F202X.md
|
||
|
|
||
|
Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
|
||
|
See https://llvm.org/LICENSE.txt for license information.
|
||
|
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
|
||
|
|
||
|
-->
|
||
|
|
||
|
# A first take on Fortran 202X features for LLVM Flang
|
||
|
|
||
|
I (Peter Klausler) have been studying the draft PDF of the
|
||
|
[Fortran 202X standard](https://j3-fortran.org/doc/year/23/23-007r1.pdf),
|
||
|
which will soon be published as ISO Fortran 2023.
|
||
|
I have compiled this summary of its changes relative to
|
||
|
the current Fortran 2018 standard from the perspective
|
||
|
of a [Fortran compiler](https://github.com/llvm/llvm-project/tree/main/flang)
|
||
|
implementor.
|
||
|
|
||
|
## TL;DR
|
||
|
|
||
|
Fortran 202X doesn't make very many changes to the language
|
||
|
relative to Fortran 2018, which was itself a small increment
|
||
|
over Fortran 2008.
|
||
|
Apart from `REDUCE` clauses that were added to the
|
||
|
[still broken](https://github.com/llvm/llvm-project/blob/main/flang/docs/DoConcurrent.md)
|
||
|
`DO CONCURRENT` construct, there's little here for Fortran users
|
||
|
to get excited about.
|
||
|
|
||
|
## Priority of implementation in LLVM Flang
|
||
|
|
||
|
We are working hard to ensure that existing working applications will
|
||
|
port successfully to LLVM Flang with minimal effort.
|
||
|
I am not particularly concerned with conforming to a new
|
||
|
standard as an end in itself.
|
||
|
|
||
|
The only features below that appear to have already been implemented
|
||
|
in other compilers are the `REDUCE` clauses and the degree trigonometric
|
||
|
intrinsic functions, so those should have priority as an aid to
|
||
|
portability.
|
||
|
We would want to support them earlier even if they were not in a standard.
|
||
|
|
||
|
The `REDUCE` clause also merits early implementation due to
|
||
|
its potential for performance improvements in real codes.
|
||
|
I don't see any other feature here that would be relevant to
|
||
|
performance (maybe a weak argument could be made for `SIMPLE`).
|
||
|
The bulk of this revision unfortunately comprises changes to Fortran that
|
||
|
are neither performance-related, already available in
|
||
|
some compilers, nor (obviously) in use in existing codes.
|
||
|
I will not prioritize implementing them myself over
|
||
|
other work until they become portability concerns or are
|
||
|
requested by actual users.
|
||
|
|
||
|
Given Fortran's history of the latency between new
|
||
|
standards and the support for their features in real compilers,
|
||
|
and then the extra lag before the features are then actually used
|
||
|
in codes meant to be portable, I doubt that many of the items
|
||
|
below will have to be worked on any time soon due to user demand.
|
||
|
|
||
|
If J3 had chosen to add more features that were material improvements
|
||
|
to Fortran -- and there's quite a long list of worthy candidates that
|
||
|
were passed over, like read-only pointers -- it would have made sense
|
||
|
for me to prioritize their implementation in LLVM Flang more
|
||
|
urgently.
|
||
|
|
||
|
## Specific change descriptions
|
||
|
|
||
|
The individual features added to the language are summarized
|
||
|
in what I see as their order of significance to Fortran users.
|
||
|
|
||
|
### Alert: There's a breaking change!
|
||
|
|
||
|
The Fortran committee used to abhor making breaking changes,
|
||
|
apart from fixes, so that conforming codes could be portable across
|
||
|
time as well as across compilers.
|
||
|
Fortran 202X, however, uncharacteristically perpetrates one such
|
||
|
change to existing semantics that will silently cause existing
|
||
|
codes to work differently, if that change were to be implemented
|
||
|
and enabled by default.
|
||
|
|
||
|
Specifically, automatic reallocation of whole deferred-length character
|
||
|
allocatable scalars is now mandated when they appear for internal output
|
||
|
(e.g., `WRITE(A,*) ...`)
|
||
|
or as output arguments for some statements and intrinsic procedures
|
||
|
(e.g., `IOMSG=`, `ERRMSG=`).
|
||
|
So existing codes that allocate output buffers
|
||
|
for such things will, or would, now observe that their buffers are
|
||
|
silently changing their lengths during execution, rather than being
|
||
|
padded with blanks or being truncated. For example:
|
||
|
|
||
|
```
|
||
|
character(:), allocatable :: buffer
|
||
|
allocate(character(20)::buffer)
|
||
|
write(buffer,'F5.3') 3.14159
|
||
|
print *, len(buffer)
|
||
|
```
|
||
|
|
||
|
prints 20 with Fortran 2018 but would print 5 with Fortran 202X.
|
||
|
|
||
|
There would have no problem with the new standard changing the
|
||
|
behavior in the current error case of an unallocated variable;
|
||
|
defining new semantics for old errors is a generally safe means
|
||
|
for extending a programming language.
|
||
|
However, in this case, we'll need to protect existing conforming
|
||
|
codes from the surprising new reallocation semantics, which
|
||
|
affect cases that are not errors.
|
||
|
|
||
|
When/if there are requests from real users to implement this breaking
|
||
|
change, and if it is implemented, I'll have to ensure that users
|
||
|
have the ability to control this change in behavior via an option &/or the
|
||
|
runtime environment, and when it's enabled, emit a warning at code
|
||
|
sites that are at risk.
|
||
|
This warning should mention a source change they can make to protect
|
||
|
themselves from this change by passing the complete substring (`A(:)`)
|
||
|
instead of a whole character allocatable.
|
||
|
|
||
|
This feature reminds me of Fortran 2003's change to whole
|
||
|
allocatable array assignment, although in that case users were
|
||
|
put at risk only of extra runtime overhead that was needless in
|
||
|
existing codes, not a change in behavior, and users learned to
|
||
|
assign to whole array sections (`A(:)=...`) rather than to whole
|
||
|
allocatable arrays where the performance hit mattered.
|
||
|
|
||
|
### Major Items
|
||
|
|
||
|
The features in this section are expensive to implement in
|
||
|
terms of engineering time to design, code, refactor, and test
|
||
|
(i.e., weeks or months, not days).
|
||
|
|
||
|
#### `DO CONCURRENT REDUCE`
|
||
|
|
||
|
J3 continues to ignore the
|
||
|
[serious semantic problems](https://github.com/llvm/llvm-project/blob/main/flang/docs/DoConcurrent.md)
|
||
|
with `DO CONCURRENT`, despite the simplicity of the necessary fix and their
|
||
|
admirable willingness to repair the standard to fix problems with
|
||
|
other features (e.g., plugging holes in `PURE` procedure requirements)
|
||
|
and their less admirable willingness to make breaking changes (see above).
|
||
|
They did add `REDUCE` clauses to `DO CONCURRENT`, and those seem to be
|
||
|
immediately useful to HPC codes and worth implementing soon.
|
||
|
|
||
|
#### `SIMPLE` procedures
|
||
|
|
||
|
The new `SIMPLE` procedures constitute a subset of F'95/HPF's `PURE`
|
||
|
procedures.
|
||
|
There are things that one can do in a `PURE` procedure
|
||
|
but cannot in a `SIMPLE` one. But the virtue of being `SIMPLE` seems
|
||
|
to be its own reward, not a requirement to access any other
|
||
|
feature.
|
||
|
|
||
|
`SIMPLE` procedures might have been more useful had `DO CONCURRENT` been
|
||
|
changed to require callees to be `SIMPLE`, not just `PURE`.
|
||
|
|
||
|
The implementation of `SIMPLE` will be nontrivial: it involves
|
||
|
some parsing and symbol table work, and some generalization of the
|
||
|
predicate function `IsPureProcedure()`, extending the semantic checking on
|
||
|
calls in `PURE` procedures to ensure that `SIMPLE` procedures
|
||
|
only call other `SIMPLE` procedures, and modifying the intrinsic
|
||
|
procedure table to note that most intrinsics are now `SIMPLE`
|
||
|
rather than just `PURE`.
|
||
|
|
||
|
I don't expect any codes to rush to change their `PURE` procedures
|
||
|
to be `SIMPLE`, since it buys little and reduces portability.
|
||
|
This makes `SIMPLE` a lower-priority feature.
|
||
|
|
||
|
#### Conditional expressions and actual arguments
|
||
|
|
||
|
Next on the list of "big ticket" items are C-style conditional
|
||
|
expressions. These come in two forms, each of which is a distinct
|
||
|
feature that would be nontrivial to implement, and I would not be
|
||
|
surprised to see some compilers implement one before the other.
|
||
|
|
||
|
The first form is a new parenthesized expression primary that any C programmer
|
||
|
would recognize. It has straightforward parsing and semantics,
|
||
|
but will require support in folding and all other code that
|
||
|
processes expressions. Lowering will be nontrivial due to
|
||
|
control flow.
|
||
|
|
||
|
The second form is a conditional actual argument syntax
|
||
|
that allows runtime selection of argument associations, as well
|
||
|
as a `.NIL.` syntax for optional arguments to signify an absent actual
|
||
|
argument. This would have been more useful if it had also been
|
||
|
allowed as a pointer assignment statement right-hand side, and
|
||
|
that might be a worthwhile extension. As this form is essentially
|
||
|
a conditional variable reference it may be cleaner to have a
|
||
|
distinct representation from the conditional expression primary
|
||
|
in the parse tree and strongly-typed `Expr<T>` representations.
|
||
|
|
||
|
#### `ENUMERATION TYPE`
|
||
|
|
||
|
Fortran 202X has a new category of type. The new non-interoperable
|
||
|
`ENUMERATION TYPE` feature is like C++'s `enum class` -- not, unfortunately,
|
||
|
a powerful sum data type as in Haskell or Rust. Unlike the
|
||
|
current `ENUM, BIND(C)` feature, `ENUMERATION TYPE` defines a new
|
||
|
type name and its distinct values.
|
||
|
|
||
|
This feature may well be the item requiring the largest patch to
|
||
|
the compiler for its implementation, as it affects parsing,
|
||
|
type checking on assignment and argument association, generic
|
||
|
resolution, formatted I/O, NAMELIST, debugging symbols, &c.
|
||
|
It will indirectly affect every switch statement in the compiler
|
||
|
that switches over the six (now seven) type categories.
|
||
|
This will be a big project for little useful return to users.
|
||
|
|
||
|
#### `TYPEOF` and `CLASSOF`
|
||
|
|
||
|
Last on the list of "big ticket" items are the new TYPEOF and CLASSOF
|
||
|
type specifiers, which allow declarations to indirectly use the
|
||
|
types of previously-defined entities. These would have obvious utility
|
||
|
in a language with type polymorphism but aren't going to be very
|
||
|
useful yet in Fortran 202X (esp. `TYPEOF`), although they would be worth
|
||
|
supporting as a utility feature for a parametric module extension.
|
||
|
|
||
|
`CLASSOF` has implications for semantics and lowering that need to
|
||
|
be thought through as it seems to provide a means of
|
||
|
declaring polymorphic local variables and function results that are
|
||
|
neither allocatables nor pointers.
|
||
|
|
||
|
#### Coarray extensions:
|
||
|
|
||
|
* `NOTIFY_TYPE`, `NOTIFY WAIT` statement, `NOTIFY=` specifier on image selector
|
||
|
* Arrays with coarray components
|
||
|
|
||
|
#### "Rank Independent" Features
|
||
|
|
||
|
The `RANK(n)` attribute declaration syntax is equivalent to
|
||
|
`DIMENSION(:,:,...,:)` or an equivalent entity-decl containing `n` colons.
|
||
|
As `n` must be a constant expression, that's straightforward to implement,
|
||
|
though not terribly useful until the language acquires additional features.
|
||
|
(I can see some utility in being able to declare PDT components with a
|
||
|
`RANK` that depends on a `KIND` type parameter.)
|
||
|
|
||
|
It is now possible to declare the lower and upper bounds of an explicit
|
||
|
shape entity using a constant-length vector specification expression
|
||
|
in a declaration, `ALLOCATE` statement, or pointer assignment with
|
||
|
bounds remapping.
|
||
|
For example, `real A([2,3])` is equivalent to `real A(2,3)`.
|
||
|
|
||
|
The new `A(@V)` "multiple subscript" indexing syntax uses an integer
|
||
|
vector to supply a list of subscripts or of triplet bounds/strides. This one
|
||
|
has tough edge cases for lowering that need to be thought through;
|
||
|
for example, when the lengths of two or more of the vectors in
|
||
|
`A(@U,@V,@W)` are not known at compilation time, implementing the indexing
|
||
|
would be tricky in generated code and might just end up filling a
|
||
|
temporary with `[U,V,W]` first.
|
||
|
|
||
|
The obvious use case for "multiple subscripts" would be as a means to
|
||
|
index into an assumed-rank dummy argument without the bother of a `SELECT RANK`
|
||
|
construct, but that usage is not supported in Fortran 202X.
|
||
|
|
||
|
This feature may well turn out to be Fortran 202X's analog to Fortran 2003's
|
||
|
`LEN` derived type parameters.
|
||
|
|
||
|
### Minor Items
|
||
|
|
||
|
So much for the major features of Fortran 202X. The longer list
|
||
|
of minor features can be more briefly summarized.
|
||
|
|
||
|
#### New Edit Descriptors
|
||
|
|
||
|
Fortran 202X has some noncontroversial small tweaks to formatted output.
|
||
|
The `AT` edit descriptor automatically trims character output. The `LZP`,
|
||
|
`LZS`, and `LZ` control edit descriptors and `LEADING_ZERO=` specifier provide a
|
||
|
means for controlling the output of leading zero digits.
|
||
|
|
||
|
#### Intrinsic Module Extensions
|
||
|
|
||
|
Addressing some issues and omissions in intrinsic modules:
|
||
|
|
||
|
* LOGICAL8/16/32/64 and REAL16
|
||
|
* IEEE module facilities upgraded to match latest IEEE FP standard
|
||
|
* C_F_STRPOINTER, F_C_STRING for NUL-terminated strings
|
||
|
* C_F_POINTER(LOWER=)
|
||
|
|
||
|
#### Intrinsic Procedure Extensions
|
||
|
|
||
|
The `SYSTEM_CLOCK` intrinsic function got some semantic tweaks.
|
||
|
|
||
|
There are new intrinsic functions for trigonometric functions in
|
||
|
units of degrees and half-circles.
|
||
|
GNU Fortran already supports the forms that use degree units.
|
||
|
These should call into math library implementations that are
|
||
|
specialized for those units rather than simply multiplying
|
||
|
arguments or results with conversion factors.
|
||
|
* `ACOSD`, `ASIND`, `ATAND`, `ATAN2D`, `COSD`, `SIND`, `TAND`
|
||
|
* `ACOSPI`, `ASINPI`, `ATANPI`, `ATAN2PI`, `COSPI`, `SINPI`, `TANPI`
|
||
|
|
||
|
`SELECTED_LOGICAL_KIND` maps a bit size to a kind of `LOGICAL`
|
||
|
|
||
|
There are two new character utility intrinsic
|
||
|
functions whose implementations have very low priority: `SPLIT` and `TOKENIZE`.
|
||
|
`TOKENIZE` requires memory allocation to return its results,
|
||
|
and could and should have been implemented once in some Fortran utility
|
||
|
library for those who need a slow tokenization facility rather than
|
||
|
requiring implementations in each vendor's runtime support library with
|
||
|
all the extra cost and compatibilty risk that entails.
|
||
|
|
||
|
`SPLIT` is worse -- not only could it, like `TOKENIZE`,
|
||
|
have been supplied by a Fortran utility library rather than being
|
||
|
added to the standard, it's redundant;
|
||
|
it provides nothing that cannot be already accomplished by
|
||
|
composing today's `SCAN` intrinsic function with substring indexing:
|
||
|
|
||
|
```
|
||
|
module m
|
||
|
interface split
|
||
|
module procedure :: split
|
||
|
end interface
|
||
|
!instantiate for all possible ck/ik/lk combinations
|
||
|
integer, parameter :: ck = kind(''), ik = kind(0), lk = kind(.true.)
|
||
|
contains
|
||
|
simple elemental subroutine split(string, set, pos, back)
|
||
|
character(*, kind=ck), intent(in) :: string, set
|
||
|
integer(kind=ik), intent(in out) :: pos
|
||
|
logical(kind=lk), intent(in), optional :: back
|
||
|
if (present(back)) then
|
||
|
if (back) then
|
||
|
pos = scan(string(:pos-1), set, .true.)
|
||
|
return
|
||
|
end if
|
||
|
end if
|
||
|
npos = scan(string(pos+1:), set)
|
||
|
pos = merge(pos + npos, len(string) + 1, npos /= 0)
|
||
|
end
|
||
|
end
|
||
|
```
|
||
|
|
||
|
(The code above isn't a proposed implementation for `SPLIT`, just a
|
||
|
demonstration of how programs could use `SCAN` to accomplish the same
|
||
|
results today.)
|
||
|
|
||
|
## Source limitations
|
||
|
|
||
|
Fortran 202X raises the maximum number of characters per free form
|
||
|
source line and the maximum total number of characters per statement.
|
||
|
Both of these have always been unlimited in this compiler (or
|
||
|
limited only by available memory, to be more accurate.)
|
||
|
|
||
|
## More BOZ usage opportunities
|
||
|
|
||
|
BOZ literal constants (binary, octal, and hexadecimal constants,
|
||
|
also known as "typeless" values) have more conforming usage in the
|
||
|
new standard in contexts where the type is unambiguously known.
|
||
|
They may now appear as initializers, as right-hand sides of intrinsic
|
||
|
assignments to integer and real variables, in explicitly typed
|
||
|
array constructors, and in the definitions of enumerations.
|
||
|
|
||
|
## Citation updates
|
||
|
|
||
|
The source base contains hundreds of references to the subclauses,
|
||
|
requirements, and constraints of the Fortran 2018 standard, mostly in code comments.
|
||
|
These will need to be mapped to their Fortran 202X counterparts once the
|
||
|
new standard is published, as the Fortran committee does not provide a
|
||
|
means for citing these items by names that are fixed over time like the
|
||
|
C++ committee does.
|
||
|
If we had access to the LaTeX sources of the standard, we could generate
|
||
|
a mapping table and automate this update.
|