mirror of
https://github.com/llvm/llvm-project.git
synced 2025-04-26 13:16:06 +00:00

The website renders this `<h1>` as the page title in the index. This patch updates the title to better fit with the names of the other pages. See the index here https://flang.llvm.org/docs/
193 lines
9.7 KiB
Markdown
193 lines
9.7 KiB
Markdown
# Stack arrays pass
|
|
## Problem Description
|
|
In gfortran, `-fstack-arrays` will cause all local arrays, including those of
|
|
unknown size, to be allocated from stack memory. Gfortran enables this flag by
|
|
default at `-Ofast`.
|
|
|
|
Flang already allocates all local arrays on the stack (the
|
|
`memory-allocation-opt` pass can move large allocations to the heap, but by
|
|
default it will not). But there are some cases where temporary arrays are
|
|
created on the heap by Flang. Moving these temporary arrays is the goal here.
|
|
|
|
## Proposed Solution
|
|
One approach would be to write a pass which attempts to move heap allocations to
|
|
the stack (like the `memory-allocation-opt` pass in reverse). This approach has
|
|
the advantage of a self-contained implementation and ensuring that newly added
|
|
cases are covered.
|
|
|
|
Another approach would be to modify each place in which arrays are allocated on
|
|
the heap to instead generate a stack allocation. The advantage of the second
|
|
approach is that it would be difficult to match up all `fir.allocmem` operations
|
|
with their associated `fir.freemem`. In general, the lifetimes of heap allocated
|
|
memory are not constrained by the current function's stack frame and so cannot
|
|
be always converted to stack allocations. It is much easier to swap heap
|
|
allocations for stack allocations when they are first generated because the
|
|
lifetime information is conveniently available.
|
|
|
|
For example, to rewrite the heap allocation in the `array-value-copy` pass with
|
|
a stack allocation using the first approach would require analysis to ensure
|
|
that the heap allocation is always freed before the function returns. This is
|
|
much more complex than never generating a heap allocation (and free) in the
|
|
first place (the second approach).
|
|
|
|
The plan is to take the more complex first approach so that newly added changes
|
|
to lowering code do not need to be made to support the stack arrays option. The
|
|
general problem of determining heap allocation lifetimes can be simplified in
|
|
this case because only those allocations which are always freed before exit from
|
|
the same function are possible to be moved to heap allocations in that
|
|
function's stack frame. Furthermore, the aim of this pass would be to only move
|
|
those allocations generated by Flang, and so some of the more difficult cases
|
|
can be avoided. Where the transformation is not possible, the existing heap
|
|
allocation will be kept.
|
|
|
|
## Implementation details overview
|
|
In order to limit the complexity of the proposed pass, it is useful to
|
|
understand the situations in which Flang will generate heap allocations.
|
|
|
|
### Known Heap Array Allocations
|
|
Flang allocates most arrays on the stack by default, but there are a few cases
|
|
where temporary arrays are allocated on the heap:
|
|
- `flang/lib/Optimizer/Transforms/ArrayValueCopy.cpp`
|
|
- `flang/lib/Optimizer/Transforms/MemoryAllocation.cpp`
|
|
- `flang/lib/Lower/ConvertExpr.cpp`
|
|
- `flang/lib/Lower/IntrinsicCall.cpp`
|
|
- `flang/lib/Lower/ConvertVariable.cpp`
|
|
|
|
Lowering code is being updated and in the future, temporaries for expressions
|
|
will be created in the HLFIR bufferization pass in
|
|
`flang/lib/Optimizer/HLFIR/Trnasforms/BufferizeHLFIR.cpp`.
|
|
|
|
#### `ArrayValueCopy.cpp`
|
|
Memory is allocated for a temporary array in `allocateArrayTemp()`. This
|
|
temporary array is used to ensure that assignments of one array to itself
|
|
produce the required value. E.g.
|
|
|
|
```
|
|
integer, dimension(5), intent(inout) :: x
|
|
x(3,4) = x(1,2)
|
|
```
|
|
|
|
#### `MemoryAllocation.cpp`
|
|
The default options for the Memory Allocation transformation ensure that no
|
|
array allocations, no matter how large, are moved from the stack to the heap.
|
|
|
|
#### `ConvertExpr.cpp`
|
|
`ConvertExpr.cpp` allocates many array temporaries on the heap:
|
|
- Passing array arguments by value or when they need re-shaping
|
|
- Lowering elemental array expressions
|
|
- Lowering mask expressions
|
|
- Array constructors
|
|
|
|
The last two of these cases are **not** covered by the current stack arrays pass
|
|
design.
|
|
|
|
The FIR code generated for mask expressions (the WHERE construct) sets a
|
|
boolean variable to indicate whether a heap allocation was necessary. The
|
|
allocation is only freed if the variable indicates that the allocation was
|
|
performed to begin with. The proposed dataflow analysis is not intelligent
|
|
enough to statically determine that the boolean variable will always be true
|
|
when the allocation is performed. Beyond this, the control flow in the generated
|
|
FIR code passes the allocated memory through `fir.result`, resulting in a
|
|
different SSA value to be allocated and freed, causing the analysis not to
|
|
realise that the allocated memory is freed. The most convenient solution here
|
|
would be to generate less complicated FIR code, as the existing codegen has
|
|
known bugs: https://github.com/llvm/llvm-project/issues/56921,
|
|
https://github.com/llvm/llvm-project/issues/59803.
|
|
|
|
Code generated for array constructors uses `realloc()` to grow the allocated
|
|
buffer because the size of the resulting array cannot always be determined
|
|
ahead of running the constructor. This makes this temporary unsuitable
|
|
for allocation on the stack.
|
|
|
|
#### `IntrinsicCall.cpp`
|
|
The existing design is for the runtime to do the allocation and the lowering
|
|
code to insert `fir.freemem` to remove the allocation. It is not clear whether
|
|
this can be replaced by adapting lowering code to do stack allocations and to
|
|
pass these to the runtime. This would be a significant change and so is out of
|
|
scope of `-fstack-arrays`.
|
|
|
|
#### `ConvertVariable.cpp`
|
|
See `Fortran::lower::MapSymbolAttributes`:
|
|
|
|
Dummy arguments that are not declared in the current entry point require a
|
|
skeleton definition. Most such "unused" dummies will not survive into final
|
|
generated code, but some will. It is illegal to reference one at run time if it
|
|
does. There are three ways these dummies can be mapped to a value:
|
|
- a `fir::UndefOp` value: preferable but can't be used for dummies with dynamic
|
|
bounds or used to define another dummy.
|
|
- A stack slot. This has intermediate-weight cost but may not be usable for
|
|
objects with dynamic bounds.
|
|
- A heap box/descriptor is the heaviest weight option, only used for dynamic
|
|
objects.
|
|
|
|
These heap allocations will be out of scope for `-fstack-arrays` because the
|
|
existing implementation already uses stack allocations (or better) where
|
|
possible, and most such dummy arguments will be removed from generated code.
|
|
|
|
#### `BufferizeHLFIR.cpp`
|
|
TODO
|
|
|
|
### Detecting Allocations to Move
|
|
Allocations which could be moved to the stack will be detected by performing a
|
|
forward dense data flow analysis using `mlir::dataflow::DenseForwardDataFlowAnalysis`.
|
|
This analysis will search for SSA values created by a `fir.allocmem` which are
|
|
always freed using `fir.freemem` within the same function.
|
|
|
|
Tracking allocations by SSA values will miss some cases where address
|
|
calculations are duplicated in different blocks: resulting in different SSA
|
|
values as arguments for the allocation and the free. It is believed that simple
|
|
tracking of SSA values is sufficient for detecting the allocations for array
|
|
temporaries added by Flang, because these temporaries should be simple arrays:
|
|
not nested inside of derived types or other arrays.
|
|
|
|
Arrays allocated by an `allocate` statement in source code should not be moved
|
|
to the stack. These will be identified by adding an attribute to these
|
|
`fir.allocmem` operations when they are lowered. Intrinsics such as `allocated`
|
|
and `move_alloc` do not need to be supported because the array temporaries moved
|
|
to the stack will never be visible in user code.
|
|
|
|
Only allocations for arrays will be considered for moving to the stack.
|
|
|
|
When conducting the dense data flow analysis, the pass will assume that existing
|
|
allocations are not freed inside called functions. This is true for the existing
|
|
heap array temporary allocations generated by Flang. If an allocation were to be
|
|
freed inside of a called function, the allocation would presumably not also be
|
|
freed later in the caller function (as this would be a double-free). Therefore
|
|
the stack arrays pass would not find where the allocation was freed and so not
|
|
transform the allocation. It is necessary to make this assumption so that the
|
|
stack arrays pass can transform array allocations created for pass-by-value
|
|
function arguments.
|
|
|
|
### Moving Allocations
|
|
The `fir.allocmem` will be replaced by a `fir.alloca` with the same arguments.
|
|
The `fir.freemem` will be removed entirely.
|
|
|
|
Where possible, the `fir.alloca` should be placed in the function entry block
|
|
(so we can be sure it only happens once). This may not be possible if the array
|
|
has non-constant extents (causing the `fir.alloca` to have SSA values as
|
|
operands). In this case, the `fir.alloca` will be placed immediately after the
|
|
last operand becomes available.
|
|
|
|
If this location is inside a loop (either an `mlir::LoopLikeOpInterface` or a
|
|
cyclic CFG), the transformation should attempt to use the `llvm.stacksave`/
|
|
`llvm.stackrestore` intrinsics to ensure that the stack does not grow on every
|
|
loop iteration. Use of these intrinsics is considered valid when the allocation
|
|
and deallocation occur within the same block and there are no other stack
|
|
allocations between them. If this is not the case, the existing heap allocation
|
|
will be preserved.
|
|
|
|
### FIR attribute
|
|
A FIR attribute will be added to distinguish `fir.allocmem` arising from
|
|
`allocate` statements in source from `fir.allocmem` operations added by Flang.
|
|
The attribute will be called `"fir.must_be_heap"` and will have a boolean value:
|
|
`true` meaning that the stack arrays pass must not move the allocation, `false`
|
|
meaning that stack arrays may move the allocation. Not specifying the attribute
|
|
will be equivalent to setting it to `false`.
|
|
|
|
## Testing Plan
|
|
FileCheck tests will be written to check each of the above identified sources of
|
|
heap allocated array temporaries are detected and converted by the new pass.
|
|
|
|
Another test will check that `allocate` statements in source code will not be
|
|
moved to the stack.
|