# Debug Generation

Application developers spend a significant amount of time debugging the
applications that they create. Hence it is important that a compiler provide
support for a good debug experience. DWARF[1] is the standard debugging file
format used by compilers and debuggers. The LLVM infrastructure supports debug
info generation using metadata[2]. Support for generating debug metadata is
present in MLIR by way of MLIR attributes. Flang can leverage these MLIR
attributes to generate good debug information.

We can break the work for debug generation into two separate tasks:

1) Line table generation
2) Full debug generation

Support for Fortran debug information in the LLVM infrastructure[3] has made
great progress due to many Fortran frontends adopting LLVM as the backend, as
well as the availability of the Classic Flang compiler.

## Driver Flags

By default, Flang will not generate any debug or line table information.
Debug information will be generated if one of the following flags is present:

- `-gline-tables-only`, `-g1`: Emit debug line number tables only.
- `-g`: Emit full debug info.

## Line Table Generation

There is an existing AddDebugFoundationPass which adds a `FusedLoc` with a
`SubprogramAttr` on FuncOp. This allows MLIR to generate LLVM IR metadata
for that function. However, the following values are hardcoded at the moment.
These will instead be passed from the driver.

- Details of the compiler (name, version and git hash).
- Language Standard. We can set it to Fortran95 for now and periodically
  revise it when full support for later standards is available.
- Optimisation Level.
- Type of debug generated (linetable/full debug).
- Calling Convention: `DW_CC_normal` by default and `DW_CC_program` if it is
  the main program.

`DISubroutineTypeAttr` currently has a fixed type. This will be changed to
match the signature of the actual function/subroutine.

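The values listed above could be grouped into one options record that the driver fills in. The Python sketch below is purely illustrative: `CompileUnitOptions`, `SubprogramInfo`, `emission_kind` and `make_subprogram` are hypothetical names, the flag handling ignores flag ordering, and we assume `_QQmain` is the name Flang gives the main program. The `types` list follows the `DISubroutineType` convention of putting the result type first, with `None` (null) for a subroutine.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# DWARF calling-convention codes.
DW_CC_normal = 0x01
DW_CC_program = 0x02

# Assumption: the mangled name Flang gives to the main program.
MAIN_PROGRAM_NAME = "_QQmain"


@dataclass
class CompileUnitOptions:
    """Values the driver would pass to the pass instead of hardcoding them."""
    producer: str                       # compiler name, version and git hash
    language: str = "DW_LANG_Fortran95"
    is_optimized: bool = False
    emission_kind: str = "NoDebug"      # "LineTablesOnly" or "FullDebug"


@dataclass
class SubprogramInfo:
    name: str
    calling_convention: int
    # Result type first (None for a subroutine), then the argument types.
    types: List[Optional[str]] = field(default_factory=list)


def emission_kind(flags: List[str]) -> str:
    """Simplified flag handling that ignores the order of the flags."""
    if "-g" in flags:
        return "FullDebug"
    if "-gline-tables-only" in flags or "-g1" in flags:
        return "LineTablesOnly"
    return "NoDebug"


def make_subprogram(name: str, result: Optional[str], args: List[str]) -> SubprogramInfo:
    """Pick the calling convention and build a signature-based type list."""
    cc = DW_CC_program if name == MAIN_PROGRAM_NAME else DW_CC_normal
    return SubprogramInfo(name, cc, [result, *args])
```
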
## Full Debug Generation

Full debug info will include metadata to describe functions, variables and
types. Flang will generate debug metadata in the form of MLIR attributes. These
attributes will be converted to the format expected by LLVM IR in
DebugTranslation[4].

Debug metadata generation can be broken down into two steps.

1. MLIR attributes are generated by reading information from the AST or FIR.
   This step can happen any time before or during conversion to the LLVM
   dialect. Examples of the metadata generated in this step are
   `DILocalVariableAttr` and `DIDerivedTypeAttr`.

2. Changes that can only happen during or after conversion to the LLVM dialect.
   One example of this is passing `DIGlobalVariableExpressionAttr` while
   creating `LLVM::GlobalOp`. Another example is the generation of the
   `DbgDeclareOp` that is required for local variables. It can only be created
   after conversion to the LLVM dialect as it requires the LLVM.Ptr type. The
   changes required for step 2 are quite minimal. The bulk of the work happens
   in step 1.

One design decision that we need to make is where to perform step 1.
Here are some possible options:

**During conversion to LLVM dialect**

Pros:

1. Steps 1 and 2 happen in one place.
2. No chance of missing any change introduced by an earlier transformation.

Cons:

1. Passing a lot of information from the driver as discussed in the line table
   section above may muddle the interface of FIRToLLVMConversion.
2. `DeclareOp` is removed before this pass.
3. Even if `DeclareOp` is retained, creating debug metadata while some ops have
   been converted to the LLVM dialect and others have not may cause its own
   issues. We have to walk the ops chain to extract the information, which may
   be problematic in this case.
4. Some source information is lost by this point. Examples include
   information about namelists and source line information about fields of
   derived types.

**During a pass before conversion to LLVM dialect**

This is similar to what AddDebugFoundationPass is currently doing.

Pros:

1. One central location dedicated to debug information processing. This can
   result in a cleaner implementation.
2. Similar to above, less chance of missing any change introduced by an earlier
   transformation.

Cons:

1. Step 2 still needs to happen during conversion to the LLVM dialect, but the
   changes required for step 2 are quite minimal.
2. Similar to above, some source information may be lost by this point.

**During Lowering from AST**

Pros:

1. We have better source information.

Cons:

1. There may be changes in the code after lowering which may not be
   reflected in the debug information.
2. Comments on an earlier PR [5] advised against this approach.

## Design

The design below assumes that we are extracting the information from FIR.
If we generate debug metadata during lowering, then the description below
may need to change, although the generated metadata remains the same in
both cases.

The AddDebugFoundationPass will be renamed to the AddDebugInfo pass. The
information mentioned in the line table section above will be passed to it
from the driver. This pass will run quite late in the pipeline, but before
`DeclareOp` is removed.

In this pass, we will iterate through the `GlobalOp`, `TypeInfoOp`, `FuncOp`
and `DeclareOp` to extract the source information and build the MLIR
attributes. A class will be added to handle conversion of MLIR and FIR types to
`DITypeAttr`.

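A minimal sketch of such a type conversion class, assuming a simple table from scalar type spellings to base types. `DebugTypeConverter`, `DIBasicType` and the table are hypothetical and cover only a few scalar types. Derived types, arrays and descriptors would need their own cases. The cache reflects that the same type should always map to the same attribute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DIBasicType:
    name: str
    size_in_bits: int
    encoding: str


# Hypothetical mapping from FIR/MLIR scalar types to Fortran base types.
_SCALAR_TYPES = {
    "i8": ("integer", 8, "DW_ATE_signed"),
    "i16": ("integer", 16, "DW_ATE_signed"),
    "i32": ("integer", 32, "DW_ATE_signed"),
    "i64": ("integer", 64, "DW_ATE_signed"),
    "f32": ("real", 32, "DW_ATE_float"),
    "f64": ("real", 64, "DW_ATE_float"),
    "!fir.logical<4>": ("logical", 32, "DW_ATE_boolean"),
}


class DebugTypeConverter:
    """Converts type spellings to debug types, caching each result."""

    def __init__(self):
        self._cache = {}

    def convert(self, type_name: str) -> DIBasicType:
        if type_name not in self._cache:
            if type_name not in _SCALAR_TYPES:
                raise NotImplementedError(f"no debug type for {type_name}")
            self._cache[type_name] = DIBasicType(*_SCALAR_TYPES[type_name])
        return self._cache[type_name]
```
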
The following sections provide details of how various language constructs will
be handled. In these sections, the LLVM IR metadata and MLIR attributes have
been used interchangeably. As an example, `DILocalVariableAttr` is an MLIR
attribute which gets translated to LLVM IR's `DILocalVariable`.

### Variables

#### Local Variables

In MLIR, local variables are represented by `DILocalVariableAttr`, which
stores information like source location and type. They also require a
`DbgDeclareOp`, which binds the `DILocalVariableAttr` with a location.

In FIR, `DeclareOp` has source information about the variable. The
`DeclareOp` will be processed to create a `DILocalVariableAttr`. This attribute
is attached to the memref op of the `DeclareOp` using a `FusedLoc` approach.

During conversion to the LLVM dialect, when an op is encountered that has a
`DILocalVariableAttr` in its `FusedLoc`, a `DbgDeclareOp` is created which
binds the attribute with its location.

The changes in the IR look as follows:

```
// Original FIR
%2 = fir.alloca i32 loc(#loc4)
%3 = fir.declare %2 {uniq_name = "_QMhelperFchangeEi"}

// FIR with FusedLoc
%2 = fir.alloca i32 loc(#loc38)
%3 = fir.declare %2 {uniq_name = "_QMhelperFchangeEi"}
#di_local_variable5 = #llvm.di_local_variable<name = "i", line = 5, type = #di_basic_type ... >
#loc38 = loc(fused<#di_local_variable5>[#loc4])

// After conversion to the LLVM dialect
#di_local_variable = #llvm.di_local_variable<name = "i", line = 5, type = #di_basic_type ...>
%1 = llvm.alloca %0 x i32
llvm.intr.dbg.declare #di_local_variable = %1
```

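The two steps can be modelled in a few lines of Python. This is a toy model, not the MLIR API: `Op`, `attach_local_variable` and `convert_to_llvm` are hypothetical, and the `fused_metadata` list stands in for a `FusedLoc`. The AddDebugInfo-style step attaches the variable to the memref op, and the conversion step emits a `dbg.declare` for every fused variable it finds.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DILocalVariable:
    name: str
    line: int
    type_name: str


@dataclass
class Op:
    result: str
    name: str
    line: int
    # Stand-in for the metadata of a FusedLoc wrapping the op's location.
    fused_metadata: List[object] = field(default_factory=list)


def attach_local_variable(memref_op: Op, var_name: str, type_name: str) -> None:
    """Pass step: fuse a DILocalVariable into the memref op's location."""
    memref_op.fused_metadata.append(DILocalVariable(var_name, memref_op.line, type_name))


def convert_to_llvm(ops: List[Op]) -> List[str]:
    """Toy conversion: rewrite allocas and emit dbg.declare for fused variables.

    Ops other than fir.alloca are passed through unchanged in this model.
    """
    out = []
    for op in ops:
        llvm_name = "llvm.alloca" if op.name == "fir.alloca" else op.name
        out.append(f"{op.result} = {llvm_name}")
        for md in op.fused_metadata:
            if isinstance(md, DILocalVariable):
                out.append(f"llvm.intr.dbg.declare {md.name} = {op.result}")
    return out
```
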
#### Function Arguments

Arguments work in a similar way, but they present a difficulty: the
`DeclareOp`'s memref points to a `BlockArgument`. Unlike the op in the local
variable case, a `BlockArgument` is not handled by the FIRToLLVMLowering. This
can be handled by adding the `DbgDeclareOp` after conversion to the LLVM
dialect, either in FIRToLLVMLowering or in a separate pass.

### Module

In debug metadata, the Fortran module will be represented by `DIModuleAttr`.
The variables or functions inside a module will have their scope pointing to
the parent module.

```
module helper
  real glr
  ...
end module helper

!1 = !DICompileUnit(language: DW_LANG_Fortran90 ...)
!2 = !DIModule(scope: !1, name: "helper" ...)
!3 = !DIGlobalVariable(scope: !2, name: "glr" ...)

; Use of a module results in the following metadata.
!4 = !DIImportedEntity(tag: DW_TAG_imported_module, entity: !2)
```

Modules are not first-class entities in FIR, so there is no way to get
the location where they are declared in the source file.

However, the fact that a variable or function is part of a module, along with
the name of that module, can be extracted from its mangled name. There is
a `GlobalOp` generated for each module variable in FIR, and there is also a
`DeclareOp` in each function where the module variable is used.

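As an illustration, module membership can be recovered from unique names such as `_QMhelperFchangeEi` (variable `i` in procedure `change` of module `helper`). The sketch below handles only a simplified subset of the mangling scheme. It assumes `M` marks a module, `F` a host procedure, `E` an entity and `P` a procedure, with source names in lower case. `parse_unique_name` and `is_module_variable` are hypothetical helpers, not Flang's own name-uniquing code.

```python
import re
from typing import NamedTuple, Optional, Tuple


class UniqueName(NamedTuple):
    module: Optional[str]   # enclosing module, if any
    hosts: Tuple[str, ...]  # enclosing procedures, outermost first
    kind: str               # "E" for an entity (variable), "P" for a procedure
    name: str


# Simplified: covers only the _QM<module>{F<host>}<E|P><name> subset of names.
_UNIQUE_NAME = re.compile(
    r"^_Q(?:M(?P<module>[a-z0-9_]+))?"
    r"(?P<hosts>(?:F[a-z0-9_]+)*)"
    r"(?P<kind>[EP])(?P<name>[a-z0-9_]+)$"
)


def parse_unique_name(mangled: str) -> Optional[UniqueName]:
    m = _UNIQUE_NAME.match(mangled)
    if m is None:
        return None
    # Source names are lower-cased, so the upper-case "F" marker is unambiguous.
    hosts = tuple(h for h in m.group("hosts").split("F") if h)
    return UniqueName(m.group("module"), hosts, m.group("kind"), m.group("name"))


def is_module_variable(u: UniqueName) -> bool:
    """A module variable has a module, no host procedure, and is an entity."""
    return u.module is not None and not u.hosts and u.kind == "E"
```
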
We will use the `GlobalOp` to generate the `DIModuleAttr` and the associated
`DIGlobalVariableAttr`. A `DeclareOp` for a module variable will be used
to generate a `DIImportedEntityAttr`. Care will be taken to avoid generating
duplicate `DIImportedEntityAttr` entries in the same function.

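One way to avoid duplicate `DIImportedEntityAttr` entries is to remember which (function, module) pairs have already been seen. In this hypothetical sketch, each returned entry would become one `DIImportedEntityAttr` with tag `DW_TAG_imported_module`. It assumes the module name of each `DeclareOp` has already been extracted from its unique name, with `None` for variables that are not module variables.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def collect_imported_modules(
    declares: Iterable[Tuple[str, Optional[str]]]
) -> Dict[str, List[str]]:
    """Map each function to the modules it imports, without duplicates.

    `declares` yields (function name, module of the declared variable or None).
    Modules are kept in the order they are first seen within each function.
    """
    imports: Dict[str, List[str]] = {}
    seen = set()
    for func, module in declares:
        if module is None or (func, module) in seen:
            continue
        seen.add((func, module))
        imports.setdefault(func, []).append(module)
    return imports
```
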
### Derived Types

A derived type will be represented in metadata by `DICompositeType` with a tag of
`DW_TAG_structure_type`. It will have elements which point to the components.

```
type :: t_pair
  integer :: i
  real :: x
end type

!1 = !DICompositeType(tag: DW_TAG_structure_type, name: "t_pair", elements: !2 ...)
!2 = !{!3, !4}
!3 = !DIDerivedType(tag: DW_TAG_member, scope: !1, name: "i", size: 32, offset: 0, baseType: !5 ...)
!4 = !DIDerivedType(tag: DW_TAG_member, scope: !1, name: "x", size: 32, offset: 32, baseType: !6 ...)
!5 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
!6 = !DIBasicType(tag: DW_TAG_base_type, name: "real" ...)
```

In FIR, `RecordType` and `TypeInfoOp` can be used to get information about the
location of the derived type and the types of its components. We may also use
`FusedLoc` on `TypeInfoOp` to encode location information for all the components
of the derived type.

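The member sizes and offsets in the `t_pair` example follow from laying out the components in order. The sketch below assumes natural alignment and uses a hypothetical table of default-kind sizes. A real pass would take sizes and offsets from the data layout of the lowered record type rather than compute them like this.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical component sizes in bits for default-kind intrinsic types.
_SIZE_IN_BITS = {"integer": 32, "real": 32, "logical": 32, "double precision": 64}


@dataclass
class Member:
    name: str
    size: int    # bits, as in DIDerivedType's size field
    offset: int  # bits, as in DIDerivedType's offset field


def layout_members(components: List[Tuple[str, str]]) -> Tuple[List[Member], int]:
    """Return DW_TAG_member entries and the total size of the structure in bits."""
    members, offset, max_align = [], 0, 1
    for name, type_name in components:
        size = _SIZE_IN_BITS[type_name]
        align = size  # assumption: natural alignment
        offset = (offset + align - 1) // align * align  # round up to alignment
        members.append(Member(name, size, offset))
        offset += size
        max_align = max(max_align, align)
    # Pad the structure to a multiple of its largest alignment.
    total = (offset + max_align - 1) // max_align * max_align
    return members, total
```
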
### Common Blocks

A common block will be represented in metadata by `DICommonBlockAttr`, which
will be used as the scope of the variables inside the common block. A
`DIExpression` can be used to give the offset of any given variable inside the
global storage for the common block.

```
integer a, b
common /test/ a, b

@test_ = common global [8 x i8] zeroinitializer, !dbg !5, !dbg !8
!1 = !DISubprogram()
!2 = !DICommonBlock(scope: !1, name: "test" ...)
!3 = !DIGlobalVariable(scope: !2, name: "a" ...)
!4 = !DIExpression()
!5 = !DIGlobalVariableExpression(var: !3, expr: !4)
!6 = !DIGlobalVariable(scope: !2, name: "b" ...)
!7 = !DIExpression(DW_OP_plus_uconst, 4)
!8 = !DIGlobalVariableExpression(var: !6, expr: !7)
```

In FIR, a common block results in a `GlobalOp` with common linkage. Every
function where the common block is used has a `DeclareOp` for each of its
variables. This `DeclareOp` will point to the global storage through a
`CoordinateOp` and an `AddrOfOp`. The `CoordinateOp` has the offset of the
location of this variable in the global storage. There is enough information to
generate the required metadata, although it requires walking up the chain from
the `DeclareOp` to locate the `CoordinateOp` and `AddrOfOp`.

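The walk described above can be sketched over a toy map from SSA values to their defining ops. It assumes the chain consists of optional `fir.convert` ops, a `fir.coordinate_of` with a constant index, and a `fir.address_of`. It also assumes the common block storage is a byte array, as in the `[8 x i8]` global above, so that the index is a byte offset. The op dictionaries and function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Each SSA value maps to the op that defines it (a toy stand-in for FIR).
Ops = Dict[str, dict]


def common_block_location(ops: Ops, declare: str) -> Optional[Tuple[str, int]]:
    """Walk from a fir.declare up to fir.address_of, summing byte offsets."""
    op = ops[declare]
    if op["name"] != "fir.declare":
        raise ValueError(f"{declare} is not a fir.declare")
    value, offset = op["operands"][0], 0
    while True:
        op = ops[value]
        if op["name"] == "fir.convert":
            value = op["operands"][0]
        elif op["name"] == "fir.coordinate_of":
            base, index = op["operands"]
            offset += ops[index]["value"]  # assumes a constant byte index
            value = base
        elif op["name"] == "fir.address_of":
            return op["symbol"], offset
        else:
            return None  # storage is not a global: not a common block member


def offset_expression(offset: int) -> List[object]:
    """DIExpression operands for a variable `offset` bytes into the block."""
    return [] if offset == 0 else ["DW_OP_plus_uconst", offset]
```
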
### Arrays

The type of a fixed-size array is represented using `DICompositeType`. The
`DISubrangeAttr` is used to provide the bounds in each dimension.

```
integer abc(4,5)

!1 = !DICompositeType(tag: DW_TAG_array_type, baseType: !5, elements: !2 ...)
!2 = !{ !3, !4 }
!3 = !DISubrange(lowerBound: 1, upperBound: 4 ...)
!4 = !DISubrange(lowerBound: 1, upperBound: 5 ...)
!5 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
```

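Building the subranges for an explicit-shape array is mostly bookkeeping: each dimension has a lower bound (1 unless specified) and an upper bound, listed in source order as in the `abc(4,5)` example. In this hypothetical sketch, `array_size_in_bits` only illustrates how the bounds determine the total array size.

```python
from typing import List, Tuple, Union

# A dimension is either an upper bound (lower bound defaults to 1) or an
# explicit (lower, upper) pair, mirroring `abc(4,5)` or `abc(0:3)`.
Dim = Union[int, Tuple[int, int]]


def subranges(dims: List[Dim]) -> List[dict]:
    """One DISubrange-like entry per dimension, in source order."""
    result = []
    for d in dims:
        lower, upper = d if isinstance(d, tuple) else (1, d)
        result.append({"lowerBound": lower, "upperBound": upper})
    return result


def array_size_in_bits(dims: List[Dim], element_bits: int) -> int:
    """Size of the whole array: element size times the extent of every dimension."""
    size = element_bits
    for sub in subranges(dims):
        size *= sub["upperBound"] - sub["lowerBound"] + 1
    return size
```
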
#### Adjustable

The debug metadata for an adjustable array looks similar to that of a
fixed-size array, with one change: the bounds are not constant values but
point to a `DILocalVariableAttr`.

In FIR, the `DeclareOp` points to a `ShapeOp`, and we can walk the chain
to get the value that represents the array bound in any dimension. We will
create a `DILocalVariableAttr` that will point to that location. This
variable will be used in the `DISubrangeAttr`. Note that this
`DILocalVariableAttr` does not correspond to any source variable.

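A sketch of how constant and runtime extents might be split, assuming a lower bound of 1 in every dimension so that the extent is also the upper bound. The `__ub<dim>` naming scheme and the dictionaries are hypothetical. The `artificial` entry models a variable with no source counterpart (LLVM has a `DIFlagArtificial` flag for such compiler-generated entities).

```python
from typing import List, Tuple, Union

# An extent is either a constant or the SSA value found by walking from the
# DeclareOp to its ShapeOp (e.g. "%n" for `integer a(n)`).
Extent = Union[int, str]


def adjustable_subranges(extents: List[Extent], scope: str) -> Tuple[List[dict], List[dict]]:
    """Return DISubrange-like entries plus artificial variables for runtime bounds."""
    subs, artificial = [], []
    for dim, extent in enumerate(extents):
        if isinstance(extent, int):
            subs.append({"lowerBound": 1, "upperBound": extent})
            continue
        # Hypothetical naming scheme; the variable has no source counterpart.
        var = {"name": f"__ub{dim}", "scope": scope, "artificial": True, "value": extent}
        artificial.append(var)
        subs.append({"lowerBound": 1, "upperBound": var["name"]})
    return subs, artificial
```
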
#### Assumed Size

This is treated as a raw array. Debug information will not provide any upper
bound information for the last dimension.

#### Assumed Shape

An assumed shape array will use a representation similar to that of a
fixed-size array, but with two differences.

1. There will be a `dataLocation` field which will be an expression. This will
   enable the debugger to get the data pointer from the array descriptor.

2. The array bound fields in `DISubrangeAttr` will be expressions, which will
   allow the debugger to get the bounds from the descriptor.

```
integer(4), intent(out) :: a(:,:)

!1 = !DICompositeType(tag: DW_TAG_array_type, baseType: !8, elements: !2, dataLocation: !3)
!2 = !{!5, !7}
!3 = !DIExpression(DW_OP_push_object_address, DW_OP_deref)
!4 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 32, DW_OP_deref)
!5 = !DISubrange(lowerBound: 1, upperBound: !4 ...)
!6 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 56, DW_OP_deref)
!7 = !DISubrange(lowerBound: 1, upperBound: !6 ...)
!8 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
```

In the assumed shape case, the rank can be determined from FIR's `SequenceType`.
This allows us to generate a `DISubrangeAttr` for each dimension.

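Since the rank is known, the per-dimension expressions can be generated in a loop. This sketch assumes a 64-bit descriptor with a 24-byte header followed by one 24-byte entry per dimension (lower bound, extent, stride), roughly following `CFI_cdesc_t` from `ISO_Fortran_binding.h`. Under that assumption it reproduces the offsets 32 and 56 used in the example. It also assumes the default lower bound of 1, so the extent doubles as the upper bound. All names are hypothetical.

```python
from typing import List

# Assumed 64-bit descriptor layout: a 24-byte header (base_addr, elem_len,
# version, rank, type, attribute, extra) followed by one 24-byte entry per
# dimension (lower_bound, extent, stride).
HEADER_BYTES = 24
DIM_BYTES = 24
LOWER_BOUND_FIELD, EXTENT_FIELD = 0, 8

DATA_LOCATION = ["DW_OP_push_object_address", "DW_OP_deref"]


def descriptor_field(dim: int, field_offset: int) -> List[object]:
    """DIExpression operands that load one field of dimension `dim`."""
    offset = HEADER_BYTES + dim * DIM_BYTES + field_offset
    return ["DW_OP_push_object_address", "DW_OP_plus_uconst", offset, "DW_OP_deref"]


def assumed_shape_subranges(rank: int) -> List[dict]:
    """One subrange per dimension; with lower bound 1, extent == upper bound."""
    return [
        {"lowerBound": 1, "upperBound": descriptor_field(d, EXTENT_FIELD)}
        for d in range(rank)
    ]
```
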
#### Assumed Rank

This is currently unsupported in Flang. Its representation will be similar to
the representation of an assumed shape array, with the following differences.

1. `DICompositeTypeAttr` will have a rank field which will be an expression.
   It will be used to get the rank value from the descriptor.
2. Instead of a `DISubrangeAttr` for each dimension, there will be a single
   `DIGenericSubrange`, which will allow debuggers to calculate bounds in any
   dimension.

### Pointers and Allocatables

Pointers and allocatables will be represented using `DICompositeTypeAttr`. It
is a quirk of DWARF that scalar allocatable or pointer variables show up in
the debug info as a pointer to a scalar, while array pointer or allocatable
variables show up as arrays. The behavior is the same in gfortran and Classic
Flang.

```
integer, allocatable :: ar(:)
integer, pointer :: sc

!1 = !DILocalVariable(name: "sc", type: !2)
!2 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !3, associated: !9 ...)
!3 = !DIBasicType(tag: DW_TAG_base_type, name: "integer", ...)
!4 = !DILocalVariable(name: "ar", type: !5 ...)
!5 = !DICompositeType(tag: DW_TAG_array_type, baseType: !3, elements: !6, dataLocation: !8, allocated: !9)
!6 = !{!7}
!7 = !DISubrange(lowerBound: !10, count: !11 ...)
!8 = !DIExpression(DW_OP_push_object_address, DW_OP_deref)
!9 = !DIExpression(DW_OP_push_object_address, DW_OP_deref, DW_OP_lit0, DW_OP_ne)
!10 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 24, DW_OP_deref)
!11 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 32, DW_OP_deref)
```

In FIR, these variables are represented as `!fir.box<!fir.heap<>>` or
`!fir.box<!fir.ptr<>>`. There is also an `allocatable` or `pointer` attribute
on the `DeclareOp`. This allows us to generate the allocated/associated status
of these variables. The metadata to get the information from the descriptor is
similar to that for arrays.

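The allocated/associated and data-location expressions are small DWARF stack programs that the debugger evaluates against the descriptor. The following hypothetical Python evaluator supports only the handful of operations used in these examples. It assumes a 64-bit little-endian target where the data pointer is the first descriptor word and the first dimension's lower bound is at offset 24, as in the example above. `ALLOCATED`, `DATA_LOCATION` and `LOWER_BOUND_0` spell out `!9`, `!8` and `!10`.

```python
import struct
from typing import List


def eval_dwarf(expr: List[object], object_address: int, memory: bytes) -> int:
    """Evaluate the small subset of DWARF expression ops used above."""
    stack: List[int] = []
    i = 0
    while i < len(expr):
        op = expr[i]
        if op == "DW_OP_push_object_address":
            stack.append(object_address)
        elif op == "DW_OP_plus_uconst":
            i += 1  # the constant operand follows the opcode
            stack.append(stack.pop() + expr[i])
        elif op == "DW_OP_deref":
            # Load an 8-byte little-endian word (64-bit target assumed).
            stack.append(struct.unpack_from("<Q", memory, stack.pop())[0])
        elif op == "DW_OP_lit0":
            stack.append(0)
        elif op == "DW_OP_ne":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(int(lhs != rhs))
        else:
            raise ValueError(f"unsupported op {op}")
        i += 1
    return stack[-1]


DATA_LOCATION = ["DW_OP_push_object_address", "DW_OP_deref"]
ALLOCATED = ["DW_OP_push_object_address", "DW_OP_deref", "DW_OP_lit0", "DW_OP_ne"]
LOWER_BOUND_0 = ["DW_OP_push_object_address", "DW_OP_plus_uconst", 24, "DW_OP_deref"]
```

A descriptor whose data pointer is null evaluates `ALLOCATED` to 0, which is how the debugger can report an unallocated array instead of reading garbage.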
### Strings

The `DIStringTypeAttr` can represent both fixed size and allocatable strings.
For the allocatable case, the `stringLengthExpression` and
`stringLocationExpression` are used to provide the length and the location of
the string respectively.

```
character(len=:), allocatable :: var
character(len=20) :: fixed

!1 = !DILocalVariable(name: "var", type: !2)
!2 = !DIStringType(name: "character(*)", stringLengthExpression: !4, stringLocationExpression: !3 ...)
!3 = !DIExpression(DW_OP_push_object_address, DW_OP_deref)
!4 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 8, DW_OP_deref)

!5 = !DILocalVariable(name: "fixed", type: !6)
!6 = !DIStringType(name: "character (20)", size: 160)
```

### Association

Associate names will be treated like normal variables, although we may need to
handle the case where the `DeclareOp` of one variable points to the `DeclareOp`
of another variable (e.g. `a => b`).

### Namelists

FIR does not seem to have a way to extract information about namelists.

```
namelist /abc/ x3, y3

(gdb) p abc
$1 = ( x3 = 100, y3 = 500 )
(gdb) p x3
$2 = 100
(gdb) p y3
$3 = 500
```

Even without namelist support, we should be able to see the values of
individual variables like `x3` and `y3` in the above example. But we would not
be able to evaluate the namelist and have the debugger print the values of all
the variables in it, as shown above for `abc`.

## Missing metadata in MLIR

Some metadata types that are needed for Fortran are present in LLVM IR but are
absent from MLIR. A non-comprehensive list is given below.

1. `DICommonBlockAttr`
2. `DIGenericSubrangeAttr`
3. `DISubrangeAttr` in MLIR takes an `IntegerAttr` at the moment, so it only
   works with fixed-size arrays. It also needs to accept a `DIExpressionAttr` or
   a `DILocalVariableAttr` to support assumed shape and adjustable arrays.
4. The `DICompositeTypeAttr` will need to have fields for `dataLocation`,
   `rank`, `allocated` and `associated`.
5. `DIStringTypeAttr`

## Testing

- LLVM LIT tests will be added to test:
  - that the driver passes the line table and full debug info generation
    options appropriately.
  - that the pass works as expected and generates debug info. These tests will
    use `fir-opt`.
  - that end-to-end debug info generation works, using `flang -fc1`.
- Manual external tests will be written to ensure that the following work in
  debug tools:
  - Breaking at lines.
  - Breaking at functions.
  - Printing the type (`ptype`) of function names.
  - Printing the values and types (`ptype`) of various kinds of variables.
- Manually run GDB's gdb.fortran testsuite with llvm-flang.

## Resources

- [1] https://dwarfstd.org/doc/DWARF5.pdf
- [2] https://llvm.org/docs/LangRef.html#metadata
- [3] https://archive.fosdem.org/2022/schedule/event/llvm_fortran_debug/
- [4] https://github.com/llvm/llvm-project/blob/main/mlir/lib/Target/LLVMIR/DebugTranslation.cpp
- [5] https://github.com/llvm/llvm-project/pull/84202