2022-11-02 13:42:52 +00:00
|
|
|
|
# Debug Info Assignment Tracking
|
|
|
|
|
|
|
|
|
|
Assignment Tracking is an alternative technique for tracking variable location
|
|
|
|
|
debug info through optimisations in LLVM. It provides accurate variable
|
|
|
|
|
locations for assignments where a local variable (or a field of one) is the
|
|
|
|
|
LHS. In rare and complicated circumstances indirect assignments might be
|
|
|
|
|
optimized away without being tracked, but otherwise we make our best effort to
|
|
|
|
|
track all variable locations.
|
|
|
|
|
|
|
|
|
|
The core idea is to track more information about source assignments in order
|
|
|
|
|
and preserve enough information to be able to defer decisions about whether to
|
|
|
|
|
use non-memory locations (register, constant) or memory locations until after
|
|
|
|
|
middle end optimisations have run. This is in opposition to using
|
|
|
|
|
`llvm.dbg.declare` and `llvm.dbg.value`, which is to make the decision for most
|
|
|
|
|
variables early on, which can result in suboptimal variable locations that may
|
|
|
|
|
be either incorrect or incomplete.
|
|
|
|
|
|
|
|
|
|
A secondary goal of assignment tracking is to cause minimal additional work for
|
|
|
|
|
LLVM pass writers, and minimal disruption to LLVM in general.
|
|
|
|
|
|
|
|
|
|
## Status and usage
|
|
|
|
|
|
|
|
|
|
**Status**: Experimental work in progress. Enabling is strongly advised against
|
|
|
|
|
except for development and testing.
|
|
|
|
|
|
|
|
|
|
**Enable in Clang**: `-Xclang -fexperimental-assignment-tracking`
|
|
|
|
|
|
2023-01-20 14:19:14 +00:00
|
|
|
|
That causes Clang to get LLVM to run the pass `declare-to-assign`. The pass
|
|
|
|
|
converts conventional debug intrinsics to assignment tracking metadata and sets
|
|
|
|
|
the module flag `debug-info-assignment-tracking` to the value `i1 true`. To
|
|
|
|
|
check whether assignment tracking is enabled for a module call
|
|
|
|
|
`isAssignmentTrackingEnabled(const Module &M)` (from `llvm/IR/DebugInfo.h`).
|
2022-11-02 13:42:52 +00:00
|
|
|
|
|
|
|
|
|
## Design and implementation
|
|
|
|
|
|
|
|
|
|
### Assignment markers: `llvm.dbg.assign`
|
|
|
|
|
|
|
|
|
|
`llvm.dbg.value`, a conventional debug intrinsic, marks out a position in the
|
|
|
|
|
IR where a variable takes a particular value. Similarly, Assignment Tracking
|
|
|
|
|
marks out the position of assignments with a new intrinsic called
|
|
|
|
|
`llvm.dbg.assign`.
|
|
|
|
|
|
|
|
|
|
In order to know where in IR it is appropriate to use a memory location for a
|
|
|
|
|
variable, each assignment marker must in some way refer to the store, if any
|
|
|
|
|
(or multiple!), that performs the assignment. That way, the position of the
|
|
|
|
|
store and marker can be considered together when making that choice. Another
|
|
|
|
|
important benefit of referring to the store is that we can then build a two-way
|
|
|
|
|
mapping of stores<->markers that can be used to find markers that need to be
|
|
|
|
|
updated when stores are modified.
|
|
|
|
|
|
|
|
|
|
An `llvm.dbg.assign` marker that is not linked to any instruction signals that
|
|
|
|
|
the store that performed the assignment has been optimised out, and therefore
|
|
|
|
|
the memory location will not be valid for at least some part of the program.
|
|
|
|
|
|
|
|
|
|
Here's the `llvm.dbg.assign` signature. Each parameter is wrapped in
|
|
|
|
|
`MetadataAsValue`, and `Value *` type parameters are first wrapped in
|
|
|
|
|
`ValueAsMetadata`:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
void @llvm.dbg.assign(Value *Value,
|
|
|
|
|
DIExpression *ValueExpression,
|
|
|
|
|
DILocalVariable *Variable,
|
|
|
|
|
DIAssignID *ID,
|
|
|
|
|
Value *Address,
|
|
|
|
|
DIExpression *AddressExpression)
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
The first three parameters look and behave like an `llvm.dbg.value`. `ID` is a
|
|
|
|
|
reference to a store (see next section). `Address` is the destination address
|
2023-01-12 09:51:08 +00:00
|
|
|
|
of the store and it is modified by `AddressExpression`. An empty/undef/poison
|
|
|
|
|
address means the address component has been killed (the memory address is no
|
|
|
|
|
longer a valid location). LLVM currently encodes variable fragment information
|
|
|
|
|
in `DIExpression`s, so as an implementation quirk the `FragmentInfo` for
|
|
|
|
|
`Variable` is contained within `ValueExpression` only.
|
2022-11-02 13:42:52 +00:00
|
|
|
|
|
|
|
|
|
The formal LLVM-IR signature is:
|
|
|
|
|
```
|
|
|
|
|
void @llvm.dbg.assign(metadata, metadata, metadata, metadata, metadata, metadata)
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
### Instruction link: `DIAssignID`
|
|
|
|
|
|
|
|
|
|
`DIAssignID` metadata is the mechanism that is currently used to encode the
|
|
|
|
|
store<->marker link. The metadata node has no operands and all instances are
|
|
|
|
|
`distinct`; equality is checked for by comparing addresses.
|
|
|
|
|
|
|
|
|
|
`llvm.dbg.assign` intrinsics use a `DIAssignID` metadata node instance as an
|
|
|
|
|
operand. This way it refers to any store-like instruction that has the same
|
|
|
|
|
`DIAssignID` attachment. E.g. For this test.cpp,
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
int fun(int a) {
|
|
|
|
|
return a;
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
compiled without optimisations:
|
|
|
|
|
```
|
|
|
|
|
$ clang++ test.cpp -o test.ll -emit-llvm -S -g -O0 -Xclang -fexperimental-assignment-tracking
|
|
|
|
|
```
|
|
|
|
|
we get:
|
|
|
|
|
```
|
|
|
|
|
define dso_local noundef i32 @_Z3funi(i32 noundef %a) #0 !dbg !8 {
|
|
|
|
|
entry:
|
|
|
|
|
%a.addr = alloca i32, align 4, !DIAssignID !13
|
|
|
|
|
call void @llvm.dbg.assign(metadata i1 undef, metadata !14, metadata !DIExpression(), metadata !13, metadata i32* %a.addr, metadata !DIExpression()), !dbg !15
|
|
|
|
|
store i32 %a, i32* %a.addr, align 4, !DIAssignID !16
|
|
|
|
|
call void @llvm.dbg.assign(metadata i32 %a, metadata !14, metadata !DIExpression(), metadata !16, metadata i32* %a.addr, metadata !DIExpression()), !dbg !15
|
|
|
|
|
%0 = load i32, i32* %a.addr, align 4, !dbg !17
|
|
|
|
|
ret i32 %0, !dbg !18
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
...
|
|
|
|
|
!13 = distinct !DIAssignID()
|
|
|
|
|
!14 = !DILocalVariable(name: "a", ...)
|
|
|
|
|
...
|
|
|
|
|
!16 = distinct !DIAssignID()
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
The first `llvm.dbg.assign` refers to the `alloca` through `!DIAssignID !13`,
|
|
|
|
|
and the second refers to the `store` through `!DIAssignID !16`.
|
|
|
|
|
|
|
|
|
|
### Store-like instructions
|
|
|
|
|
|
|
|
|
|
In the absence of a linked `llvm.dbg.assign`, a store to an address that is
|
|
|
|
|
known to be the backing storage for a variable is considered to represent an
|
|
|
|
|
assignment to that variable.
|
|
|
|
|
|
|
|
|
|
This gives us a safe fall-back in cases where `llvm.dbg.assign` intrinsics have
|
|
|
|
|
been deleted, the `DIAssignID` attachment on the store has been dropped, or the
|
|
|
|
|
optimiser has made a once-indirect store (not tracked with Assignment Tracking)
|
|
|
|
|
direct.
|
|
|
|
|
|
|
|
|
|
### Middle-end: Considerations for pass-writers
|
|
|
|
|
|
|
|
|
|
#### Non-debug instruction updates
|
|
|
|
|
|
|
|
|
|
**Cloning** an instruction: nothing new to do. Cloning automatically clones a
|
|
|
|
|
`DIAssignID` attachment. Multiple instructions may have the same `DIAssignID`
|
|
|
|
|
instruction. In this case, the assignment is considered to take place in
|
|
|
|
|
multiple positions in the program.
|
|
|
|
|
|
|
|
|
|
**Moving** a non-debug instruction: nothing new to do. Instructions linked to an
|
|
|
|
|
`llvm.dbg.assign` have their initial IR position marked by the position of the
|
|
|
|
|
`llvm.dbg.assign`.
|
|
|
|
|
|
|
|
|
|
**Deleting** a non-debug instruction: nothing new to do. Simple DSE does not
|
|
|
|
|
require any change; it’s safe to delete an instruction with a `DIAssignID`
|
|
|
|
|
attachment. An `llvm.dbg.assign` that uses a `DIAssignID` that is not attached
|
|
|
|
|
to any instruction indicates that the memory location isn’t valid.
|
|
|
|
|
|
|
|
|
|
**Merging** stores: In many cases no change is required as `DIAssignID`
|
|
|
|
|
attachments are automatically merged if `combineMetadata` is called. One way or
|
|
|
|
|
another, the `DIAssignID` attachments must be merged such that new store
|
|
|
|
|
becomes linked to all the `llvm.dbg.assign` intrinsics that the merged stores
|
|
|
|
|
were linked to. This can be achieved simply by calling a helper function
|
|
|
|
|
`Instruction::mergeDIAssignID`.
|
|
|
|
|
|
|
|
|
|
**Inlining** stores: As stores are inlined we generate `llvm.dbg.assign`
|
|
|
|
|
intrinsics and `DIAssignID` attachments as if the stores represent source
|
|
|
|
|
assignments, just like the in frontend. This isn’t perfect, as stores may have
|
|
|
|
|
been moved, modified or deleted before inlining, but it does at least keep the
|
|
|
|
|
information about the variable correct within the non-inlined scope.
|
|
|
|
|
|
|
|
|
|
**Splitting** stores: SROA and passes that split stores treat `llvm.dbg.assign`
|
|
|
|
|
intrinsics similarly to `llvm.dbg.declare` intrinsics. Clone the
|
|
|
|
|
`llvm.dbg.assign` intrinsics linked to the store, update the FragmentInfo in
|
|
|
|
|
the `ValueExpression`, and give the split stores (and cloned intrinsics) new
|
|
|
|
|
`DIAssignID` attachments each. In other words, treat the split stores as
|
|
|
|
|
separate assignments. For partial DSE (e.g. shortening a memset), we do the
|
|
|
|
|
same except that `llvm.dbg.assign` for the dead fragment gets an `Undef`
|
|
|
|
|
`Address`.
|
|
|
|
|
|
|
|
|
|
**Promoting** allocas and store/loads: `llvm.dbg.assign` intrinsics implicitly
|
|
|
|
|
describe joined values in memory locations at CFG joins, but this is not
|
|
|
|
|
necessarily the case after promoting (or partially promoting) the
|
|
|
|
|
variable. Passes that promote variables are responsible for inserting
|
|
|
|
|
`llvm.dbg.assign` intrinsics after the resultant PHIs generated during
|
|
|
|
|
promotion. `mem2reg` already has to do this (with `llvm.dbg.value`) for
|
|
|
|
|
`llvm.dbg.declare`s. Where a store has no linked intrinsic, the store is
|
|
|
|
|
assumed to represent an assignment for variables stored at the destination
|
|
|
|
|
address.
|
|
|
|
|
|
|
|
|
|
#### Debug intrinsic updates
|
|
|
|
|
|
|
|
|
|
**Moving** a debug intrinsic: avoid moving `llvm.dbg.assign` intrinsics where
|
|
|
|
|
possible, as they represent a source-level assignment, whose position in the
|
|
|
|
|
program should not be affected by optimization passes.
|
|
|
|
|
|
|
|
|
|
**Deleting** a debug intrinsic: Nothing new to do. Just like for conventional
|
|
|
|
|
debug intrinsics, unless it is unreachable, it’s almost always incorrect to
|
|
|
|
|
delete a `llvm.dbg.assign` intrinsic.
|
|
|
|
|
|
|
|
|
|
### Lowering `llvm.dbg.assign` to MIR
|
|
|
|
|
|
|
|
|
|
To begin with only SelectionDAG ISel will be supported. `llvm.dbg.assign`
|
|
|
|
|
intrinsics are lowered to MIR `DBG_INSTR_REF` instructions. Before this happens
|
|
|
|
|
we need to decide where it is appropriate to use memory locations and where we
|
|
|
|
|
must use a non-memory location (or no location) for each variable. In order to
|
|
|
|
|
make those decisions we run a standard fixed-point dataflow analysis that makes
|
|
|
|
|
the choice at each instruction, iteratively joining the results for each block.
|
|
|
|
|
|
|
|
|
|
### TODO list
|
|
|
|
|
|
|
|
|
|
As this is an experimental work in progress so there are some items we still need
|
|
|
|
|
to tackle:
|
|
|
|
|
|
2022-12-09 17:07:19 +00:00
|
|
|
|
* As mentioned in test llvm/test/DebugInfo/assignment-tracking/X86/diamond-3.ll,
|
|
|
|
|
the analysis should treat escaping calls like untagged stores.
|
|
|
|
|
|
2022-11-02 13:42:52 +00:00
|
|
|
|
* The system expects locals to be backed by a local alloca. This isn't always
|
|
|
|
|
the case - sometimes a pointer to storage is passed into a function
|
|
|
|
|
(e.g. sret, byval). We need to be able to handle those cases. See
|
|
|
|
|
llvm/test/DebugInfo/Generic/assignment-tracking/track-assignments.ll and
|
|
|
|
|
clang/test/CodeGen/assignment-tracking/assignment-tracking.cpp for examples.
|
|
|
|
|
|
|
|
|
|
* `trackAssignments` doesn't yet work for variables that have their
|
|
|
|
|
`llvm.dbg.declare` location modified by a `DIExpression`, e.g. when the
|
|
|
|
|
address of the variable is itself stored in an `alloca` with the
|
|
|
|
|
`llvm.dbg.declare` using `DIExpression(DW_OP_deref)`. See `indirectReturn` in
|
|
|
|
|
llvm/test/DebugInfo/Generic/assignment-tracking/track-assignments.ll and in
|
|
|
|
|
clang/test/CodeGen/assignment-tracking/assignment-tracking.cpp for an
|
|
|
|
|
example.
|
|
|
|
|
|
|
|
|
|
* In order to solve the first bullet-point we need to be able to specify that a
|
|
|
|
|
memory location is available without using a `DIAssignID`. This is because
|
|
|
|
|
the storage address is not computed by an instruction (it's an argument
|
|
|
|
|
value) and therefore we have nowhere to put the metadata attachment. To solve
|
|
|
|
|
this we probably need another marker intrinsic to denote "the variable's
|
2023-02-18 23:13:44 +00:00
|
|
|
|
stack home is X address" - similar to `llvm.dbg.declare` except that it needs
|
|
|
|
|
to compose with `llvm.dbg.assign` intrinsics such that the stack home address
|
|
|
|
|
is only selected as a location for the variable when the `llvm.dbg.assign`
|
|
|
|
|
intrinsics agree it should be.
|
2022-11-02 13:42:52 +00:00
|
|
|
|
|
|
|
|
|
* Given the above (a special "the stack home is X" intrinsic), and the fact
|
|
|
|
|
that we can only track assignments with fixed offsets and sizes, I think we
|
|
|
|
|
can probably get rid of the address and address-expression part, since it
|
|
|
|
|
will always be computable with the info we have.
|