464 Commits

Author SHA1 Message Date
Amir Ayupov
9fec33aadc Revert "[BOLT] Fix unconditional output of boltedcollection in merge-fdata (#78653)"
This reverts commit 82bc33ea3f1a539be50ed46919dc53fc6b685da9.

Accidentally pushed unrelated changes.
2024-01-18 19:59:09 -08:00
Amir Ayupov
82bc33ea3f
[BOLT] Fix unconditional output of boltedcollection in merge-fdata (#78653)
Fix the bug where merge-fdata unconditionally outputs boltedcollection 
line, regardless of whether input files have it set.

Test Plan:
Added bolt/test/X86/merge-fdata-nobat-mode.test which fails without this
fix.
2024-01-18 19:44:16 -08:00
Jinyang He
b57159cb19
[LoongArch] Support R_LARCH_{ADD,SUB}_ULEB128 for .uleb128 and force relocs when sym is not in section (#76433)
1, Follow RISCV 1df5ea29 to support generates relocs for .uleb128 which
can not be folded. Unlike RISCV, the located content of LoongArch should
be zero. LoongArch fixup uleb128 value by in-place addition and
subtraction reloc types named R_LARCH_{ADD,SUB}_ULEB128. The located
content can affect the result and R_LARCH_ADD_ULEB128 has enough info to
represent the first symbol value, so it needs to be set to zero.
2, Force relocs if sym is not in section so that it can emit relocs for
external symbol.

Fixes:
https://github.com/llvm/llvm-project/pull/72960#issuecomment-1866844679
2024-01-09 15:14:54 +08:00
Craig Topper
e87f33d9ce
[RISCV][MC] Pass MCSubtargetInfo down to shouldForceRelocation and evaluateTargetFixup. (#73721)
Instead of using the STI stored in RISCVAsmBackend, try to get it from
the MCFragment.

This addresses the issue raised here
https://discourse.llvm.org/t/possible-problem-related-to-subtarget-usage/75283
2023-12-07 13:17:58 -08:00
Fangrui Song
1df5ea29b4 [RISCV] Support R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128 for .uleb128 directives
For a label difference like `.uleb128 A-B`, MC folds A-B even if A and B
are separated by a RISC-V linker-relaxable instruction. This incorrect
behavior is currently abused by DWARF v5 .debug_loclists/.debug_rnglists
(DW_LLE_offset_pair/DW_RLE_offset_pair entry kinds) implemented in
Clang/LLVM (see https://github.com/ClangBuiltLinux/linux/issues/1719 for
an instance).

96d6e190e9
defined R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128. This patch generates such
a pair of relocations to represent A-B that should not be folded.
GNU assembler computes the directive size by ignoring shrinkable section
content, therefore after linking the value of A-B cannot use more bytes
than the reserved number (`final size of uleb128 value at offset ... exceeds available space`).
We make the same assumption.
```
w1:
  call foo
w2:
  .space 120
w3:
.uleb128 w2-w1  # 1 byte, 0x08
.uleb128 w3-w1  # 2 bytes, 0x80 0x01
```

We do not conservatively reserve 10 bytes (maximum size of an uleb128
for uint64_t) as that would pessimize DWARF v5
DW_LLE_offset_pair/DW_RLE_offset_pair, nullifying the benefits of
introducing R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128 relocations.

The supported expressions are limited. For example,

* non-subtraction `.uleb128 A` is not allowed
* `.uleb128 A-B`: report an error unless A and B are both defined and in the same section

The new cl::opt `-riscv-uleb128-reloc` can be used to suppress the
relocations.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D157657
2023-11-09 09:27:32 -08:00
Kazu Hirata
4a0ccfa865 Use llvm::endianness::{big,little,native} (NFC)
Note that llvm::support::endianness has been renamed to
llvm::endianness while becoming an enum class as opposed to an
enum. This patch replaces support::{big,little,native} with
llvm::endianness::{big,little,native}.
2023-10-12 21:21:45 -07:00
Kazu Hirata
a9d5056862 Use llvm::endianness (NFC)
Now that llvm::support::endianness has been renamed to
llvm::endianness, we can use the shorter form.  This patch replaces
support::endianness with llvm::endianness.
2023-10-10 21:54:15 -07:00
Fangrui Song
d7398a3554 [MC] Use reportError for .uleb128/.sleb128 diagnostic
User errors should use reportError. reportError allows us to continue parsing
the file and collect more diagnostics.

MC/ELF/leb128-err.s is adapted from MC/RISCV/riscv64-64b-pcrel.s
2023-08-09 17:32:09 -07:00
Fangrui Song
ffa829c4c5 [RISCV] Allow delayed decision for ADD/SUB relocations
For a label difference `A-B` in assembly, if A and B are separated by a
linker-relaxable instruction, we should emit a pair of ADD/SUB
relocations (e.g. R_RISCV_ADD32/R_RISCV_SUB32,
R_RISCV_ADD64/R_RISCV_SUB64).

However, the decision is made upfront at parsing time with inadequate
heuristics (`requiresFixup`). As a result, LLVM integrated assembler
incorrectly suppresses R_RISCV_ADD32/R_RISCV_SUB32 for the following
code:
```
// Simplified from a workaround https://android-review.googlesource.com/c/platform/art/+/2619609
// Both end and begin are not defined yet. We decide ADD/SUB relocations upfront and don't know they will be needed.
.4byte end-begin

begin:
  call foo
end:
```

To fix the bug, make two primary changes:

* Delete `requiresFixups` and the overridden emitValueImpl (from D103539).
  This deletion requires accurate evaluateAsAbolute (D153097).
* In MCAssembler::evaluateFixup, call handleAddSubRelocations to emit
  ADD/SUB relocations.

However, there is a remaining issue in
MCExpr.cpp:AttemptToFoldSymbolOffsetDifference. With MCAsmLayout, we may
incorrectly fold A-B even when A and B are separated by a
linker-relaxable instruction. This deficiency is acknowledged (see
D153097), but was previously bypassed by eagerly emitting ADD/SUB using
`requiresFixups`. To address this, we partially reintroduce `canFold` (from
D61584, removed by D103539).

Some expressions (e.g. .size and .fill) need to take the `MCAsmLayout`
code path in AttemptToFoldSymbolOffsetDifference, avoiding relocations
(weird, but matching GNU assembler and needed to match user
expectation). Switch to evaluateKnownAbsolute to leverage the `InSet`
condition.

As a bonus, this change allows for the removal of some relocations for
the FDE `address_range` field in the .eh_frame section.

riscv64-64b-pcrel.s contains the main test.
Add a linker relaxable instruction to dwarf-riscv-relocs.ll to test what
it intends to test.
Merge fixups-relax-diff.ll into fixups-diff.ll.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D155357
2023-07-21 08:37:58 -07:00
Fangrui Song
f9b14f0b7c [MC] Report location information for MCDwarfCallFrameFragment diagnostics 2023-06-26 18:02:22 -07:00
Fangrui Song
0b0672773e [MC] Reject CFI advance_loc separated by a non-private label for Mach-O
Due to Mach-O's .subsections_via_symbols mechanism, non-private labels cannot
appear between .cfi_startproc/.cfi_endproc. Compilers do not produce such
labels, but hand-written assembly may. Give an error. Unfortunately,
emitDwarfAdvanceFrameAddr generated MCExpr doesn't have location
informatin.

Note: evaluateKnownAbsolute is to force folding A-B to a constant even if A and
B are separate by a non-private label. The function is a workaround for some
Mach-O assembler issues and should generally be avoided.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D153167
2023-06-26 14:26:06 -07:00
Daniel Hoekwater
d8cef4f8fa [MC] Detect out of range jumps further than 2^32 bytes
On AArch64, object files may be greater than 2^32 bytes. If an
offset is greater than the max value of a 32-bit unsigned integer,
LLVM silently truncates the offset. Instead, make it return an
error.

Differential Revision: https://reviews.llvm.org/D153494
2023-06-22 21:56:22 +00:00
Volodymyr Sapsai
e872e162a3
Revert "[RISCV] relaxDwarfCallFrameFragment: remove unneeded relocations for relaxation"
Failing buildbot https://green.lab.llvm.org/green/job/clang-stage1-RA/34684/
This reverts commit 11ebe3d906558d93a607347de472e7718127f409.
2023-06-16 12:01:54 -07:00
Fangrui Song
11ebe3d906 [RISCV] relaxDwarfCallFrameFragment: remove unneeded relocations for relaxation
If `evaluateAsAbsolute(Value, Layout.getAssembler())` returns true, we
know the address delta is a constant and can suppress relocations
(usually SET6/SUB6).

While here, replace two evaluateKnownAbsolute calls (subtle; avoid if possible)
with evaluateAsAbsolute.
2023-06-15 23:26:25 -07:00
Fangrui Song
390643e3c5 MCDwarfFrameEmitter::EncodeAdvanceLoc: use SmallVectorImpl instead of raw_ostream. NFC
Similar to 49488490d195591bfc90daef111cd7293f8c80aa.
Remove MCDwarfFrameEmitter::EmitAdvanceLoc which is only called once.
2023-05-07 19:32:53 -07:00
Fangrui Song
49488490d1 [MC] MCDwarfLineAddr::Encode: use SmallVectorImpl instead of raw_ostream. NFC
Similar to D145791: most call sites need a SmallString, but have to provide a
raw_svector_ostream wrapper with unneeded abstraction and overhead:

raw_ostream::write =(inlinable)=> flush_tied_then_write (unneeded TiedStream check) =(virtual function call)=> raw_svector_ostream::write_impl ==> SmallVector append(ItTy in_start, ItTy in_end) (range; less efficient then push_back).

Just use SmallVectorImpl to simplify and optimize code. Unfortunately most call
sites use SmallString, so we have to use SmallVectorImpl<char> instead of
<uint8_t> to avoid large refactoring.
2023-05-07 16:26:52 -07:00
Fangrui Song
a08cbabb28 [MC] Optimize relaxInstruction: remove SmallVector copy. NFC 2023-05-04 23:34:35 -07:00
Fangrui Song
01260bbc6b [MC] registerSymbol: change an output paramter to return value 2023-05-04 22:17:56 -07:00
Alexis Engelke
0c049ea60a [MC] Always encode instruction into SmallVector
All users of MCCodeEmitter::encodeInstruction use a raw_svector_ostream
to encode the instruction into a SmallVector. The raw_ostream however
incurs some overhead for the actual encoding.

This change allows an MCCodeEmitter to directly emit an instruction into
a SmallVector without using a raw_ostream and therefore allow for
performance improvments in encoding. A default path that uses existing
raw_ostream implementations is provided.

Reviewed By: MaskRay, Amir

Differential Revision: https://reviews.llvm.org/D145791
2023-04-06 16:21:49 +02:00
spupyrev
eecd41aa09 Revert "Rebase: [Facebook] [MC] Introduce NeverAlign fragment type"
This reverts commit 6d0528636ae54fba75938a79ae7a98dfcc949f72.
2022-07-11 09:50:47 -07:00
Rafael Auler
6d0528636a Rebase: [Facebook] [MC] Introduce NeverAlign fragment type
Summary:
Introduce NeverAlign fragment type.

The intended usage of this fragment is to insert it before a pair of
macro-op fusion eligible instructions. NeverAlign fragment ensures that
the next fragment (first instruction in the pair) does not end at a
given alignment boundary by emitting a minimal size nop if necessary.

In effect, it ensures that a pair of macro-fusible instructions is not
split by a given alignment boundary, which is a precondition for
macro-op fusion in modern Intel Cores (64B = cache line size, see Intel
Architecture Optimization Reference Manual, 2.3.2.1 Legacy Decode
Pipeline: Macro-Fusion).

This patch introduces functionality used by BOLT when emitting code with
MacroFusion alignment already in place.

The use case is different from BoundaryAlign and instruction bundling:
- BoundaryAlign can be extended to perform the desired alignment for the
first instruction in the macro-op fusion pair (D101817). However, this
approach has higher overhead due to reliance on relaxation as
BoundaryAlign requires in the general case - see
https://reviews.llvm.org/D97982#2710638.
- Instruction bundling: the intent of NeverAlign fragment is to prevent
the first instruction in a pair ending at a given alignment boundary, by
inserting at most one minimum size nop. It's OK if either instruction
crosses the cache line. Padding both instructions using bundles to not
cross the alignment boundary would result in excessive padding. There's
no straightforward way to request instruction bundling to avoid a given
end alignment for the first instruction in the bundle.

LLVM: https://reviews.llvm.org/D97982

Manual rebase conflict history:
https://phabricator.intern.facebook.com/D30142613

Test Plan: sandcastle

Reviewers: #llvm-bolt

Subscribers: phabricatorlinter

Differential Revision: https://phabricator.intern.facebook.com/D31361547
2022-07-11 09:31:52 -07:00
Guillaume Chatelet
412c788ab0 [NFC][Alignment] Use Align in MCAlignFragment 2022-06-15 12:31:00 +00:00
Fangrui Song
689c3a2552 [MC] Fix letter case of some MCSection member functions 2022-03-11 20:07:00 -08:00
serge-sans-paille
ef736a1c39 Cleanup LLVMMC headers
There's a few relevant forward declarations in there that may require downstream
adding explicit includes:

llvm/MC/MCContext.h no longer includes llvm/BinaryFormat/ELF.h, llvm/MC/MCSubtargetInfo.h, llvm/MC/MCTargetOptions.h
llvm/MC/MCObjectStreamer.h no longer include llvm/MC/MCAssembler.h
llvm/MC/MCAssembler.h no longer includes llvm/MC/MCFixup.h, llvm/MC/MCFragment.h

Counting preprocessed lines required to rebuild llvm-project on my setup:
before: 1052436830
after:  1049293745

Which is significant and backs up the change in addition to the usual benefits of
decreasing coupling between headers and compilation units.

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119244
2022-02-09 11:09:17 +01:00
Alex Lorenz
0756aa3978 [macho] add support for emitting macho files with two build version load commands
This patch extends LLVM IR to add metadata that can be used to emit macho files with two build version load commands.
It utilizes "darwin.target_variant.triple" and "darwin.target_variant.SDK Version" metadata names for that,
which will be set by a future patch in clang.

MachO uses two build version load commands to represent an object file / binary that is targeting both the macOS target,
and the Mac Catalyst target. At runtime, a dynamic library that supports both targets can be loaded from either a native
macOS or a Mac Catalyst app on a macOS system. We want to add support to this to upstream to LLVM to be able to build
compiler-rt for both targets, to finish the complete support for the Mac Catalyst platform, which is right now targetable
by upstream clang, but the compiler-rt bits aren't supported because of the lack of this multiple build version support.

Differential Revision: https://reviews.llvm.org/D112189
2021-12-07 18:17:47 -08:00
Peter Smith
e63455d5e0 [MC] Use local MCSubtargetInfo in writeNops
On some architectures such as Arm and X86 the encoding for a nop may
change depending on the subtarget in operation at the time of
encoding. This change replaces the per module MCSubtargetInfo retained
by the targets AsmBackend in favour of passing through the local
MCSubtargetInfo in operation at the time.

On Arm using the architectural NOP instruction can have a performance
benefit on some implementations.

For Arm I've deleted the copy of the AsmBackend's MCSubtargetInfo to
limit the chances of this causing problems in the future. I've not
done this for other targets such as X86 as there is more frequent use
of the MCSubtargetInfo and it looks to be for stable properties that
we would not expect to vary per function.

This change required threading STI through MCNopsFragment and
MCBoundaryAlignFragment.

I've attempted to take into account the in tree experimental backends.

Differential Revision: https://reviews.llvm.org/D45962
2021-09-07 15:46:19 +01:00
Saleem Abdulrasool
bbea64250f RISCV: adjust handling of relocation emission for RISCV
This re-architects the RISCV relocation handling to bring the
implementation closer in line with the implementation in binutils.  We
would previously aggressively resolve the relocation.  With this
restructuring, we always will emit a paired relocation for any symbolic
difference of the type of S±T[±C] where S and T are labels and C is a
constant.

GAS has a special target hook controlled by `RELOC_EXPANSION_POSSIBLE`
which indicates that a fixup may be expanded into multiple relocations.
This is used by the RISCV backend to always emit a paired relocation -
either ADD[WIDTH] + SUB[WIDTH] for text relocations or SET[WIDTH] +
SUB[WIDTH] for a debug info relocation.  Irrespective of whether linker
relaxation support is enabled, symbolic difference is always emitted as
a paired relocation.

This change also sinks the target specific behaviour down into the
target specific area rather than exposing it to the shared relocation
handling.  In the process, we also sink the "special" handling for debug
information down into the RISCV target.  Although this improves the path
for the other targets, this is not necessarily entirely ideal either.
The changes in the debug info emission could be done through another
type of hook as this functionality would be required by any other target
which wishes to do linker relaxation.  However, as there are no other
targets in LLVM which currently do this, this is a reasonable thing to
do until such time as the code needs to be shared.

Improve the handling of the relocation (and add a reduced test case from
the Linux kernel) to ensure that we handle complex expressions for
symbolic difference.  This ensures that we correct relocate symbols with
the adddends normalized and associated with the addition portion of the
paired relocation.

This change also addresses some review comments from Alex Bradbury about
the relocations meant for use in the DWARF CFA being named incorrectly
(using ADD6 instead of SET6) in the original change which introduced the
relocation type.

This resolves the issues with the symbolic difference emission
sufficiently to enable building the Linux kernel with clang+IAS+lld
(without linker relaxation).

Resolves PR50153, PR50156!
Fixes: ClangBuiltLinux/linux#1023, ClangBuiltLinux/linux#1143

Reviewed By: nickdesaulniers, maskray

Differential Revision: https://reviews.llvm.org/D103539
2021-06-17 08:20:02 -07:00
Fangrui Song
71635ea5ff MCDwarf: Delete uneeded parameter
And change signature
2021-01-21 00:55:07 -08:00
Hongtao Yu
705a4c149d [CSSPGO] Pseudo probe encoding and emission.
This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s

Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections.  The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. 

**ELF object emission**

The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission.

Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication.  A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.

The format of `.pseudo_probe_desc` section looks like:

```
.section   .pseudo_probe_desc,"",@progbits
.quad   6309742469962978389  // Func GUID
.quad   4294967295           // Func Hash
.byte   9                    // Length of func name
.ascii  "_Z5funcAi"          // Func name
.quad   7102633082150537521
.quad   138828622701
.byte   12
.ascii  "_Z8funcLeafi"
.quad   446061515086924981
.quad   4294967295
.byte   9
.ascii  "_Z5funcBi"
.quad   -2016976694713209516
.quad   72617220756
.byte   7
.ascii  "_Z3fibi"
```

For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format :

```
FUNCTION BODY (one for each outlined function present in the text section)
    GUID (uint64)
        GUID of the function
    NPROBES (ULEB128)
        Number of probes originating from this function.
    NUM_INLINED_FUNCTIONS (ULEB128)
        Number of callees inlined into this function, aka number of
        first-level inlinees
    PROBE RECORDS
        A list of NPROBES entries. Each entry contains:
          INDEX (ULEB128)
          TYPE (uint4)
            0 - block probe, 1 - indirect call, 2 - direct call
          ATTRIBUTE (uint3)
            reserved
          ADDRESS_TYPE (uint1)
            0 - code address, 1 - address delta
          CODE_ADDRESS (uint64 or ULEB128)
            code address or address delta, depending on ADDRESS_TYPE
    INLINED FUNCTION RECORDS
        A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
        callees.  Each record contains:
          INLINE SITE
            GUID of the inlinee (uint64)
            ID of the callsite probe (ULEB128)
          FUNCTION BODY
            A FUNCTION BODY entry describing the inlined function.
```

To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index.

**Assembling**

Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis.

A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file.

A example assembly looks like:

```
foo2: # @foo2
# %bb.0: # %bb0
pushq %rax
testl %edi, %edi
.pseudoprobe 837061429793323041 1 0 0
je .LBB1_1
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 6 2 0
callq foo
.pseudoprobe 837061429793323041 3 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
.LBB1_1: # %bb1
.pseudoprobe 837061429793323041 5 1 0
callq *%rsi
.pseudoprobe 837061429793323041 2 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
# -- End function
.section .pseudo_probe_desc,"",@progbits
.quad 6699318081062747564
.quad 72617220756
.byte 3
.ascii "foo"
.quad 837061429793323041
.quad 281547593931412
.byte 4
.ascii "foo2"
```

With inlining turned on, the assembly may look different around %bb2 with an inlined probe:

```
# %bb.2:                                # %bb2
.pseudoprobe    837061429793323041 3 0
.pseudoprobe    6699318081062747564 1 0 @ 837061429793323041:6
.pseudoprobe    837061429793323041 4 0
popq    %rax
retq
```

**Disassembling**

We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file.

An example disassembly looks like:

```
00000000002011a0 <foo2>:
  2011a0: 50                    push   rax
  2011a1: 85 ff                 test   edi,edi
  [Probe]:  FUNC: foo2  Index: 1  Type: Block
  2011a3: 74 02                 je     2011a7 <foo2+0x7>
  [Probe]:  FUNC: foo2  Index: 3  Type: Block
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  [Probe]:  FUNC: foo   Index: 1  Type: Block  Inlined: @ foo2:6
  2011a5: 58                    pop    rax
  2011a6: c3                    ret
  [Probe]:  FUNC: foo2  Index: 2  Type: Block
  2011a7: bf 01 00 00 00        mov    edi,0x1
  [Probe]:  FUNC: foo2  Index: 5  Type: IndirectCall
  2011ac: ff d6                 call   rsi
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  2011ae: 58                    pop    rax
  2011af: c3                    ret
```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91878
2020-12-10 17:29:28 -08:00
Mitch Phillips
7ead5f5aa3 Revert "[CSSPGO] Pseudo probe encoding and emission."
This reverts commit b035513c06d1cba2bae8f3e88798334e877523e1.

Reason: Broke the ASan buildbots:
  http://lab.llvm.org:8011/#/builders/5/builds/2269
2020-12-10 15:53:39 -08:00
Hongtao Yu
b035513c06 [CSSPGO] Pseudo probe encoding and emission.
This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s

Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections.  The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. 

**ELF object emission**

The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission.

Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication.  A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.

The format of `.pseudo_probe_desc` section looks like:

```
.section   .pseudo_probe_desc,"",@progbits
.quad   6309742469962978389  // Func GUID
.quad   4294967295           // Func Hash
.byte   9                    // Length of func name
.ascii  "_Z5funcAi"          // Func name
.quad   7102633082150537521
.quad   138828622701
.byte   12
.ascii  "_Z8funcLeafi"
.quad   446061515086924981
.quad   4294967295
.byte   9
.ascii  "_Z5funcBi"
.quad   -2016976694713209516
.quad   72617220756
.byte   7
.ascii  "_Z3fibi"
```

For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format :

```
FUNCTION BODY (one for each outlined function present in the text section)
    GUID (uint64)
        GUID of the function
    NPROBES (ULEB128)
        Number of probes originating from this function.
    NUM_INLINED_FUNCTIONS (ULEB128)
        Number of callees inlined into this function, aka number of
        first-level inlinees
    PROBE RECORDS
        A list of NPROBES entries. Each entry contains:
          INDEX (ULEB128)
          TYPE (uint4)
            0 - block probe, 1 - indirect call, 2 - direct call
          ATTRIBUTE (uint3)
            reserved
          ADDRESS_TYPE (uint1)
            0 - code address, 1 - address delta
          CODE_ADDRESS (uint64 or ULEB128)
            code address or address delta, depending on ADDRESS_TYPE
    INLINED FUNCTION RECORDS
        A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
        callees.  Each record contains:
          INLINE SITE
            GUID of the inlinee (uint64)
            ID of the callsite probe (ULEB128)
          FUNCTION BODY
            A FUNCTION BODY entry describing the inlined function.
```

To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index.

**Assembling**

Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis.

A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file.

A example assembly looks like:

```
foo2: # @foo2
# %bb.0: # %bb0
pushq %rax
testl %edi, %edi
.pseudoprobe 837061429793323041 1 0 0
je .LBB1_1
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 6 2 0
callq foo
.pseudoprobe 837061429793323041 3 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
.LBB1_1: # %bb1
.pseudoprobe 837061429793323041 5 1 0
callq *%rsi
.pseudoprobe 837061429793323041 2 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
# -- End function
.section .pseudo_probe_desc,"",@progbits
.quad 6699318081062747564
.quad 72617220756
.byte 3
.ascii "foo"
.quad 837061429793323041
.quad 281547593931412
.byte 4
.ascii "foo2"
```

With inlining turned on, the assembly may look different around %bb2 with an inlined probe:

```
# %bb.2:                                # %bb2
.pseudoprobe    837061429793323041 3 0
.pseudoprobe    6699318081062747564 1 0 @ 837061429793323041:6
.pseudoprobe    837061429793323041 4 0
popq    %rax
retq
```

**Disassembling**

We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file.

An example disassembly looks like:

```
00000000002011a0 <foo2>:
  2011a0: 50                    push   rax
  2011a1: 85 ff                 test   edi,edi
  [Probe]:  FUNC: foo2  Index: 1  Type: Block
  2011a3: 74 02                 je     2011a7 <foo2+0x7>
  [Probe]:  FUNC: foo2  Index: 3  Type: Block
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  [Probe]:  FUNC: foo   Index: 1  Type: Block  Inlined: @ foo2:6
  2011a5: 58                    pop    rax
  2011a6: c3                    ret
  [Probe]:  FUNC: foo2  Index: 2  Type: Block
  2011a7: bf 01 00 00 00        mov    edi,0x1
  [Probe]:  FUNC: foo2  Index: 5  Type: IndirectCall
  2011ac: ff d6                 call   rsi
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  2011ae: 58                    pop    rax
  2011af: c3                    ret
```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91878
2020-12-10 09:50:08 -08:00
Fangrui Song
35be65cb1c [MC] Fix an assert in MCAssembler::writeSectionData to be aware of errors
If MCContext has an error, MCAssembler::layout may stop early
and some MCFragment's may not finalize.

In the Linux kernel, arch/x86/lib/memcpy_64.S could trigger the assert before
"x86_64: Change .weak to SYM_FUNC_START_WEAK for arch/x86/lib/mem*_64.S"
2020-10-29 23:11:18 -07:00
Fangrui Song
45d0343900 [MC] Allow .org directives in SHT_NOBITS sections
This is used by kvm-unit-tests and can be trivially supported.
2020-09-11 15:12:42 -07:00
Jian Cai
c6334db577 [X86] support .nops directive
Add support of .nops on X86. This addresses llvm.org/PR45788.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D82826
2020-08-03 11:50:56 -07:00
Shengchen Kan
e59e39b7c4 [MC] Simplify the logic of applying fixup for fragments, NFCI
Replace mutiple `if else`  clauses with a `switch` clause and remove redundant checks. Before this patch, we need to add a statement like `if(!isa<MCxxxFragment>(Frag)) `  here each time we add a new kind of `MCEncodedFragment` even if it has no fixups. After this patch, we don't need to do that.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D83366
2020-07-09 16:39:13 +08:00
Thomas Preud'homme
6c67ee0f58 [MC] Fix PR45805: infinite recursion in assembler
Give up folding an expression if the fragment of one of the operands
would require laying out a fragment already being laid out. This
prevents hitting an infinite recursion when a fill size expression
refers to a later fragment since computing the offset of that fragment
would require laying out the fill fragment and thus computing its size
expression.

Reviewed By: echristo

Differential Revision: https://reviews.llvm.org/D79570
2020-06-25 15:42:36 +01:00
Shengchen Kan
8bb059ab63 [MC][Bugfix] Remove redundant parameter for relaxInstruction
Summary:
Before this patch, `relaxInstruction` takes three arguments, the first
argument refers to the instruction before relaxation and the third
argument is the output instruction after relaxation. There are two quite
strange things:
  1) The first argument's type is `const MCInst &`, the third
  argument's type is `MCInst &`, but they may be aliased to the same
  variable
  2) The backends of ARM, AMDGPU, RISC-V, Hexagon assume that the third
  argument is a fresh uninitialized `MCInst` even if `relaxInstruction`
  may be called like `relaxInstruction(Relaxed, STI, Relaxed)` in a
  loop.

In this patch, we drop the thrid argument, and let `relaxInstruction`
directly modify the given instruction. Also, this patch fixes the bug https://bugs.llvm.org/show_bug.cgi?id=45580, which is introduced by D77851, and
breaks the assumption of ARM, AMDGPU, RISC-V, Hexagon.

Reviewers: Razer6, MaskRay, jyknight, asb, luismarques, enderby, rtaylor, colinl, bcain

Reviewed By: Razer6, MaskRay, bcain

Subscribers: bcain, nickdesaulniers, nathanchance, wuzish, annita.zhang, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, tpr, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78364
2020-04-21 11:06:55 +08:00
Simon Pilgrim
bcd7f77713 MCObjectWriter.h - remove Endian.h/EndianStream.h/raw_ostream.h includes. NFC
Push these includes down to the the writers that actually need them, a number of which were implicitly relying on the MCObjectWriter.h.
2020-04-17 10:44:08 +01:00
Fangrui Song
e13a8a1fc5 [MC][COFF][ELF] Reject instructions in IMAGE_SCN_CNT_UNINITIALIZED_DATA/SHT_NOBITS sections
For `.bss; nop`, MC inappropriately calls abort() (via report_fatal_error()) with a message
`cannot have fixups in virtual section!`
It is a bug to crash for invalid user input. Fix it by erroring out early in EmitInstToData().

Similarly, emitIntValue() in a virtual section (SHT_NOBITS in ELF) can crash with the mssage
`non-zero initializer found in section '.bss'` (see D4199)
It'd be nice to report the location but so many directives can call emitIntValue()
and it is difficult to track every location.
Note, COFF does not crash because MCAssembler::writeSectionData() is not
called for an IMAGE_SCN_CNT_UNINITIALIZED_DATA section.

Note, GNU as' arm64 backend reports ``Error: attempt to store non-zero value in section `.bss'``
for a non-zero .inst but fails to do so for other instructions.
We simply reject all instructions, even if the encoding is all zeros.

The Mach-O counterpart is D48517 (see `test/MC/MachO/zerofill-text.s`)

Reviewed By: rnk, skan

Differential Revision: https://reviews.llvm.org/D78138
2020-04-15 21:02:47 -07:00
Fangrui Song
90a63f6d2d [MC] Replace MCSection*::getName() with MCSection::getName(). NFC
I plan to use MCSection::getName() in D78138. Having the function in the base class is also convenient for debugging.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D78251
2020-04-15 18:35:27 -07:00
Fangrui Song
7d1ff446b6 [MC] Rename MCSection*::getSectionName() to getName(). NFC
A pending change will merge MCSection*::getName() to MCSection::getName().
2020-04-15 16:48:14 -07:00
Jian Cai
6a38e0e4f5 [MC] Recalculate fragment offsets after relaxation
Summary:
The current relaxation implementation is not correctly adjusting the
size and offsets of fragements in one section based on changes in size
of another if the layout order of the two happened to be such that the
former was visited before the later. Therefore, we need to invalidate
the fragments in all sections after each iteration of relaxation, and
possibly further relax some of them in the next ieration. This fixes
PR#45190.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76114
2020-03-17 14:48:05 -07:00
Shengchen Kan
3a503ce663 [X86] Reduce the number of emitted fragments due to branch align
Summary:
Currently, a BoundaryAlign fragment may be inserted after the branch
that needs to be aligned to truncate the current fragment, this fragment is
unused at most of time. To avoid that, we can insert a new empty Data
fragment instead. Non-relaxable instruction is usually emitted into Data
fragment, so the inserted empty Data fragment will be reused at a high
possibility.

Reviewers: annita.zhang, reames, MaskRay, craig.topper, LuoYuanke, jyknight

Reviewed By: reames, LuoYuanke

Subscribers: llvm-commits, dexonsmith, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75438
2020-03-12 15:37:35 +08:00
Shengchen Kan
af57b139a0 Temporarily Revert [X86] Not track size of the boudaryalign fragment during the layout
Summary: This reverts commit 2ac19feb1571960b8e1479a451b45ab56da7034e.
This commit causes some test cases to run fail when branch is aligned.
2020-03-03 11:15:56 +08:00
Philip Reames
5565820e6e Use range-for in MCAssembler [NFC] 2020-03-02 14:57:35 -08:00
Shengchen Kan
2ac19feb15 [X86] Not track size of the boudaryalign fragment during the layout
Summary:
Currently the boundaryalign fragment caches its size during the process
of layout and then it is relaxed and update the size in each iteration. This
behaviour is unnecessary and ugly.

Reviewers: annita.zhang, reames, MaskRay, craig.topper, LuoYuanke, jyknight

Reviewed By: MaskRay

Subscribers: hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75404
2020-03-02 09:32:30 +08:00
Hans Wennborg
2e24219d3c [MC][ARM] Resolve some pcrel fixups at assembly time (PR44929)
MC currently does not emit these relocation types, and lld does not
handle them. Add FKF_Constant as a work-around of some ARM code after
D72197. Eventually we probably should implement these relocation types.

By Fangrui Song!

Differential revision: https://reviews.llvm.org/D72892
2020-02-27 12:43:29 +01:00
Philip Reames
eca4bfea3d [MC] Pull out a relaxFragment helper [NFC]
Having this as it's own function helps to reduce indentation and allows use of return instead of wiring a value over the switch.  A lambda would have also worked, but with slightly deeper nesting.
2020-02-26 13:37:12 -08:00
James Clarke
3f5976c97d [RISCV] Fix evaluating %pcrel_lo against global and weak symbols
Summary:
Previously, we would erroneously turn %pcrel_lo(label), where label has
a %pcrel_hi against a weak symbol, into %pcrel_lo(label + offset), as
evaluatePCRelLo would believe the target independent logic was going to
fold it. Moreover, even if that were fixed, shouldForceRelocation lacks
an MCAsmLayout and thus cannot evaluate the %pcrel_hi fixup to a value
and check the symbol, so we would then erroneously constant-fold the
%pcrel_lo whilst leaving the %pcrel_hi intact. After D72197, this same
sequence also occurs for symbols with global binding, which is triggered
in real-world code.

Instead, as discussed in D71978, we introduce a new FKF_IsTarget flag to
avoid these kinds of issues. All the resolution logic happens in one
place, with no coordination required between RISCAsmBackend and
RISCVMCExpr to ensure they implement the same logic twice. Although the
implementation of %pcrel_hi can be left as target independent, we make
it target dependent to ensure that they are handled identically to
%pcrel_lo, otherwise we risk one of them being constant folded but the
other being preserved. This also allows us to properly support fixup
pairs where the instructions are in different fragments.

Reviewers: asb, lenary, efriedma

Reviewed By: efriedma

Subscribers: arichardson, hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73211
2020-01-23 02:05:48 +00:00
Simon Pilgrim
46e2f89364 [MC] writeFragment - assert MCFragment::FT_Fill length is legal.
Silence (clang/MSVC) static analyzer warnings that the fragment data may either write out of bounds of the local array or reference uninitialized data.
2020-01-08 17:19:11 +00:00