Yonghong Song 6c2327a09e [BPF] Add BTF generation for BPF target
BTF is the debug format for BPF, a kernel virtual machine
and widely used for tracing, networking and security, etc ([1]).

Currently only instruction streams are passed to kernel,
the kernel verifier verifies them before execution. In order to
provide better visibility of bpf programs to user space
tools, some debug information, e.g., function names and
debug line information are desirable for kernel so tools
can get such information with better annotation
for jited instructions for performance or other reasons.

The dwarf is too complicated in kernel and for BPF.
Hence, BTF is designed to be the debug format for BPF ([2]).
Right now, pahole supports BTF for types, which
are generated based on dwarf sections in the ELF file.

In order to annotate performance metrics for jited bpf insns,
it is necessary to pass debug line info to the kernel.
Furthermore, we want to pass the actual code to the
kernel because of the following reasons:

. bpf program typically is small so storage overhead
  should be small.
. in bpf land, it is totally possible that
  an application loads the bpf program into the
  kernel and then that application quits, so
  holding debug info by the user space application
  is not practical.
. having source codes directly kept by kernel
  would ease deployment since the original source
  code does not need ship on every hosts and
  kernel-devel package does not need to be
  deployed even if kernel headers are used.

The only reliable time to get the source code is
during compilation time. This will result in both more
accurate information and easier deployment as
stated in the above.

Another consideration is for JIT. The project like bcc
use MCJIT to compile a C program into bpf insns and
load them to the kernel ([3]). The generated BTF sections
will be readily available for such cases as well.

This patch implemented generation of BTF info in llvm
compiler. The BTF related sections will be generated
when both -target bpf and -g are specified. Two sections
are generated:
  .BTF contains all the type and string information, and
  .BTF.ext contains the func_info and line_info.

The separation is related to how two sections are used
differently in bpf loader, e.g., linux libbpf ([4]).
The .BTF section can be loaded into the kernel directly
while .BTF.ext needs loader manipulation before loading
to the kernel. The format of the each section is roughly
defined in llvm:include/llvm/MC/MCBTFContext.h and
from the implementation in llvm:lib/MC/MCBTFContext.cpp.
A later example also shows the contents in each section.

The type and func_info are gathered during CodeGen/AsmPrinter
by traversing dwarf debug_info. The line_info is
gathered in MCObjectStreamer before writing to
the object file. After all the information is gathered,
the two sections are emitted in MCObjectStreamer::finishImpl.

With cmake CMAKE_BUILD_TYPE=Debug, the compiler can
dump out all the tables except insn offset, which
will be resolved later as relocation records.
The debug type "btf" is used for BTFContext dump.

Dwarf tests the debug info generation with
llvm-dwarfdump to decode the binary sections and
check whether the result is expected. Currently
we do not have such a tool yet. We will implement
btf dump functionality in bpftool ([5]) as the bpftool is
considered the recommended tool for bpf introspection.
The implementation for type and func_info is tested
with linux kernel test cases. The line_info is visually
checked with dump from linux kernel libbpf ([4]) and
checked with readelf dumping section raw data.

Note that the .BTF and .BTF.ext information will not
be emitted to assembly code and there is no assembler
support for BTF either.

In the below, with a clang/llvm built with CMAKE_BUILD_TYPE=Debug,
Each table contents are shown for a simple C program.

  -bash-4.2$ cat -n test.c
     1  struct A {
     2    int a;
     3    char b;
     4  };
     5
     6  int test(struct A *t) {
     7    return t->a;
     8  }
  -bash-4.2$ clang -O2 -target bpf -g -mllvm -debug-only=btf -c test.c
  Type Table:
  [1] FUNC name_off=1 info=0x0c000001 size/type=2
        param_type=3
  [2] INT name_off=12 info=0x01000000 size/type=4
        desc=0x01000020
  [3] PTR name_off=0 info=0x02000000 size/type=4
  [4] STRUCT name_off=16 info=0x04000002 size/type=8
        name_off=18 type=2 bit_offset=0
        name_off=20 type=5 bit_offset=32
  [5] INT name_off=22 info=0x01000000 size/type=1
        desc=0x02000008

  String Table:
  0 :
  1 : test
  6 : .text
  12 : int
  16 : A
  18 : a
  20 : b
  22 : char
  27 : test.c
  34 : int test(struct A *t) {
  58 :   return t->a;

  FuncInfo Table:
  sec_name_off=6
        insn_offset=<Omitted> type_id=1

  LineInfo Table:
  sec_name_off=6
        insn_offset=<Omitted> file_name_off=27 line_off=34 line_num=6 column_num=0
        insn_offset=<Omitted> file_name_off=27 line_off=58 line_num=7 column_num=3
  -bash-4.2$ readelf -S test.o
  ......
    [12] .BTF              PROGBITS         0000000000000000  0000028d
       00000000000000c1  0000000000000000           0     0     1
    [13] .BTF.ext          PROGBITS         0000000000000000  0000034e
       0000000000000050  0000000000000000           0     0     1
    [14] .rel.BTF.ext      REL              0000000000000000  00000648
       0000000000000030  0000000000000010          16    13     8
  ......
  -bash-4.2$

The latest linux kernel ([6]) can already support .BTF with type information.
The [7] has the reference implementation in linux kernel side
to support .BTF.ext func_info. The .BTF.ext line_info support is not
implemented yet. If you have difficulty accessing [6], you can
manually do the following to access the code:

  git clone https://github.com/yonghong-song/bpf-next-linux.git
  cd bpf-next-linux
  git checkout btf

The change will push to linux kernel soon once this patch is landed.

References:
[1]. https://www.kernel.org/doc/Documentation/networking/filter.txt
[2]. https://lwn.net/Articles/750695/
[3]. https://github.com/iovisor/bcc
[4]. https://github.com/torvalds/linux/tree/master/tools/lib/bpf
[5]. https://github.com/torvalds/linux/tree/master/tools/bpf/bpftool
[6]. https://github.com/torvalds/linux
[7]. https://github.com/yonghong-song/bpf-next-linux/tree/btf

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>

Differential Revision: https://reviews.llvm.org/D52950

llvm-svn: 344366
2018-10-12 17:01:46 +00:00

164 lines
5.1 KiB
C++

//===- llvm/CodeGen/DwarfFile.h - Dwarf Debug Framework ---------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
#ifndef LLVM_LIB_CODEGEN_ASMPRINTER_DWARFFILE_H
#define LLVM_LIB_CODEGEN_ASMPRINTER_DWARFFILE_H
#include "DwarfStringPool.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/CodeGen/DIE.h"
#include "llvm/IR/Metadata.h"
#include "llvm/Support/Allocator.h"
#include <map>
#include <memory>
#include <utility>
namespace llvm {
class AsmPrinter;
class DbgEntity;
class DbgVariable;
class DbgLabel;
class DwarfCompileUnit;
class DwarfUnit;
class LexicalScope;
class MCSection;
class DwarfFile {
// Target of Dwarf emission, used for sizing of abbreviations.
AsmPrinter *Asm;
BumpPtrAllocator AbbrevAllocator;
// Used to uniquely define abbreviations.
DIEAbbrevSet Abbrevs;
// A pointer to all units in the section.
SmallVector<std::unique_ptr<DwarfCompileUnit>, 1> CUs;
DwarfStringPool StrPool;
/// DWARF v5: The symbol that designates the start of the contribution to
/// the string offsets table. The contribution is shared by all units.
MCSymbol *StringOffsetsStartSym = nullptr;
/// DWARF v5: The symbol that designates the base of the range list table.
/// The table is shared by all units.
MCSymbol *RnglistsTableBaseSym = nullptr;
/// The variables of a lexical scope.
struct ScopeVars {
/// We need to sort Args by ArgNo and check for duplicates. This could also
/// be implemented as a list or vector + std::lower_bound().
std::map<unsigned, DbgVariable *> Args;
SmallVector<DbgVariable *, 8> Locals;
};
/// Collection of DbgVariables of each lexical scope.
DenseMap<LexicalScope *, ScopeVars> ScopeVariables;
/// Collection of DbgLabels of each lexical scope.
using LabelList = SmallVector<DbgLabel *, 4>;
DenseMap<LexicalScope *, LabelList> ScopeLabels;
// Collection of abstract subprogram DIEs.
DenseMap<const MDNode *, DIE *> AbstractSPDies;
DenseMap<const DINode *, std::unique_ptr<DbgEntity>> AbstractEntities;
/// Maps MDNodes for type system with the corresponding DIEs. These DIEs can
/// be shared across CUs, that is why we keep the map here instead
/// of in DwarfCompileUnit.
DenseMap<const MDNode *, DIE *> DITypeNodeToDieMap;
public:
DwarfFile(AsmPrinter *AP, StringRef Pref, BumpPtrAllocator &DA);
const SmallVectorImpl<std::unique_ptr<DwarfCompileUnit>> &getUnits() {
return CUs;
}
/// Compute the size and offset of a DIE given an incoming Offset.
unsigned computeSizeAndOffset(DIE &Die, unsigned Offset);
/// Compute the size and offset of all the DIEs.
void computeSizeAndOffsets();
/// Compute the size and offset of all the DIEs in the given unit.
/// \returns The size of the root DIE.
unsigned computeSizeAndOffsetsForUnit(DwarfUnit *TheU);
/// Add a unit to the list of CUs.
void addUnit(std::unique_ptr<DwarfCompileUnit> U);
/// Emit all of the units to the section listed with the given
/// abbreviation section.
void emitUnits(bool UseOffsets);
/// Emit the given unit to its section.
void emitUnit(DwarfUnit *U, bool UseOffsets);
/// Emit a set of abbreviations to the specific section.
void emitAbbrevs(MCSection *);
/// Emit all of the strings to the section given. If OffsetSection is
/// non-null, emit a table of string offsets to it. If UseRelativeOffsets
/// is false, emit absolute offsets to the strings. Otherwise, emit
/// relocatable references to the strings if they are supported by the target.
void emitStrings(MCSection *StrSection, MCSection *OffsetSection = nullptr,
bool UseRelativeOffsets = false);
// Emit all data for the BTF section
void emitBTFSection(bool IsLittleEndian);
/// Returns the string pool.
DwarfStringPool &getStringPool() { return StrPool; }
MCSymbol *getStringOffsetsStartSym() const { return StringOffsetsStartSym; }
void setStringOffsetsStartSym(MCSymbol *Sym) { StringOffsetsStartSym = Sym; }
MCSymbol *getRnglistsTableBaseSym() const { return RnglistsTableBaseSym; }
void setRnglistsTableBaseSym(MCSymbol *Sym) { RnglistsTableBaseSym = Sym; }
/// \returns false if the variable was merged with a previous one.
bool addScopeVariable(LexicalScope *LS, DbgVariable *Var);
void addScopeLabel(LexicalScope *LS, DbgLabel *Label);
DenseMap<LexicalScope *, ScopeVars> &getScopeVariables() {
return ScopeVariables;
}
DenseMap<LexicalScope *, LabelList> &getScopeLabels() {
return ScopeLabels;
}
DenseMap<const MDNode *, DIE *> &getAbstractSPDies() {
return AbstractSPDies;
}
DenseMap<const DINode *, std::unique_ptr<DbgEntity>> &getAbstractEntities() {
return AbstractEntities;
}
void insertDIE(const MDNode *TypeMD, DIE *Die) {
DITypeNodeToDieMap.insert(std::make_pair(TypeMD, Die));
}
DIE *getDIE(const MDNode *TypeMD) {
return DITypeNodeToDieMap.lookup(TypeMD);
}
};
} // end namespace llvm
#endif // LLVM_LIB_CODEGEN_ASMPRINTER_DWARFFILE_H