[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
//===-- TraceDumper.cpp ---------------------------------------------------===//
|
2021-06-29 14:03:30 -07:00
|
|
|
//
|
|
|
|
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
|
|
|
|
// See https://llvm.org/LICENSE.txt for license information.
|
|
|
|
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
|
|
|
|
//
|
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
#include "lldb/Target/TraceDumper.h"
|
2021-06-29 14:03:30 -07:00
|
|
|
#include "lldb/Core/Module.h"
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
#include "lldb/Symbol/CompileUnit.h"
|
2021-06-29 14:03:30 -07:00
|
|
|
#include "lldb/Symbol/Function.h"
|
|
|
|
#include "lldb/Target/ExecutionContext.h"
|
|
|
|
#include "lldb/Target/Process.h"
|
|
|
|
#include "lldb/Target/SectionLoadList.h"
|
2023-01-07 13:43:00 -08:00
|
|
|
#include <optional>
|
2021-06-29 14:03:30 -07:00
|
|
|
|
|
|
|
using namespace lldb;
|
|
|
|
using namespace lldb_private;
|
|
|
|
using namespace llvm;
|
|
|
|
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
/// \return
|
2022-12-10 17:06:43 -08:00
|
|
|
/// The given string or \b std::nullopt if it's empty.
|
2022-12-16 07:56:52 +00:00
|
|
|
static std::optional<const char *> ToOptionalString(const char *s) {
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
if (!s)
|
2022-12-04 16:51:25 -08:00
|
|
|
return std::nullopt;
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
return s;
|
2022-03-22 09:18:53 -07:00
|
|
|
}
|
2022-10-10 12:57:13 -07:00
|
|
|
|
|
|
|
static const char *GetModuleName(const SymbolContext &sc) {
|
|
|
|
if (!sc.module_sp)
|
|
|
|
return nullptr;
|
|
|
|
return sc.module_sp->GetFileSpec().GetFilename().AsCString();
|
|
|
|
}
|
|
|
|
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
/// \return
|
|
|
|
/// The module name (basename if the module is a file, or the actual name if
|
|
|
|
/// it's a virtual module), or \b nullptr if no name nor module was found.
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
static const char *GetModuleName(const TraceDumper::TraceItem &item) {
|
2022-10-10 12:57:13 -07:00
|
|
|
if (!item.symbol_info)
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
return nullptr;
|
2022-10-10 12:57:13 -07:00
|
|
|
return GetModuleName(item.symbol_info->sc);
|
2021-06-29 14:03:30 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
// This custom LineEntry validator is neded because some line_entries have
|
|
|
|
// 0 as line, which is meaningless. Notice that LineEntry::IsValid only
|
|
|
|
// checks that line is not LLDB_INVALID_LINE_NUMBER, i.e. UINT32_MAX.
|
|
|
|
static bool IsLineEntryValid(const LineEntry &line_entry) {
|
|
|
|
return line_entry.IsValid() && line_entry.line > 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/// \return
|
|
|
|
/// \b true if the provided line entries match line, column and source file.
|
|
|
|
/// This function assumes that the line entries are valid.
|
|
|
|
static bool FileLineAndColumnMatches(const LineEntry &a, const LineEntry &b) {
|
|
|
|
if (a.line != b.line)
|
|
|
|
return false;
|
|
|
|
if (a.column != b.column)
|
|
|
|
return false;
|
|
|
|
return a.file == b.file;
|
|
|
|
}
|
|
|
|
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
/// Compare the symbol contexts of the provided \a SymbolInfo
|
2021-06-29 14:03:30 -07:00
|
|
|
/// objects.
|
|
|
|
///
|
|
|
|
/// \return
|
|
|
|
/// \a true if both instructions belong to the same scope level analized
|
|
|
|
/// in the following order:
|
|
|
|
/// - module
|
|
|
|
/// - symbol
|
|
|
|
/// - function
|
2022-10-10 12:57:13 -07:00
|
|
|
/// - inlined function
|
|
|
|
/// - source line info
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
static bool
|
|
|
|
IsSameInstructionSymbolContext(const TraceDumper::SymbolInfo &prev_insn,
|
2022-10-10 12:57:13 -07:00
|
|
|
const TraceDumper::SymbolInfo &insn,
|
|
|
|
bool check_source_line_info = true) {
|
2021-06-29 14:03:30 -07:00
|
|
|
// module checks
|
|
|
|
if (insn.sc.module_sp != prev_insn.sc.module_sp)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
// symbol checks
|
|
|
|
if (insn.sc.symbol != prev_insn.sc.symbol)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
// function checks
|
|
|
|
if (!insn.sc.function && !prev_insn.sc.function)
|
2022-10-10 12:57:13 -07:00
|
|
|
return true; // This means two dangling instruction in the same module. We
|
|
|
|
// can assume they are part of the same unnamed symbol
|
2021-06-29 14:03:30 -07:00
|
|
|
else if (insn.sc.function != prev_insn.sc.function)
|
|
|
|
return false;
|
|
|
|
|
2022-10-10 12:57:13 -07:00
|
|
|
Block *inline_block_a =
|
|
|
|
insn.sc.block ? insn.sc.block->GetContainingInlinedBlock() : nullptr;
|
|
|
|
Block *inline_block_b = prev_insn.sc.block
|
|
|
|
? prev_insn.sc.block->GetContainingInlinedBlock()
|
|
|
|
: nullptr;
|
|
|
|
if (inline_block_a != inline_block_b)
|
|
|
|
return false;
|
|
|
|
|
2021-06-29 14:03:30 -07:00
|
|
|
// line entry checks
|
2022-10-10 12:57:13 -07:00
|
|
|
if (!check_source_line_info)
|
|
|
|
return true;
|
|
|
|
|
2021-06-29 14:03:30 -07:00
|
|
|
const bool curr_line_valid = IsLineEntryValid(insn.sc.line_entry);
|
|
|
|
const bool prev_line_valid = IsLineEntryValid(prev_insn.sc.line_entry);
|
|
|
|
if (curr_line_valid && prev_line_valid)
|
|
|
|
return FileLineAndColumnMatches(insn.sc.line_entry,
|
|
|
|
prev_insn.sc.line_entry);
|
|
|
|
return curr_line_valid == prev_line_valid;
|
|
|
|
}
|
|
|
|
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
class OutputWriterCLI : public TraceDumper::OutputWriter {
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
public:
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
OutputWriterCLI(Stream &s, const TraceDumperOptions &options, Thread &thread)
|
|
|
|
: m_s(s), m_options(options) {
|
|
|
|
m_s.Format("thread #{0}: tid = {1}\n", thread.GetIndexID(), thread.GetID());
|
|
|
|
};
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
void NoMoreData() override { m_s << " no more data\n"; }
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
|
2022-10-10 12:57:13 -07:00
|
|
|
void FunctionCallForest(
|
|
|
|
const std::vector<TraceDumper::FunctionCallUP> &forest) override {
|
|
|
|
for (size_t i = 0; i < forest.size(); i++) {
|
|
|
|
m_s.Format("\n[call tree #{0}]\n", i);
|
|
|
|
DumpFunctionCallTree(*forest[i]);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
void TraceItem(const TraceDumper::TraceItem &item) override {
|
|
|
|
if (item.symbol_info) {
|
|
|
|
if (!item.prev_symbol_info ||
|
|
|
|
!IsSameInstructionSymbolContext(*item.prev_symbol_info,
|
|
|
|
*item.symbol_info)) {
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
m_s << " ";
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
const char *module_name = GetModuleName(item);
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
if (!module_name)
|
|
|
|
m_s << "(none)";
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
else if (!item.symbol_info->sc.function && !item.symbol_info->sc.symbol)
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
m_s.Format("{0}`(none)", module_name);
|
|
|
|
else
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
item.symbol_info->sc.DumpStopContext(
|
|
|
|
&m_s, item.symbol_info->exe_ctx.GetTargetPtr(),
|
|
|
|
item.symbol_info->address,
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
/*show_fullpaths=*/false,
|
|
|
|
/*show_module=*/true, /*show_inlined_frames=*/false,
|
|
|
|
/*show_function_arguments=*/true,
|
|
|
|
/*show_function_name=*/true);
|
|
|
|
m_s << "\n";
|
|
|
|
}
|
|
|
|
}
|
2021-06-29 14:03:30 -07:00
|
|
|
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
if (item.error && !m_was_prev_instruction_an_error)
|
|
|
|
m_s << " ...missing instructions\n";
|
2021-06-29 14:03:30 -07:00
|
|
|
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
m_s.Format(" {0}: ", item.id);
|
2021-06-29 14:03:30 -07:00
|
|
|
|
[trace][intel pt] Introduce wall clock time for each trace item
- Decouple TSCs from trace items
- Turn TSCs into events just like CPUs. The new name is HW clock tick, wich could be reused by other vendors.
- Add a GetWallTime that returns the wall time that the trace plug-in can infer for each trace item.
- For intel pt, we are doing the following interpolation: if an instruction takes less than 1 TSC, we use that duration, otherwise, we assume the instruction took 1 TSC. This helps us avoid having to handle context switches, changes to kernel, idle times, decoding errors, etc. We are just trying to show some approximation and not the real data. For the real data, TSCs are the way to go. Besides that, we are making sure that no two trace items will give the same interpolation value. Finally, we are using as time 0 the time at which tracing started.
Sample output:
```
(lldb) r
Process 750047 launched: '/home/wallace/a.out' (x86_64)
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
frame #0: 0x0000000000402479 a.out`main at main.cpp:29:20
26 };
27
28 int main() {
-> 29 std::vector<int> vvv;
30 for (int i = 0; i < 100; i++)
31 vvv.push_back(i);
32
(lldb) process trace start -s 64kb -t --per-cpu
(lldb) b 60
Breakpoint 2: where = a.out`main + 1689 at main.cpp:60:23, address = 0x0000000000402afe
(lldb) c
Process 750047 resuming
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 2.1
frame #0: 0x0000000000402afe a.out`main at main.cpp:60:23
57 map<int, int> m;
58 m[3] = 4;
59
-> 60 map<string, string> m2;
61 m2["5"] = "6";
62
63 std::vector<std::string> vs = {"2", "3"};
(lldb) thread trace dump instructions -t -f -e thread #1: tid = 750047
0: [379567.000 ns] (event) HW clock tick [48599428476224707]
1: [379569.000 ns] (event) CPU core changed [new CPU=2]
2: [390487.000 ns] (event) HW clock tick [48599428476246495]
3: [1602508.000 ns] (event) HW clock tick [48599428478664855]
4: [1662745.000 ns] (event) HW clock tick [48599428478785046]
libc.so.6`malloc
5: [1662746.995 ns] 0x00007ffff7176660 endbr64
6: [1662748.991 ns] 0x00007ffff7176664 movq 0x32387d(%rip), %rax ; + 408
7: [1662750.986 ns] 0x00007ffff717666b pushq %r12
8: [1662752.981 ns] 0x00007ffff717666d pushq %rbp
9: [1662754.977 ns] 0x00007ffff717666e pushq %rbx
10: [1662756.972 ns] 0x00007ffff717666f movq (%rax), %rax
11: [1662758.967 ns] 0x00007ffff7176672 testq %rax, %rax
12: [1662760.963 ns] 0x00007ffff7176675 jne 0x9c7e0 ; <+384>
13: [1662762.958 ns] 0x00007ffff717667b leaq 0x17(%rdi), %rax
14: [1662764.953 ns] 0x00007ffff717667f cmpq $0x1f, %rax
15: [1662766.949 ns] 0x00007ffff7176683 ja 0x9c730 ; <+208>
16: [1662768.944 ns] 0x00007ffff7176730 andq $-0x10, %rax
17: [1662770.939 ns] 0x00007ffff7176734 cmpq $-0x41, %rax
18: [1662772.935 ns] 0x00007ffff7176738 seta %dl
19: [1662774.930 ns] 0x00007ffff717673b jmp 0x9c690 ; <+48>
20: [1662776.925 ns] 0x00007ffff7176690 cmpq %rdi, %rax
21: [1662778.921 ns] 0x00007ffff7176693 jb 0x9c7b0 ; <+336>
22: [1662780.916 ns] 0x00007ffff7176699 testb %dl, %dl
23: [1662782.911 ns] 0x00007ffff717669b jne 0x9c7b0 ; <+336>
24: [1662784.906 ns] 0x00007ffff71766a1 movq 0x3236c0(%rip), %r12 ; + 24
(lldb) thread trace dump instructions -t -f -e -J -c 4
[
{
"id": 0,
"timestamp_ns": "379567.000000",
"event": "HW clock tick",
"hwClock": 48599428476224707
},
{
"id": 1,
"timestamp_ns": "379569.000000",
"event": "CPU core changed",
"cpuId": 2
},
{
"id": 2,
"timestamp_ns": "390487.000000",
"event": "HW clock tick",
"hwClock": 48599428476246495
},
{
"id": 3,
"timestamp_ns": "1602508.000000",
"event": "HW clock tick",
"hwClock": 48599428478664855
},
{
"id": 4,
"timestamp_ns": "1662745.000000",
"event": "HW clock tick",
"hwClock": 48599428478785046
},
{
"id": 5,
"timestamp_ns": "1662746.995324",
"loadAddress": "0x7ffff7176660",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "endbr64"
},
{
"id": 6,
"timestamp_ns": "1662748.990648",
"loadAddress": "0x7ffff7176664",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "movq"
},
{
"id": 7,
"timestamp_ns": "1662750.985972",
"loadAddress": "0x7ffff717666b",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
},
{
"id": 8,
"timestamp_ns": "1662752.981296",
"loadAddress": "0x7ffff717666d",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
}
]
```
Differential Revision: https://reviews.llvm.org/D130054
2022-07-18 16:56:01 -07:00
|
|
|
if (m_options.show_timestamps) {
|
|
|
|
m_s.Format("[{0}] ", item.timestamp
|
|
|
|
? formatv("{0:3} ns", *item.timestamp).str()
|
|
|
|
: "unavailable");
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
}
|
|
|
|
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
if (item.event) {
|
|
|
|
m_s << "(event) " << TraceCursor::EventKindToString(*item.event);
|
[trace][intel pt] Introduce wall clock time for each trace item
- Decouple TSCs from trace items
- Turn TSCs into events just like CPUs. The new name is HW clock tick, wich could be reused by other vendors.
- Add a GetWallTime that returns the wall time that the trace plug-in can infer for each trace item.
- For intel pt, we are doing the following interpolation: if an instruction takes less than 1 TSC, we use that duration, otherwise, we assume the instruction took 1 TSC. This helps us avoid having to handle context switches, changes to kernel, idle times, decoding errors, etc. We are just trying to show some approximation and not the real data. For the real data, TSCs are the way to go. Besides that, we are making sure that no two trace items will give the same interpolation value. Finally, we are using as time 0 the time at which tracing started.
Sample output:
```
(lldb) r
Process 750047 launched: '/home/wallace/a.out' (x86_64)
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
frame #0: 0x0000000000402479 a.out`main at main.cpp:29:20
26 };
27
28 int main() {
-> 29 std::vector<int> vvv;
30 for (int i = 0; i < 100; i++)
31 vvv.push_back(i);
32
(lldb) process trace start -s 64kb -t --per-cpu
(lldb) b 60
Breakpoint 2: where = a.out`main + 1689 at main.cpp:60:23, address = 0x0000000000402afe
(lldb) c
Process 750047 resuming
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 2.1
frame #0: 0x0000000000402afe a.out`main at main.cpp:60:23
57 map<int, int> m;
58 m[3] = 4;
59
-> 60 map<string, string> m2;
61 m2["5"] = "6";
62
63 std::vector<std::string> vs = {"2", "3"};
(lldb) thread trace dump instructions -t -f -e thread #1: tid = 750047
0: [379567.000 ns] (event) HW clock tick [48599428476224707]
1: [379569.000 ns] (event) CPU core changed [new CPU=2]
2: [390487.000 ns] (event) HW clock tick [48599428476246495]
3: [1602508.000 ns] (event) HW clock tick [48599428478664855]
4: [1662745.000 ns] (event) HW clock tick [48599428478785046]
libc.so.6`malloc
5: [1662746.995 ns] 0x00007ffff7176660 endbr64
6: [1662748.991 ns] 0x00007ffff7176664 movq 0x32387d(%rip), %rax ; + 408
7: [1662750.986 ns] 0x00007ffff717666b pushq %r12
8: [1662752.981 ns] 0x00007ffff717666d pushq %rbp
9: [1662754.977 ns] 0x00007ffff717666e pushq %rbx
10: [1662756.972 ns] 0x00007ffff717666f movq (%rax), %rax
11: [1662758.967 ns] 0x00007ffff7176672 testq %rax, %rax
12: [1662760.963 ns] 0x00007ffff7176675 jne 0x9c7e0 ; <+384>
13: [1662762.958 ns] 0x00007ffff717667b leaq 0x17(%rdi), %rax
14: [1662764.953 ns] 0x00007ffff717667f cmpq $0x1f, %rax
15: [1662766.949 ns] 0x00007ffff7176683 ja 0x9c730 ; <+208>
16: [1662768.944 ns] 0x00007ffff7176730 andq $-0x10, %rax
17: [1662770.939 ns] 0x00007ffff7176734 cmpq $-0x41, %rax
18: [1662772.935 ns] 0x00007ffff7176738 seta %dl
19: [1662774.930 ns] 0x00007ffff717673b jmp 0x9c690 ; <+48>
20: [1662776.925 ns] 0x00007ffff7176690 cmpq %rdi, %rax
21: [1662778.921 ns] 0x00007ffff7176693 jb 0x9c7b0 ; <+336>
22: [1662780.916 ns] 0x00007ffff7176699 testb %dl, %dl
23: [1662782.911 ns] 0x00007ffff717669b jne 0x9c7b0 ; <+336>
24: [1662784.906 ns] 0x00007ffff71766a1 movq 0x3236c0(%rip), %r12 ; + 24
(lldb) thread trace dump instructions -t -f -e -J -c 4
[
{
"id": 0,
"timestamp_ns": "379567.000000",
"event": "HW clock tick",
"hwClock": 48599428476224707
},
{
"id": 1,
"timestamp_ns": "379569.000000",
"event": "CPU core changed",
"cpuId": 2
},
{
"id": 2,
"timestamp_ns": "390487.000000",
"event": "HW clock tick",
"hwClock": 48599428476246495
},
{
"id": 3,
"timestamp_ns": "1602508.000000",
"event": "HW clock tick",
"hwClock": 48599428478664855
},
{
"id": 4,
"timestamp_ns": "1662745.000000",
"event": "HW clock tick",
"hwClock": 48599428478785046
},
{
"id": 5,
"timestamp_ns": "1662746.995324",
"loadAddress": "0x7ffff7176660",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "endbr64"
},
{
"id": 6,
"timestamp_ns": "1662748.990648",
"loadAddress": "0x7ffff7176664",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "movq"
},
{
"id": 7,
"timestamp_ns": "1662750.985972",
"loadAddress": "0x7ffff717666b",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
},
{
"id": 8,
"timestamp_ns": "1662752.981296",
"loadAddress": "0x7ffff717666d",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
}
]
```
Differential Revision: https://reviews.llvm.org/D130054
2022-07-18 16:56:01 -07:00
|
|
|
switch (*item.event) {
|
|
|
|
case eTraceEventCPUChanged:
|
[trace][intel pt] Create a CPU change event and expose it in the dumper
Thanks to fredzhou@fb.com for coming up with this feature.
When tracing in per-cpu mode, we have information of in which cpu we are execution each instruction, which comes from the context switch trace. This diff makes this information available as a `cpu changed event`, which an additional accessor in the cursor `GetCPU()`. As cpu changes are very infrequent, any consumer should listen to cpu change events instead of querying the actual cpu of a trace item. Once a cpu change event is seen, the consumer can invoke GetCPU() to get that information. Also, it's possible to invoke GetCPU() on an arbitrary instruction item, which will return the last cpu seen. However, this call is O(logn) and should be used sparingly.
Manually tested with a sample program that starts on cpu 52, then goes to 18, and then goes back to 52.
Differential Revision: https://reviews.llvm.org/D129340
2022-07-07 18:31:18 -07:00
|
|
|
m_s.Format(" [new CPU={0}]",
|
|
|
|
item.cpu_id ? std::to_string(*item.cpu_id) : "unavailable");
|
[trace][intel pt] Introduce wall clock time for each trace item
- Decouple TSCs from trace items
- Turn TSCs into events just like CPUs. The new name is HW clock tick, wich could be reused by other vendors.
- Add a GetWallTime that returns the wall time that the trace plug-in can infer for each trace item.
- For intel pt, we are doing the following interpolation: if an instruction takes less than 1 TSC, we use that duration, otherwise, we assume the instruction took 1 TSC. This helps us avoid having to handle context switches, changes to kernel, idle times, decoding errors, etc. We are just trying to show some approximation and not the real data. For the real data, TSCs are the way to go. Besides that, we are making sure that no two trace items will give the same interpolation value. Finally, we are using as time 0 the time at which tracing started.
Sample output:
```
(lldb) r
Process 750047 launched: '/home/wallace/a.out' (x86_64)
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
frame #0: 0x0000000000402479 a.out`main at main.cpp:29:20
26 };
27
28 int main() {
-> 29 std::vector<int> vvv;
30 for (int i = 0; i < 100; i++)
31 vvv.push_back(i);
32
(lldb) process trace start -s 64kb -t --per-cpu
(lldb) b 60
Breakpoint 2: where = a.out`main + 1689 at main.cpp:60:23, address = 0x0000000000402afe
(lldb) c
Process 750047 resuming
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 2.1
frame #0: 0x0000000000402afe a.out`main at main.cpp:60:23
57 map<int, int> m;
58 m[3] = 4;
59
-> 60 map<string, string> m2;
61 m2["5"] = "6";
62
63 std::vector<std::string> vs = {"2", "3"};
(lldb) thread trace dump instructions -t -f -e thread #1: tid = 750047
0: [379567.000 ns] (event) HW clock tick [48599428476224707]
1: [379569.000 ns] (event) CPU core changed [new CPU=2]
2: [390487.000 ns] (event) HW clock tick [48599428476246495]
3: [1602508.000 ns] (event) HW clock tick [48599428478664855]
4: [1662745.000 ns] (event) HW clock tick [48599428478785046]
libc.so.6`malloc
5: [1662746.995 ns] 0x00007ffff7176660 endbr64
6: [1662748.991 ns] 0x00007ffff7176664 movq 0x32387d(%rip), %rax ; + 408
7: [1662750.986 ns] 0x00007ffff717666b pushq %r12
8: [1662752.981 ns] 0x00007ffff717666d pushq %rbp
9: [1662754.977 ns] 0x00007ffff717666e pushq %rbx
10: [1662756.972 ns] 0x00007ffff717666f movq (%rax), %rax
11: [1662758.967 ns] 0x00007ffff7176672 testq %rax, %rax
12: [1662760.963 ns] 0x00007ffff7176675 jne 0x9c7e0 ; <+384>
13: [1662762.958 ns] 0x00007ffff717667b leaq 0x17(%rdi), %rax
14: [1662764.953 ns] 0x00007ffff717667f cmpq $0x1f, %rax
15: [1662766.949 ns] 0x00007ffff7176683 ja 0x9c730 ; <+208>
16: [1662768.944 ns] 0x00007ffff7176730 andq $-0x10, %rax
17: [1662770.939 ns] 0x00007ffff7176734 cmpq $-0x41, %rax
18: [1662772.935 ns] 0x00007ffff7176738 seta %dl
19: [1662774.930 ns] 0x00007ffff717673b jmp 0x9c690 ; <+48>
20: [1662776.925 ns] 0x00007ffff7176690 cmpq %rdi, %rax
21: [1662778.921 ns] 0x00007ffff7176693 jb 0x9c7b0 ; <+336>
22: [1662780.916 ns] 0x00007ffff7176699 testb %dl, %dl
23: [1662782.911 ns] 0x00007ffff717669b jne 0x9c7b0 ; <+336>
24: [1662784.906 ns] 0x00007ffff71766a1 movq 0x3236c0(%rip), %r12 ; + 24
(lldb) thread trace dump instructions -t -f -e -J -c 4
[
{
"id": 0,
"timestamp_ns": "379567.000000",
"event": "HW clock tick",
"hwClock": 48599428476224707
},
{
"id": 1,
"timestamp_ns": "379569.000000",
"event": "CPU core changed",
"cpuId": 2
},
{
"id": 2,
"timestamp_ns": "390487.000000",
"event": "HW clock tick",
"hwClock": 48599428476246495
},
{
"id": 3,
"timestamp_ns": "1602508.000000",
"event": "HW clock tick",
"hwClock": 48599428478664855
},
{
"id": 4,
"timestamp_ns": "1662745.000000",
"event": "HW clock tick",
"hwClock": 48599428478785046
},
{
"id": 5,
"timestamp_ns": "1662746.995324",
"loadAddress": "0x7ffff7176660",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "endbr64"
},
{
"id": 6,
"timestamp_ns": "1662748.990648",
"loadAddress": "0x7ffff7176664",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "movq"
},
{
"id": 7,
"timestamp_ns": "1662750.985972",
"loadAddress": "0x7ffff717666b",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
},
{
"id": 8,
"timestamp_ns": "1662752.981296",
"loadAddress": "0x7ffff717666d",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
}
]
```
Differential Revision: https://reviews.llvm.org/D130054
2022-07-18 16:56:01 -07:00
|
|
|
break;
|
|
|
|
case eTraceEventHWClockTick:
|
|
|
|
m_s.Format(" [{0}]", item.hw_clock ? std::to_string(*item.hw_clock)
|
|
|
|
: "unavailable");
|
|
|
|
break;
|
|
|
|
case eTraceEventDisabledHW:
|
|
|
|
case eTraceEventDisabledSW:
|
|
|
|
break;
|
[trace][intel pt] Fix per-psb packet decoding
The per-PSB packet decoding logic was wrong because it was assuming that pt_insn_get_sync_offset was being udpated after every PSB. Silly me, that is not true. It returns the offset of the PSB packet after invoking pt_insn_sync_forward regardless of how many PSBs are visited later. Instead, I'm now following the approach described in https://github.com/intel/libipt/blob/master/doc/howto_libipt.md#parallel-decode for parallel decoding, which is basically what we need.
A nasty error that happened because of this is that when we had two PSBs (A and B), the following was happening
1. PSB A was processed all the way up to the end of the trace, which includes PSB B.
2. PSB B was then processed until the end of the trace.
The instructions emitted by step 2. were also emitted as part of step 1. so our trace had duplicated chunks. This problem becomes worse when you many PSBs.
As part of making sure this diff is correct, I added some other features that are very useful.
- Added a "synchronization point" event to the TraceCursor, so we can inspect when PSBs are emitted.
- Removed the single-thread decoder. Now the per-cpu decoder and single-thread decoder use the same code paths.
- Use the query decoder to fetch PSBs and timestamps. It turns out that the pt_insn_sync_forward of the instruction decoder can move past several PSBs (this means that we could skip some TSCs). On the other hand, the pt_query_sync_forward method doesn't skip PSBs, so we can get more accurate sync events and timing information.
- Turned LibiptDecoder into PSBBlockDecoder, which decodes single PSB blocks. It is the fundamental processing unit for decoding.
- Added many comments, asserts and improved error handling for clarity.
- Improved DecodeSystemWideTraceForThread so that a TSC is emitted always before a cpu change event. This was a bug that was annoying me before.
- SplitTraceInContinuousExecutions and FindLowestTSCInTrace are now using the query decoder, which can identify precisely each PSB along with their TSCs.
- Added an "only-events" option to the trace dumper to inspect only events.
I did extensive testing and I think we should have an in-house testing CI. The LLVM buildbots are not capable of supporting testing post-mortem traces of hundreds of megabytes. I'll leave that for later, but at least for now the current tests were able to catch most of the issues I encountered when doing this task.
A sample output of a program that I was single stepping is the following. You can see that only one PSB is emitted even though stepping happened!
```
thread #1: tid = 3578223
0: (event) trace synchronization point [offset = 0x0xef0]
a.out`main + 20 at main.cpp:29:20
1: 0x0000000000402479 leaq -0x1210(%rbp), %rax
2: (event) software disabled tracing
3: 0x0000000000402480 movq %rax, %rdi
4: (event) software disabled tracing
5: (event) software disabled tracing
6: 0x0000000000402483 callq 0x403bd4 ; std::vector<int, std::allocator<int>>::vector at stl_vector.h:391:7
7: (event) software disabled tracing
a.out`std::vector<int, std::allocator<int>>::vector() at stl_vector.h:391:7
8: 0x0000000000403bd4 pushq %rbp
9: (event) software disabled tracing
10: 0x0000000000403bd5 movq %rsp, %rbp
11: (event) software disabled tracing
```
This is another trace of a long program with a few PSBs.
```
(lldb) thread trace dump instructions -E -f thread #1: tid = 3603082
0: (event) trace synchronization point [offset = 0x0x80]
47417: (event) software disabled tracing
129231: (event) trace synchronization point [offset = 0x0x800]
146747: (event) software disabled tracing
246076: (event) software disabled tracing
259068: (event) trace synchronization point [offset = 0x0xf78]
259276: (event) software disabled tracing
259278: (event) software disabled tracing
no more data
```
Differential Revision: https://reviews.llvm.org/D131630
2022-08-08 16:24:22 -07:00
|
|
|
case eTraceEventSyncPoint:
|
|
|
|
m_s.Format(" [{0}]", item.sync_point_metadata);
|
|
|
|
break;
|
[trace][intel pt] Create a CPU change event and expose it in the dumper
Thanks to fredzhou@fb.com for coming up with this feature.
When tracing in per-cpu mode, we have information of in which cpu we are execution each instruction, which comes from the context switch trace. This diff makes this information available as a `cpu changed event`, which an additional accessor in the cursor `GetCPU()`. As cpu changes are very infrequent, any consumer should listen to cpu change events instead of querying the actual cpu of a trace item. Once a cpu change event is seen, the consumer can invoke GetCPU() to get that information. Also, it's possible to invoke GetCPU() on an arbitrary instruction item, which will return the last cpu seen. However, this call is O(logn) and should be used sparingly.
Manually tested with a sample program that starts on cpu 52, then goes to 18, and then goes back to 52.
Differential Revision: https://reviews.llvm.org/D129340
2022-07-07 18:31:18 -07:00
|
|
|
}
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
} else if (item.error) {
|
|
|
|
m_s << "(error) " << *item.error;
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
} else {
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
m_s.Format("{0:x+16}", item.load_address);
|
[trace] Avoid a crash in the dumper when disassembling fails
In rare situations, disassemblying would fail that produce an invalid
InstructionSP object. We need to check that it's valid before using.
With this change, now the dumper doesn't crash with dumping instructions of
ioctl. In fact, it now dumps this output
{
"id": 6135,
"loadAddress": "0x7f4bfe5c7515",
"module": "libc.so.6",
"symbol": "ioctl",
"source": "glibc/2.34/src/glibc-2.34/sysdeps/unix/syscall-template.S",
"line": 120,
"column": 0
}
Anyway, we need to investigate why the diassembler failed disassembling that
instruction. From over 2B instructions I was disassembling today, just this
one failed, so this could be a bug in LLVM's core disassembler.
Differential Revision: https://reviews.llvm.org/D129588
2022-07-12 12:41:30 -07:00
|
|
|
if (item.symbol_info && item.symbol_info->instruction) {
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
m_s << " ";
|
[trace] Add a flag to the decoder to output the instruction type
To build complex binding upon instruction trace, additional metadata 'instruction type' is needed.
This diff has followings:
- Add a flag -k / --kind for instruction dump
- Remove SetGranularity and SetIgnoreErros from Trace cursor
Sample output:
```
(lldb) thread trace dump instruction -k
thread #1: tid = 3198805
libc.so.6`_IO_puts + 356
2107: 0x00007ffff7163594 ( return) retq
2106: 0x00007ffff7163592 ( other) popq %r13
2105: 0x00007ffff7163590 ( other) popq %r12
2104: 0x00007ffff716358f ( other) popq %rbp
2103: 0x00007ffff716358e ( other) popq %rbx
2102: 0x00007ffff716358c ( other) movl %ebx, %eax
2101: 0x00007ffff7163588 ( other) addq $0x8, %rsp
2100: 0x00007ffff7163570 ( cond jump) je 0x89588 ; <+344>
2099: 0x00007ffff716356e ( other) decl (%rdx)
2098: 0x00007ffff7163565 ( cond jump) je 0x8956e ; <+318>
2097: 0x00007ffff716355e ( other) cmpl $0x0, 0x33c02b(%rip) ; __libc_multiple_threads
2096: 0x00007ffff7163556 ( other) movq $0x0, 0x8(%rdx)
2095: 0x00007ffff7163554 ( cond jump) jne 0x89588 ; <+344>
2094: 0x00007ffff7163550 ( other) subl $0x1, 0x4(%rdx)
2093: 0x00007ffff7163549 ( other) movq 0x88(%rbp), %rdx
2092: 0x00007ffff7163547 ( cond jump) jne 0x89588 ; <+344>
2091: 0x00007ffff7163540 ( other) testl $0x8000, (%rbp) ; imm = 0x8000
2090: 0x00007ffff716353c ( other) cmovaq %rax, %rbx
2089: 0x00007ffff7163535 ( other) cmpq $0x7fffffff, %rbx ; imm = 0x7FFFFFFF
2088: 0x00007ffff7163530 ( other) movl $0x7fffffff, %eax ; imm = 0x7FFFFFFF
```
Reviewed By: wallace
Differential Revision: https://reviews.llvm.org/D128477
2022-07-12 16:09:03 -07:00
|
|
|
item.symbol_info->instruction->Dump(
|
|
|
|
&m_s, /*max_opcode_byte_size=*/0,
|
|
|
|
/*show_address=*/false,
|
|
|
|
/*show_bytes=*/false, m_options.show_control_flow_kind,
|
|
|
|
&item.symbol_info->exe_ctx, &item.symbol_info->sc,
|
|
|
|
/*prev_sym_ctx=*/nullptr,
|
|
|
|
/*disassembly_addr_format=*/nullptr,
|
|
|
|
/*max_address_text_size=*/0);
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
}
|
|
|
|
}
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
|
|
|
|
m_was_prev_instruction_an_error = (bool)item.error;
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
m_s << "\n";
|
|
|
|
}
|
|
|
|
|
|
|
|
private:
|
2022-10-10 12:57:13 -07:00
|
|
|
void
|
|
|
|
DumpSegmentContext(const TraceDumper::FunctionCall::TracedSegment &segment) {
|
|
|
|
if (segment.GetOwningCall().IsError()) {
|
|
|
|
m_s << "<tracing errors>";
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
const SymbolContext &first_sc = segment.GetFirstInstructionSymbolInfo().sc;
|
|
|
|
first_sc.DumpStopContext(
|
|
|
|
&m_s, segment.GetFirstInstructionSymbolInfo().exe_ctx.GetTargetPtr(),
|
|
|
|
segment.GetFirstInstructionSymbolInfo().address,
|
|
|
|
/*show_fullpaths=*/false,
|
|
|
|
/*show_module=*/true, /*show_inlined_frames=*/false,
|
|
|
|
/*show_function_arguments=*/true,
|
|
|
|
/*show_function_name=*/true);
|
|
|
|
m_s << " to ";
|
|
|
|
const SymbolContext &last_sc = segment.GetLastInstructionSymbolInfo().sc;
|
|
|
|
if (IsLineEntryValid(first_sc.line_entry) &&
|
|
|
|
IsLineEntryValid(last_sc.line_entry)) {
|
|
|
|
m_s.Format("{0}:{1}", last_sc.line_entry.line, last_sc.line_entry.column);
|
|
|
|
} else {
|
|
|
|
last_sc.DumpStopContext(
|
|
|
|
&m_s, segment.GetFirstInstructionSymbolInfo().exe_ctx.GetTargetPtr(),
|
|
|
|
segment.GetLastInstructionSymbolInfo().address,
|
|
|
|
/*show_fullpaths=*/false,
|
|
|
|
/*show_module=*/false, /*show_inlined_frames=*/false,
|
|
|
|
/*show_function_arguments=*/false,
|
|
|
|
/*show_function_name=*/false);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
void DumpUntracedContext(const TraceDumper::FunctionCall &function_call) {
|
|
|
|
if (function_call.IsError()) {
|
|
|
|
m_s << "tracing error";
|
|
|
|
}
|
|
|
|
const SymbolContext &sc = function_call.GetSymbolInfo().sc;
|
|
|
|
|
|
|
|
const char *module_name = GetModuleName(sc);
|
|
|
|
if (!module_name)
|
|
|
|
m_s << "(none)";
|
|
|
|
else if (!sc.function && !sc.symbol)
|
|
|
|
m_s << module_name << "`(none)";
|
|
|
|
else
|
|
|
|
m_s << module_name << "`" << sc.GetFunctionName().AsCString();
|
|
|
|
}
|
|
|
|
|
|
|
|
void DumpFunctionCallTree(const TraceDumper::FunctionCall &function_call) {
|
2022-10-15 18:52:22 -07:00
|
|
|
if (function_call.GetUntracedPrefixSegment()) {
|
2022-10-10 12:57:13 -07:00
|
|
|
m_s.Indent();
|
|
|
|
DumpUntracedContext(function_call);
|
|
|
|
m_s << "\n";
|
|
|
|
|
|
|
|
m_s.IndentMore();
|
2022-10-15 18:52:22 -07:00
|
|
|
DumpFunctionCallTree(function_call.GetUntracedPrefixSegment()->GetNestedCall());
|
2022-10-10 12:57:13 -07:00
|
|
|
m_s.IndentLess();
|
|
|
|
}
|
|
|
|
|
|
|
|
for (const TraceDumper::FunctionCall::TracedSegment &segment :
|
|
|
|
function_call.GetTracedSegments()) {
|
|
|
|
m_s.Indent();
|
|
|
|
DumpSegmentContext(segment);
|
|
|
|
m_s.Format(" [{0}, {1}]\n", segment.GetFirstInstructionID(),
|
|
|
|
segment.GetLastInstructionID());
|
|
|
|
|
|
|
|
segment.IfNestedCall([&](const TraceDumper::FunctionCall &nested_call) {
|
|
|
|
m_s.IndentMore();
|
|
|
|
DumpFunctionCallTree(nested_call);
|
|
|
|
m_s.IndentLess();
|
|
|
|
});
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
Stream &m_s;
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
TraceDumperOptions m_options;
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
bool m_was_prev_instruction_an_error = false;
|
|
|
|
};
|
|
|
|
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
class OutputWriterJSON : public TraceDumper::OutputWriter {
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
/* schema:
|
|
|
|
error_message: string
|
|
|
|
| {
|
[trace][intel pt] Create a CPU change event and expose it in the dumper
Thanks to fredzhou@fb.com for coming up with this feature.
When tracing in per-cpu mode, we have information of in which cpu we are execution each instruction, which comes from the context switch trace. This diff makes this information available as a `cpu changed event`, which an additional accessor in the cursor `GetCPU()`. As cpu changes are very infrequent, any consumer should listen to cpu change events instead of querying the actual cpu of a trace item. Once a cpu change event is seen, the consumer can invoke GetCPU() to get that information. Also, it's possible to invoke GetCPU() on an arbitrary instruction item, which will return the last cpu seen. However, this call is O(logn) and should be used sparingly.
Manually tested with a sample program that starts on cpu 52, then goes to 18, and then goes back to 52.
Differential Revision: https://reviews.llvm.org/D129340
2022-07-07 18:31:18 -07:00
|
|
|
"event": string,
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
"id": decimal,
|
|
|
|
"tsc"?: string decimal,
|
[trace][intel pt] Create a CPU change event and expose it in the dumper
Thanks to fredzhou@fb.com for coming up with this feature.
When tracing in per-cpu mode, we have information of in which cpu we are execution each instruction, which comes from the context switch trace. This diff makes this information available as a `cpu changed event`, which an additional accessor in the cursor `GetCPU()`. As cpu changes are very infrequent, any consumer should listen to cpu change events instead of querying the actual cpu of a trace item. Once a cpu change event is seen, the consumer can invoke GetCPU() to get that information. Also, it's possible to invoke GetCPU() on an arbitrary instruction item, which will return the last cpu seen. However, this call is O(logn) and should be used sparingly.
Manually tested with a sample program that starts on cpu 52, then goes to 18, and then goes back to 52.
Differential Revision: https://reviews.llvm.org/D129340
2022-07-07 18:31:18 -07:00
|
|
|
"cpuId"? decimal,
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
} | {
|
[trace][intel pt] Create a CPU change event and expose it in the dumper
Thanks to fredzhou@fb.com for coming up with this feature.
When tracing in per-cpu mode, we have information of in which cpu we are execution each instruction, which comes from the context switch trace. This diff makes this information available as a `cpu changed event`, which an additional accessor in the cursor `GetCPU()`. As cpu changes are very infrequent, any consumer should listen to cpu change events instead of querying the actual cpu of a trace item. Once a cpu change event is seen, the consumer can invoke GetCPU() to get that information. Also, it's possible to invoke GetCPU() on an arbitrary instruction item, which will return the last cpu seen. However, this call is O(logn) and should be used sparingly.
Manually tested with a sample program that starts on cpu 52, then goes to 18, and then goes back to 52.
Differential Revision: https://reviews.llvm.org/D129340
2022-07-07 18:31:18 -07:00
|
|
|
"error": string,
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
"id": decimal,
|
|
|
|
"tsc"?: string decimal,
|
|
|
|
| {
|
[trace][intel pt] Create a CPU change event and expose it in the dumper
Thanks to fredzhou@fb.com for coming up with this feature.
When tracing in per-cpu mode, we have information of in which cpu we are execution each instruction, which comes from the context switch trace. This diff makes this information available as a `cpu changed event`, which an additional accessor in the cursor `GetCPU()`. As cpu changes are very infrequent, any consumer should listen to cpu change events instead of querying the actual cpu of a trace item. Once a cpu change event is seen, the consumer can invoke GetCPU() to get that information. Also, it's possible to invoke GetCPU() on an arbitrary instruction item, which will return the last cpu seen. However, this call is O(logn) and should be used sparingly.
Manually tested with a sample program that starts on cpu 52, then goes to 18, and then goes back to 52.
Differential Revision: https://reviews.llvm.org/D129340
2022-07-07 18:31:18 -07:00
|
|
|
"loadAddress": string decimal,
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
"id": decimal,
|
[trace][intel pt] Introduce wall clock time for each trace item
- Decouple TSCs from trace items
- Turn TSCs into events just like CPUs. The new name is HW clock tick, wich could be reused by other vendors.
- Add a GetWallTime that returns the wall time that the trace plug-in can infer for each trace item.
- For intel pt, we are doing the following interpolation: if an instruction takes less than 1 TSC, we use that duration, otherwise, we assume the instruction took 1 TSC. This helps us avoid having to handle context switches, changes to kernel, idle times, decoding errors, etc. We are just trying to show some approximation and not the real data. For the real data, TSCs are the way to go. Besides that, we are making sure that no two trace items will give the same interpolation value. Finally, we are using as time 0 the time at which tracing started.
Sample output:
```
(lldb) r
Process 750047 launched: '/home/wallace/a.out' (x86_64)
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
frame #0: 0x0000000000402479 a.out`main at main.cpp:29:20
26 };
27
28 int main() {
-> 29 std::vector<int> vvv;
30 for (int i = 0; i < 100; i++)
31 vvv.push_back(i);
32
(lldb) process trace start -s 64kb -t --per-cpu
(lldb) b 60
Breakpoint 2: where = a.out`main + 1689 at main.cpp:60:23, address = 0x0000000000402afe
(lldb) c
Process 750047 resuming
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 2.1
frame #0: 0x0000000000402afe a.out`main at main.cpp:60:23
57 map<int, int> m;
58 m[3] = 4;
59
-> 60 map<string, string> m2;
61 m2["5"] = "6";
62
63 std::vector<std::string> vs = {"2", "3"};
(lldb) thread trace dump instructions -t -f -e thread #1: tid = 750047
0: [379567.000 ns] (event) HW clock tick [48599428476224707]
1: [379569.000 ns] (event) CPU core changed [new CPU=2]
2: [390487.000 ns] (event) HW clock tick [48599428476246495]
3: [1602508.000 ns] (event) HW clock tick [48599428478664855]
4: [1662745.000 ns] (event) HW clock tick [48599428478785046]
libc.so.6`malloc
5: [1662746.995 ns] 0x00007ffff7176660 endbr64
6: [1662748.991 ns] 0x00007ffff7176664 movq 0x32387d(%rip), %rax ; + 408
7: [1662750.986 ns] 0x00007ffff717666b pushq %r12
8: [1662752.981 ns] 0x00007ffff717666d pushq %rbp
9: [1662754.977 ns] 0x00007ffff717666e pushq %rbx
10: [1662756.972 ns] 0x00007ffff717666f movq (%rax), %rax
11: [1662758.967 ns] 0x00007ffff7176672 testq %rax, %rax
12: [1662760.963 ns] 0x00007ffff7176675 jne 0x9c7e0 ; <+384>
13: [1662762.958 ns] 0x00007ffff717667b leaq 0x17(%rdi), %rax
14: [1662764.953 ns] 0x00007ffff717667f cmpq $0x1f, %rax
15: [1662766.949 ns] 0x00007ffff7176683 ja 0x9c730 ; <+208>
16: [1662768.944 ns] 0x00007ffff7176730 andq $-0x10, %rax
17: [1662770.939 ns] 0x00007ffff7176734 cmpq $-0x41, %rax
18: [1662772.935 ns] 0x00007ffff7176738 seta %dl
19: [1662774.930 ns] 0x00007ffff717673b jmp 0x9c690 ; <+48>
20: [1662776.925 ns] 0x00007ffff7176690 cmpq %rdi, %rax
21: [1662778.921 ns] 0x00007ffff7176693 jb 0x9c7b0 ; <+336>
22: [1662780.916 ns] 0x00007ffff7176699 testb %dl, %dl
23: [1662782.911 ns] 0x00007ffff717669b jne 0x9c7b0 ; <+336>
24: [1662784.906 ns] 0x00007ffff71766a1 movq 0x3236c0(%rip), %r12 ; + 24
(lldb) thread trace dump instructions -t -f -e -J -c 4
[
{
"id": 0,
"timestamp_ns": "379567.000000",
"event": "HW clock tick",
"hwClock": 48599428476224707
},
{
"id": 1,
"timestamp_ns": "379569.000000",
"event": "CPU core changed",
"cpuId": 2
},
{
"id": 2,
"timestamp_ns": "390487.000000",
"event": "HW clock tick",
"hwClock": 48599428476246495
},
{
"id": 3,
"timestamp_ns": "1602508.000000",
"event": "HW clock tick",
"hwClock": 48599428478664855
},
{
"id": 4,
"timestamp_ns": "1662745.000000",
"event": "HW clock tick",
"hwClock": 48599428478785046
},
{
"id": 5,
"timestamp_ns": "1662746.995324",
"loadAddress": "0x7ffff7176660",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "endbr64"
},
{
"id": 6,
"timestamp_ns": "1662748.990648",
"loadAddress": "0x7ffff7176664",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "movq"
},
{
"id": 7,
"timestamp_ns": "1662750.985972",
"loadAddress": "0x7ffff717666b",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
},
{
"id": 8,
"timestamp_ns": "1662752.981296",
"loadAddress": "0x7ffff717666d",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
}
]
```
Differential Revision: https://reviews.llvm.org/D130054
2022-07-18 16:56:01 -07:00
|
|
|
"hwClock"?: string decimal,
|
[trace][intel pt] Fix per-psb packet decoding
The per-PSB packet decoding logic was wrong because it was assuming that pt_insn_get_sync_offset was being udpated after every PSB. Silly me, that is not true. It returns the offset of the PSB packet after invoking pt_insn_sync_forward regardless of how many PSBs are visited later. Instead, I'm now following the approach described in https://github.com/intel/libipt/blob/master/doc/howto_libipt.md#parallel-decode for parallel decoding, which is basically what we need.
A nasty error that happened because of this is that when we had two PSBs (A and B), the following was happening
1. PSB A was processed all the way up to the end of the trace, which includes PSB B.
2. PSB B was then processed until the end of the trace.
The instructions emitted by step 2. were also emitted as part of step 1. so our trace had duplicated chunks. This problem becomes worse when you many PSBs.
As part of making sure this diff is correct, I added some other features that are very useful.
- Added a "synchronization point" event to the TraceCursor, so we can inspect when PSBs are emitted.
- Removed the single-thread decoder. Now the per-cpu decoder and single-thread decoder use the same code paths.
- Use the query decoder to fetch PSBs and timestamps. It turns out that the pt_insn_sync_forward of the instruction decoder can move past several PSBs (this means that we could skip some TSCs). On the other hand, the pt_query_sync_forward method doesn't skip PSBs, so we can get more accurate sync events and timing information.
- Turned LibiptDecoder into PSBBlockDecoder, which decodes single PSB blocks. It is the fundamental processing unit for decoding.
- Added many comments, asserts and improved error handling for clarity.
- Improved DecodeSystemWideTraceForThread so that a TSC is emitted always before a cpu change event. This was a bug that was annoying me before.
- SplitTraceInContinuousExecutions and FindLowestTSCInTrace are now using the query decoder, which can identify precisely each PSB along with their TSCs.
- Added an "only-events" option to the trace dumper to inspect only events.
I did extensive testing and I think we should have an in-house testing CI. The LLVM buildbots are not capable of supporting testing post-mortem traces of hundreds of megabytes. I'll leave that for later, but at least for now the current tests were able to catch most of the issues I encountered when doing this task.
A sample output of a program that I was single stepping is the following. You can see that only one PSB is emitted even though stepping happened!
```
thread #1: tid = 3578223
0: (event) trace synchronization point [offset = 0x0xef0]
a.out`main + 20 at main.cpp:29:20
1: 0x0000000000402479 leaq -0x1210(%rbp), %rax
2: (event) software disabled tracing
3: 0x0000000000402480 movq %rax, %rdi
4: (event) software disabled tracing
5: (event) software disabled tracing
6: 0x0000000000402483 callq 0x403bd4 ; std::vector<int, std::allocator<int>>::vector at stl_vector.h:391:7
7: (event) software disabled tracing
a.out`std::vector<int, std::allocator<int>>::vector() at stl_vector.h:391:7
8: 0x0000000000403bd4 pushq %rbp
9: (event) software disabled tracing
10: 0x0000000000403bd5 movq %rsp, %rbp
11: (event) software disabled tracing
```
This is another trace of a long program with a few PSBs.
```
(lldb) thread trace dump instructions -E -f thread #1: tid = 3603082
0: (event) trace synchronization point [offset = 0x0x80]
47417: (event) software disabled tracing
129231: (event) trace synchronization point [offset = 0x0x800]
146747: (event) software disabled tracing
246076: (event) software disabled tracing
259068: (event) trace synchronization point [offset = 0x0xf78]
259276: (event) software disabled tracing
259278: (event) software disabled tracing
no more data
```
Differential Revision: https://reviews.llvm.org/D131630
2022-08-08 16:24:22 -07:00
|
|
|
"syncPointMetadata"?: string,
|
[trace][intel pt] Introduce wall clock time for each trace item
- Decouple TSCs from trace items
- Turn TSCs into events just like CPUs. The new name is HW clock tick, wich could be reused by other vendors.
- Add a GetWallTime that returns the wall time that the trace plug-in can infer for each trace item.
- For intel pt, we are doing the following interpolation: if an instruction takes less than 1 TSC, we use that duration, otherwise, we assume the instruction took 1 TSC. This helps us avoid having to handle context switches, changes to kernel, idle times, decoding errors, etc. We are just trying to show some approximation and not the real data. For the real data, TSCs are the way to go. Besides that, we are making sure that no two trace items will give the same interpolation value. Finally, we are using as time 0 the time at which tracing started.
Sample output:
```
(lldb) r
Process 750047 launched: '/home/wallace/a.out' (x86_64)
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
frame #0: 0x0000000000402479 a.out`main at main.cpp:29:20
26 };
27
28 int main() {
-> 29 std::vector<int> vvv;
30 for (int i = 0; i < 100; i++)
31 vvv.push_back(i);
32
(lldb) process trace start -s 64kb -t --per-cpu
(lldb) b 60
Breakpoint 2: where = a.out`main + 1689 at main.cpp:60:23, address = 0x0000000000402afe
(lldb) c
Process 750047 resuming
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 2.1
frame #0: 0x0000000000402afe a.out`main at main.cpp:60:23
57 map<int, int> m;
58 m[3] = 4;
59
-> 60 map<string, string> m2;
61 m2["5"] = "6";
62
63 std::vector<std::string> vs = {"2", "3"};
(lldb) thread trace dump instructions -t -f -e thread #1: tid = 750047
0: [379567.000 ns] (event) HW clock tick [48599428476224707]
1: [379569.000 ns] (event) CPU core changed [new CPU=2]
2: [390487.000 ns] (event) HW clock tick [48599428476246495]
3: [1602508.000 ns] (event) HW clock tick [48599428478664855]
4: [1662745.000 ns] (event) HW clock tick [48599428478785046]
libc.so.6`malloc
5: [1662746.995 ns] 0x00007ffff7176660 endbr64
6: [1662748.991 ns] 0x00007ffff7176664 movq 0x32387d(%rip), %rax ; + 408
7: [1662750.986 ns] 0x00007ffff717666b pushq %r12
8: [1662752.981 ns] 0x00007ffff717666d pushq %rbp
9: [1662754.977 ns] 0x00007ffff717666e pushq %rbx
10: [1662756.972 ns] 0x00007ffff717666f movq (%rax), %rax
11: [1662758.967 ns] 0x00007ffff7176672 testq %rax, %rax
12: [1662760.963 ns] 0x00007ffff7176675 jne 0x9c7e0 ; <+384>
13: [1662762.958 ns] 0x00007ffff717667b leaq 0x17(%rdi), %rax
14: [1662764.953 ns] 0x00007ffff717667f cmpq $0x1f, %rax
15: [1662766.949 ns] 0x00007ffff7176683 ja 0x9c730 ; <+208>
16: [1662768.944 ns] 0x00007ffff7176730 andq $-0x10, %rax
17: [1662770.939 ns] 0x00007ffff7176734 cmpq $-0x41, %rax
18: [1662772.935 ns] 0x00007ffff7176738 seta %dl
19: [1662774.930 ns] 0x00007ffff717673b jmp 0x9c690 ; <+48>
20: [1662776.925 ns] 0x00007ffff7176690 cmpq %rdi, %rax
21: [1662778.921 ns] 0x00007ffff7176693 jb 0x9c7b0 ; <+336>
22: [1662780.916 ns] 0x00007ffff7176699 testb %dl, %dl
23: [1662782.911 ns] 0x00007ffff717669b jne 0x9c7b0 ; <+336>
24: [1662784.906 ns] 0x00007ffff71766a1 movq 0x3236c0(%rip), %r12 ; + 24
(lldb) thread trace dump instructions -t -f -e -J -c 4
[
{
"id": 0,
"timestamp_ns": "379567.000000",
"event": "HW clock tick",
"hwClock": 48599428476224707
},
{
"id": 1,
"timestamp_ns": "379569.000000",
"event": "CPU core changed",
"cpuId": 2
},
{
"id": 2,
"timestamp_ns": "390487.000000",
"event": "HW clock tick",
"hwClock": 48599428476246495
},
{
"id": 3,
"timestamp_ns": "1602508.000000",
"event": "HW clock tick",
"hwClock": 48599428478664855
},
{
"id": 4,
"timestamp_ns": "1662745.000000",
"event": "HW clock tick",
"hwClock": 48599428478785046
},
{
"id": 5,
"timestamp_ns": "1662746.995324",
"loadAddress": "0x7ffff7176660",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "endbr64"
},
{
"id": 6,
"timestamp_ns": "1662748.990648",
"loadAddress": "0x7ffff7176664",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "movq"
},
{
"id": 7,
"timestamp_ns": "1662750.985972",
"loadAddress": "0x7ffff717666b",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
},
{
"id": 8,
"timestamp_ns": "1662752.981296",
"loadAddress": "0x7ffff717666d",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
}
]
```
Differential Revision: https://reviews.llvm.org/D130054
2022-07-18 16:56:01 -07:00
|
|
|
"timestamp_ns"?: string decimal,
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
"module"?: string,
|
|
|
|
"symbol"?: string,
|
|
|
|
"line"?: decimal,
|
|
|
|
"column"?: decimal,
|
|
|
|
"source"?: string,
|
|
|
|
"mnemonic"?: string,
|
[trace] Add instruction control flow kind to JSON trace dumper's output
D128477 adds a '-k' flag which displays each instruction's control flow in the `thread trace dump instructions` command's non-json output (ie no '-j' or '-J' flag)
This diff adds the instruction control flow kind to the `thread trace dump instructions` command's JSON output (ie '-j' or '-J' flag)
Test Plan:
Confirm "controlFlowKind" is present in JSON when '-k' is provided
```
(lldb) thread trace dump instructions -J -k
[
{ [141/1952]
"id": 7755,
"loadAddress": "0x400868",
"module": "test.out",
"symbol": "main",
"mnemonic": "jmp",
"controlFlowKind": "jump",
"source": "/home/jakobjohnson/jakob-dev/test.cpp",
"line": 41,
"column": 29
},
{
"id": 7753,
"loadAddress": "0x7ffff7b54dab",
"module": "libstdc++.so.6",
"symbol": "std::ostream::flush()",
"mnemonic": "retq",
"controlFlowKind": "return"
},
{
"id": 7752,
"loadAddress": "0x7ffff7b54daa",
"module": "libstdc++.so.6",
"symbol": "std::ostream::flush()",
"mnemonic": "popq",
"controlFlowKind": "other"
},
...
]
```
Confirm "controlFlowKind" is not present when '-k' isn't provided
```
(lldb) thread trace dump instructions -J
[
{
"id": 7755,
"loadAddress": "0x400868",
"module": "test.out",
"symbol": "main",
"mnemonic": "jmp",
"source": "/home/jakobjohnson/jakob-dev/test.cpp",
"line": 41,
"column": 29
},
{
"id": 7753,
"loadAddress": "0x7ffff7b54dab",
"module": "libstdc++.so.6",
"symbol": "std::ostream::flush()",
"mnemonic": "retq"
},
{
"id": 7752,
"loadAddress": "0x7ffff7b54daa",
"module": "libstdc++.so.6",
"symbol": "std::ostream::flush()",
"mnemonic": "popq"
},
```
Differential Revision: https://reviews.llvm.org/D130607
2022-07-26 12:19:54 -07:00
|
|
|
"controlFlowKind"?: string,
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
}
|
|
|
|
*/
|
|
|
|
public:
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
OutputWriterJSON(Stream &s, const TraceDumperOptions &options)
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
: m_s(s), m_options(options),
|
|
|
|
m_j(m_s.AsRawOstream(),
|
|
|
|
/*IndentSize=*/options.pretty_print_json ? 2 : 0) {
|
|
|
|
m_j.arrayBegin();
|
|
|
|
};
|
|
|
|
|
|
|
|
~OutputWriterJSON() { m_j.arrayEnd(); }
|
|
|
|
|
2022-10-15 18:52:22 -07:00
|
|
|
void FunctionCallForest(
|
|
|
|
const std::vector<TraceDumper::FunctionCallUP> &forest) override {
|
|
|
|
for (size_t i = 0; i < forest.size(); i++) {
|
|
|
|
m_j.object([&] { DumpFunctionCallTree(*forest[i]); });
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
void DumpFunctionCallTree(const TraceDumper::FunctionCall &function_call) {
|
|
|
|
if (function_call.GetUntracedPrefixSegment()) {
|
|
|
|
m_j.attributeObject("untracedPrefixSegment", [&] {
|
|
|
|
m_j.attributeObject("nestedCall", [&] {
|
|
|
|
DumpFunctionCallTree(
|
|
|
|
function_call.GetUntracedPrefixSegment()->GetNestedCall());
|
|
|
|
});
|
|
|
|
});
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!function_call.GetTracedSegments().empty()) {
|
|
|
|
m_j.attributeArray("tracedSegments", [&] {
|
|
|
|
for (const TraceDumper::FunctionCall::TracedSegment &segment :
|
|
|
|
function_call.GetTracedSegments()) {
|
|
|
|
m_j.object([&] {
|
|
|
|
m_j.attribute("firstInstructionId",
|
|
|
|
std::to_string(segment.GetFirstInstructionID()));
|
|
|
|
m_j.attribute("lastInstructionId",
|
|
|
|
std::to_string(segment.GetLastInstructionID()));
|
|
|
|
segment.IfNestedCall(
|
|
|
|
[&](const TraceDumper::FunctionCall &nested_call) {
|
|
|
|
m_j.attributeObject(
|
|
|
|
"nestedCall", [&] { DumpFunctionCallTree(nested_call); });
|
|
|
|
});
|
|
|
|
});
|
|
|
|
}
|
|
|
|
});
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
[trace] Avoid a crash in the dumper when disassembling fails
In rare situations, disassemblying would fail that produce an invalid
InstructionSP object. We need to check that it's valid before using.
With this change, now the dumper doesn't crash with dumping instructions of
ioctl. In fact, it now dumps this output
{
"id": 6135,
"loadAddress": "0x7f4bfe5c7515",
"module": "libc.so.6",
"symbol": "ioctl",
"source": "glibc/2.34/src/glibc-2.34/sysdeps/unix/syscall-template.S",
"line": 120,
"column": 0
}
Anyway, we need to investigate why the diassembler failed disassembling that
instruction. From over 2B instructions I was disassembling today, just this
one failed, so this could be a bug in LLVM's core disassembler.
Differential Revision: https://reviews.llvm.org/D129588
2022-07-12 12:41:30 -07:00
|
|
|
void DumpEvent(const TraceDumper::TraceItem &item) {
|
|
|
|
m_j.attribute("event", TraceCursor::EventKindToString(*item.event));
|
[trace][intel pt] Introduce wall clock time for each trace item
- Decouple TSCs from trace items
- Turn TSCs into events just like CPUs. The new name is HW clock tick, wich could be reused by other vendors.
- Add a GetWallTime that returns the wall time that the trace plug-in can infer for each trace item.
- For intel pt, we are doing the following interpolation: if an instruction takes less than 1 TSC, we use that duration, otherwise, we assume the instruction took 1 TSC. This helps us avoid having to handle context switches, changes to kernel, idle times, decoding errors, etc. We are just trying to show some approximation and not the real data. For the real data, TSCs are the way to go. Besides that, we are making sure that no two trace items will give the same interpolation value. Finally, we are using as time 0 the time at which tracing started.
Sample output:
```
(lldb) r
Process 750047 launched: '/home/wallace/a.out' (x86_64)
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
frame #0: 0x0000000000402479 a.out`main at main.cpp:29:20
26 };
27
28 int main() {
-> 29 std::vector<int> vvv;
30 for (int i = 0; i < 100; i++)
31 vvv.push_back(i);
32
(lldb) process trace start -s 64kb -t --per-cpu
(lldb) b 60
Breakpoint 2: where = a.out`main + 1689 at main.cpp:60:23, address = 0x0000000000402afe
(lldb) c
Process 750047 resuming
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 2.1
frame #0: 0x0000000000402afe a.out`main at main.cpp:60:23
57 map<int, int> m;
58 m[3] = 4;
59
-> 60 map<string, string> m2;
61 m2["5"] = "6";
62
63 std::vector<std::string> vs = {"2", "3"};
(lldb) thread trace dump instructions -t -f -e thread #1: tid = 750047
0: [379567.000 ns] (event) HW clock tick [48599428476224707]
1: [379569.000 ns] (event) CPU core changed [new CPU=2]
2: [390487.000 ns] (event) HW clock tick [48599428476246495]
3: [1602508.000 ns] (event) HW clock tick [48599428478664855]
4: [1662745.000 ns] (event) HW clock tick [48599428478785046]
libc.so.6`malloc
5: [1662746.995 ns] 0x00007ffff7176660 endbr64
6: [1662748.991 ns] 0x00007ffff7176664 movq 0x32387d(%rip), %rax ; + 408
7: [1662750.986 ns] 0x00007ffff717666b pushq %r12
8: [1662752.981 ns] 0x00007ffff717666d pushq %rbp
9: [1662754.977 ns] 0x00007ffff717666e pushq %rbx
10: [1662756.972 ns] 0x00007ffff717666f movq (%rax), %rax
11: [1662758.967 ns] 0x00007ffff7176672 testq %rax, %rax
12: [1662760.963 ns] 0x00007ffff7176675 jne 0x9c7e0 ; <+384>
13: [1662762.958 ns] 0x00007ffff717667b leaq 0x17(%rdi), %rax
14: [1662764.953 ns] 0x00007ffff717667f cmpq $0x1f, %rax
15: [1662766.949 ns] 0x00007ffff7176683 ja 0x9c730 ; <+208>
16: [1662768.944 ns] 0x00007ffff7176730 andq $-0x10, %rax
17: [1662770.939 ns] 0x00007ffff7176734 cmpq $-0x41, %rax
18: [1662772.935 ns] 0x00007ffff7176738 seta %dl
19: [1662774.930 ns] 0x00007ffff717673b jmp 0x9c690 ; <+48>
20: [1662776.925 ns] 0x00007ffff7176690 cmpq %rdi, %rax
21: [1662778.921 ns] 0x00007ffff7176693 jb 0x9c7b0 ; <+336>
22: [1662780.916 ns] 0x00007ffff7176699 testb %dl, %dl
23: [1662782.911 ns] 0x00007ffff717669b jne 0x9c7b0 ; <+336>
24: [1662784.906 ns] 0x00007ffff71766a1 movq 0x3236c0(%rip), %r12 ; + 24
(lldb) thread trace dump instructions -t -f -e -J -c 4
[
{
"id": 0,
"timestamp_ns": "379567.000000",
"event": "HW clock tick",
"hwClock": 48599428476224707
},
{
"id": 1,
"timestamp_ns": "379569.000000",
"event": "CPU core changed",
"cpuId": 2
},
{
"id": 2,
"timestamp_ns": "390487.000000",
"event": "HW clock tick",
"hwClock": 48599428476246495
},
{
"id": 3,
"timestamp_ns": "1602508.000000",
"event": "HW clock tick",
"hwClock": 48599428478664855
},
{
"id": 4,
"timestamp_ns": "1662745.000000",
"event": "HW clock tick",
"hwClock": 48599428478785046
},
{
"id": 5,
"timestamp_ns": "1662746.995324",
"loadAddress": "0x7ffff7176660",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "endbr64"
},
{
"id": 6,
"timestamp_ns": "1662748.990648",
"loadAddress": "0x7ffff7176664",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "movq"
},
{
"id": 7,
"timestamp_ns": "1662750.985972",
"loadAddress": "0x7ffff717666b",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
},
{
"id": 8,
"timestamp_ns": "1662752.981296",
"loadAddress": "0x7ffff717666d",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
}
]
```
Differential Revision: https://reviews.llvm.org/D130054
2022-07-18 16:56:01 -07:00
|
|
|
switch (*item.event) {
|
|
|
|
case eTraceEventCPUChanged:
|
[trace][intel pt] Create a CPU change event and expose it in the dumper
Thanks to fredzhou@fb.com for coming up with this feature.
When tracing in per-cpu mode, we have information of in which cpu we are execution each instruction, which comes from the context switch trace. This diff makes this information available as a `cpu changed event`, which an additional accessor in the cursor `GetCPU()`. As cpu changes are very infrequent, any consumer should listen to cpu change events instead of querying the actual cpu of a trace item. Once a cpu change event is seen, the consumer can invoke GetCPU() to get that information. Also, it's possible to invoke GetCPU() on an arbitrary instruction item, which will return the last cpu seen. However, this call is O(logn) and should be used sparingly.
Manually tested with a sample program that starts on cpu 52, then goes to 18, and then goes back to 52.
Differential Revision: https://reviews.llvm.org/D129340
2022-07-07 18:31:18 -07:00
|
|
|
m_j.attribute("cpuId", item.cpu_id);
|
[trace][intel pt] Introduce wall clock time for each trace item
- Decouple TSCs from trace items
- Turn TSCs into events just like CPUs. The new name is HW clock tick, wich could be reused by other vendors.
- Add a GetWallTime that returns the wall time that the trace plug-in can infer for each trace item.
- For intel pt, we are doing the following interpolation: if an instruction takes less than 1 TSC, we use that duration, otherwise, we assume the instruction took 1 TSC. This helps us avoid having to handle context switches, changes to kernel, idle times, decoding errors, etc. We are just trying to show some approximation and not the real data. For the real data, TSCs are the way to go. Besides that, we are making sure that no two trace items will give the same interpolation value. Finally, we are using as time 0 the time at which tracing started.
Sample output:
```
(lldb) r
Process 750047 launched: '/home/wallace/a.out' (x86_64)
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
frame #0: 0x0000000000402479 a.out`main at main.cpp:29:20
26 };
27
28 int main() {
-> 29 std::vector<int> vvv;
30 for (int i = 0; i < 100; i++)
31 vvv.push_back(i);
32
(lldb) process trace start -s 64kb -t --per-cpu
(lldb) b 60
Breakpoint 2: where = a.out`main + 1689 at main.cpp:60:23, address = 0x0000000000402afe
(lldb) c
Process 750047 resuming
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 2.1
frame #0: 0x0000000000402afe a.out`main at main.cpp:60:23
57 map<int, int> m;
58 m[3] = 4;
59
-> 60 map<string, string> m2;
61 m2["5"] = "6";
62
63 std::vector<std::string> vs = {"2", "3"};
(lldb) thread trace dump instructions -t -f -e thread #1: tid = 750047
0: [379567.000 ns] (event) HW clock tick [48599428476224707]
1: [379569.000 ns] (event) CPU core changed [new CPU=2]
2: [390487.000 ns] (event) HW clock tick [48599428476246495]
3: [1602508.000 ns] (event) HW clock tick [48599428478664855]
4: [1662745.000 ns] (event) HW clock tick [48599428478785046]
libc.so.6`malloc
5: [1662746.995 ns] 0x00007ffff7176660 endbr64
6: [1662748.991 ns] 0x00007ffff7176664 movq 0x32387d(%rip), %rax ; + 408
7: [1662750.986 ns] 0x00007ffff717666b pushq %r12
8: [1662752.981 ns] 0x00007ffff717666d pushq %rbp
9: [1662754.977 ns] 0x00007ffff717666e pushq %rbx
10: [1662756.972 ns] 0x00007ffff717666f movq (%rax), %rax
11: [1662758.967 ns] 0x00007ffff7176672 testq %rax, %rax
12: [1662760.963 ns] 0x00007ffff7176675 jne 0x9c7e0 ; <+384>
13: [1662762.958 ns] 0x00007ffff717667b leaq 0x17(%rdi), %rax
14: [1662764.953 ns] 0x00007ffff717667f cmpq $0x1f, %rax
15: [1662766.949 ns] 0x00007ffff7176683 ja 0x9c730 ; <+208>
16: [1662768.944 ns] 0x00007ffff7176730 andq $-0x10, %rax
17: [1662770.939 ns] 0x00007ffff7176734 cmpq $-0x41, %rax
18: [1662772.935 ns] 0x00007ffff7176738 seta %dl
19: [1662774.930 ns] 0x00007ffff717673b jmp 0x9c690 ; <+48>
20: [1662776.925 ns] 0x00007ffff7176690 cmpq %rdi, %rax
21: [1662778.921 ns] 0x00007ffff7176693 jb 0x9c7b0 ; <+336>
22: [1662780.916 ns] 0x00007ffff7176699 testb %dl, %dl
23: [1662782.911 ns] 0x00007ffff717669b jne 0x9c7b0 ; <+336>
24: [1662784.906 ns] 0x00007ffff71766a1 movq 0x3236c0(%rip), %r12 ; + 24
(lldb) thread trace dump instructions -t -f -e -J -c 4
[
{
"id": 0,
"timestamp_ns": "379567.000000",
"event": "HW clock tick",
"hwClock": 48599428476224707
},
{
"id": 1,
"timestamp_ns": "379569.000000",
"event": "CPU core changed",
"cpuId": 2
},
{
"id": 2,
"timestamp_ns": "390487.000000",
"event": "HW clock tick",
"hwClock": 48599428476246495
},
{
"id": 3,
"timestamp_ns": "1602508.000000",
"event": "HW clock tick",
"hwClock": 48599428478664855
},
{
"id": 4,
"timestamp_ns": "1662745.000000",
"event": "HW clock tick",
"hwClock": 48599428478785046
},
{
"id": 5,
"timestamp_ns": "1662746.995324",
"loadAddress": "0x7ffff7176660",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "endbr64"
},
{
"id": 6,
"timestamp_ns": "1662748.990648",
"loadAddress": "0x7ffff7176664",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "movq"
},
{
"id": 7,
"timestamp_ns": "1662750.985972",
"loadAddress": "0x7ffff717666b",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
},
{
"id": 8,
"timestamp_ns": "1662752.981296",
"loadAddress": "0x7ffff717666d",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
}
]
```
Differential Revision: https://reviews.llvm.org/D130054
2022-07-18 16:56:01 -07:00
|
|
|
break;
|
|
|
|
case eTraceEventHWClockTick:
|
|
|
|
m_j.attribute("hwClock", item.hw_clock);
|
|
|
|
break;
|
|
|
|
case eTraceEventDisabledHW:
|
|
|
|
case eTraceEventDisabledSW:
|
|
|
|
break;
|
[trace][intel pt] Fix per-psb packet decoding
The per-PSB packet decoding logic was wrong because it was assuming that pt_insn_get_sync_offset was being udpated after every PSB. Silly me, that is not true. It returns the offset of the PSB packet after invoking pt_insn_sync_forward regardless of how many PSBs are visited later. Instead, I'm now following the approach described in https://github.com/intel/libipt/blob/master/doc/howto_libipt.md#parallel-decode for parallel decoding, which is basically what we need.
A nasty error that happened because of this is that when we had two PSBs (A and B), the following was happening
1. PSB A was processed all the way up to the end of the trace, which includes PSB B.
2. PSB B was then processed until the end of the trace.
The instructions emitted by step 2. were also emitted as part of step 1. so our trace had duplicated chunks. This problem becomes worse when you many PSBs.
As part of making sure this diff is correct, I added some other features that are very useful.
- Added a "synchronization point" event to the TraceCursor, so we can inspect when PSBs are emitted.
- Removed the single-thread decoder. Now the per-cpu decoder and single-thread decoder use the same code paths.
- Use the query decoder to fetch PSBs and timestamps. It turns out that the pt_insn_sync_forward of the instruction decoder can move past several PSBs (this means that we could skip some TSCs). On the other hand, the pt_query_sync_forward method doesn't skip PSBs, so we can get more accurate sync events and timing information.
- Turned LibiptDecoder into PSBBlockDecoder, which decodes single PSB blocks. It is the fundamental processing unit for decoding.
- Added many comments, asserts and improved error handling for clarity.
- Improved DecodeSystemWideTraceForThread so that a TSC is emitted always before a cpu change event. This was a bug that was annoying me before.
- SplitTraceInContinuousExecutions and FindLowestTSCInTrace are now using the query decoder, which can identify precisely each PSB along with their TSCs.
- Added an "only-events" option to the trace dumper to inspect only events.
I did extensive testing and I think we should have an in-house testing CI. The LLVM buildbots are not capable of supporting testing post-mortem traces of hundreds of megabytes. I'll leave that for later, but at least for now the current tests were able to catch most of the issues I encountered when doing this task.
A sample output of a program that I was single stepping is the following. You can see that only one PSB is emitted even though stepping happened!
```
thread #1: tid = 3578223
0: (event) trace synchronization point [offset = 0x0xef0]
a.out`main + 20 at main.cpp:29:20
1: 0x0000000000402479 leaq -0x1210(%rbp), %rax
2: (event) software disabled tracing
3: 0x0000000000402480 movq %rax, %rdi
4: (event) software disabled tracing
5: (event) software disabled tracing
6: 0x0000000000402483 callq 0x403bd4 ; std::vector<int, std::allocator<int>>::vector at stl_vector.h:391:7
7: (event) software disabled tracing
a.out`std::vector<int, std::allocator<int>>::vector() at stl_vector.h:391:7
8: 0x0000000000403bd4 pushq %rbp
9: (event) software disabled tracing
10: 0x0000000000403bd5 movq %rsp, %rbp
11: (event) software disabled tracing
```
This is another trace of a long program with a few PSBs.
```
(lldb) thread trace dump instructions -E -f thread #1: tid = 3603082
0: (event) trace synchronization point [offset = 0x0x80]
47417: (event) software disabled tracing
129231: (event) trace synchronization point [offset = 0x0x800]
146747: (event) software disabled tracing
246076: (event) software disabled tracing
259068: (event) trace synchronization point [offset = 0x0xf78]
259276: (event) software disabled tracing
259278: (event) software disabled tracing
no more data
```
Differential Revision: https://reviews.llvm.org/D131630
2022-08-08 16:24:22 -07:00
|
|
|
case eTraceEventSyncPoint:
|
|
|
|
m_j.attribute("syncPointMetadata", item.sync_point_metadata);
|
|
|
|
break;
|
[trace][intel pt] Introduce wall clock time for each trace item
- Decouple TSCs from trace items
- Turn TSCs into events just like CPUs. The new name is HW clock tick, wich could be reused by other vendors.
- Add a GetWallTime that returns the wall time that the trace plug-in can infer for each trace item.
- For intel pt, we are doing the following interpolation: if an instruction takes less than 1 TSC, we use that duration, otherwise, we assume the instruction took 1 TSC. This helps us avoid having to handle context switches, changes to kernel, idle times, decoding errors, etc. We are just trying to show some approximation and not the real data. For the real data, TSCs are the way to go. Besides that, we are making sure that no two trace items will give the same interpolation value. Finally, we are using as time 0 the time at which tracing started.
Sample output:
```
(lldb) r
Process 750047 launched: '/home/wallace/a.out' (x86_64)
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
frame #0: 0x0000000000402479 a.out`main at main.cpp:29:20
26 };
27
28 int main() {
-> 29 std::vector<int> vvv;
30 for (int i = 0; i < 100; i++)
31 vvv.push_back(i);
32
(lldb) process trace start -s 64kb -t --per-cpu
(lldb) b 60
Breakpoint 2: where = a.out`main + 1689 at main.cpp:60:23, address = 0x0000000000402afe
(lldb) c
Process 750047 resuming
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 2.1
frame #0: 0x0000000000402afe a.out`main at main.cpp:60:23
57 map<int, int> m;
58 m[3] = 4;
59
-> 60 map<string, string> m2;
61 m2["5"] = "6";
62
63 std::vector<std::string> vs = {"2", "3"};
(lldb) thread trace dump instructions -t -f -e thread #1: tid = 750047
0: [379567.000 ns] (event) HW clock tick [48599428476224707]
1: [379569.000 ns] (event) CPU core changed [new CPU=2]
2: [390487.000 ns] (event) HW clock tick [48599428476246495]
3: [1602508.000 ns] (event) HW clock tick [48599428478664855]
4: [1662745.000 ns] (event) HW clock tick [48599428478785046]
libc.so.6`malloc
5: [1662746.995 ns] 0x00007ffff7176660 endbr64
6: [1662748.991 ns] 0x00007ffff7176664 movq 0x32387d(%rip), %rax ; + 408
7: [1662750.986 ns] 0x00007ffff717666b pushq %r12
8: [1662752.981 ns] 0x00007ffff717666d pushq %rbp
9: [1662754.977 ns] 0x00007ffff717666e pushq %rbx
10: [1662756.972 ns] 0x00007ffff717666f movq (%rax), %rax
11: [1662758.967 ns] 0x00007ffff7176672 testq %rax, %rax
12: [1662760.963 ns] 0x00007ffff7176675 jne 0x9c7e0 ; <+384>
13: [1662762.958 ns] 0x00007ffff717667b leaq 0x17(%rdi), %rax
14: [1662764.953 ns] 0x00007ffff717667f cmpq $0x1f, %rax
15: [1662766.949 ns] 0x00007ffff7176683 ja 0x9c730 ; <+208>
16: [1662768.944 ns] 0x00007ffff7176730 andq $-0x10, %rax
17: [1662770.939 ns] 0x00007ffff7176734 cmpq $-0x41, %rax
18: [1662772.935 ns] 0x00007ffff7176738 seta %dl
19: [1662774.930 ns] 0x00007ffff717673b jmp 0x9c690 ; <+48>
20: [1662776.925 ns] 0x00007ffff7176690 cmpq %rdi, %rax
21: [1662778.921 ns] 0x00007ffff7176693 jb 0x9c7b0 ; <+336>
22: [1662780.916 ns] 0x00007ffff7176699 testb %dl, %dl
23: [1662782.911 ns] 0x00007ffff717669b jne 0x9c7b0 ; <+336>
24: [1662784.906 ns] 0x00007ffff71766a1 movq 0x3236c0(%rip), %r12 ; + 24
(lldb) thread trace dump instructions -t -f -e -J -c 4
[
{
"id": 0,
"timestamp_ns": "379567.000000",
"event": "HW clock tick",
"hwClock": 48599428476224707
},
{
"id": 1,
"timestamp_ns": "379569.000000",
"event": "CPU core changed",
"cpuId": 2
},
{
"id": 2,
"timestamp_ns": "390487.000000",
"event": "HW clock tick",
"hwClock": 48599428476246495
},
{
"id": 3,
"timestamp_ns": "1602508.000000",
"event": "HW clock tick",
"hwClock": 48599428478664855
},
{
"id": 4,
"timestamp_ns": "1662745.000000",
"event": "HW clock tick",
"hwClock": 48599428478785046
},
{
"id": 5,
"timestamp_ns": "1662746.995324",
"loadAddress": "0x7ffff7176660",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "endbr64"
},
{
"id": 6,
"timestamp_ns": "1662748.990648",
"loadAddress": "0x7ffff7176664",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "movq"
},
{
"id": 7,
"timestamp_ns": "1662750.985972",
"loadAddress": "0x7ffff717666b",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
},
{
"id": 8,
"timestamp_ns": "1662752.981296",
"loadAddress": "0x7ffff717666d",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
}
]
```
Differential Revision: https://reviews.llvm.org/D130054
2022-07-18 16:56:01 -07:00
|
|
|
}
|
[trace] Avoid a crash in the dumper when disassembling fails
In rare situations, disassemblying would fail that produce an invalid
InstructionSP object. We need to check that it's valid before using.
With this change, now the dumper doesn't crash with dumping instructions of
ioctl. In fact, it now dumps this output
{
"id": 6135,
"loadAddress": "0x7f4bfe5c7515",
"module": "libc.so.6",
"symbol": "ioctl",
"source": "glibc/2.34/src/glibc-2.34/sysdeps/unix/syscall-template.S",
"line": 120,
"column": 0
}
Anyway, we need to investigate why the diassembler failed disassembling that
instruction. From over 2B instructions I was disassembling today, just this
one failed, so this could be a bug in LLVM's core disassembler.
Differential Revision: https://reviews.llvm.org/D129588
2022-07-12 12:41:30 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
void DumpInstruction(const TraceDumper::TraceItem &item) {
|
|
|
|
m_j.attribute("loadAddress", formatv("{0:x}", item.load_address));
|
|
|
|
if (item.symbol_info) {
|
|
|
|
m_j.attribute("module", ToOptionalString(GetModuleName(item)));
|
|
|
|
m_j.attribute(
|
|
|
|
"symbol",
|
|
|
|
ToOptionalString(item.symbol_info->sc.GetFunctionName().AsCString()));
|
|
|
|
|
[trace] Add instruction control flow kind to JSON trace dumper's output
D128477 adds a '-k' flag which displays each instruction's control flow in the `thread trace dump instructions` command's non-json output (ie no '-j' or '-J' flag)
This diff adds the instruction control flow kind to the `thread trace dump instructions` command's JSON output (ie '-j' or '-J' flag)
Test Plan:
Confirm "controlFlowKind" is present in JSON when '-k' is provided
```
(lldb) thread trace dump instructions -J -k
[
{ [141/1952]
"id": 7755,
"loadAddress": "0x400868",
"module": "test.out",
"symbol": "main",
"mnemonic": "jmp",
"controlFlowKind": "jump",
"source": "/home/jakobjohnson/jakob-dev/test.cpp",
"line": 41,
"column": 29
},
{
"id": 7753,
"loadAddress": "0x7ffff7b54dab",
"module": "libstdc++.so.6",
"symbol": "std::ostream::flush()",
"mnemonic": "retq",
"controlFlowKind": "return"
},
{
"id": 7752,
"loadAddress": "0x7ffff7b54daa",
"module": "libstdc++.so.6",
"symbol": "std::ostream::flush()",
"mnemonic": "popq",
"controlFlowKind": "other"
},
...
]
```
Confirm "controlFlowKind" is not present when '-k' isn't provided
```
(lldb) thread trace dump instructions -J
[
{
"id": 7755,
"loadAddress": "0x400868",
"module": "test.out",
"symbol": "main",
"mnemonic": "jmp",
"source": "/home/jakobjohnson/jakob-dev/test.cpp",
"line": 41,
"column": 29
},
{
"id": 7753,
"loadAddress": "0x7ffff7b54dab",
"module": "libstdc++.so.6",
"symbol": "std::ostream::flush()",
"mnemonic": "retq"
},
{
"id": 7752,
"loadAddress": "0x7ffff7b54daa",
"module": "libstdc++.so.6",
"symbol": "std::ostream::flush()",
"mnemonic": "popq"
},
```
Differential Revision: https://reviews.llvm.org/D130607
2022-07-26 12:19:54 -07:00
|
|
|
if (lldb::InstructionSP instruction = item.symbol_info->instruction) {
|
|
|
|
ExecutionContext exe_ctx = item.symbol_info->exe_ctx;
|
[trace] Avoid a crash in the dumper when disassembling fails
In rare situations, disassemblying would fail that produce an invalid
InstructionSP object. We need to check that it's valid before using.
With this change, now the dumper doesn't crash with dumping instructions of
ioctl. In fact, it now dumps this output
{
"id": 6135,
"loadAddress": "0x7f4bfe5c7515",
"module": "libc.so.6",
"symbol": "ioctl",
"source": "glibc/2.34/src/glibc-2.34/sysdeps/unix/syscall-template.S",
"line": 120,
"column": 0
}
Anyway, we need to investigate why the diassembler failed disassembling that
instruction. From over 2B instructions I was disassembling today, just this
one failed, so this could be a bug in LLVM's core disassembler.
Differential Revision: https://reviews.llvm.org/D129588
2022-07-12 12:41:30 -07:00
|
|
|
m_j.attribute("mnemonic",
|
[trace] Add instruction control flow kind to JSON trace dumper's output
D128477 adds a '-k' flag which displays each instruction's control flow in the `thread trace dump instructions` command's non-json output (ie no '-j' or '-J' flag)
This diff adds the instruction control flow kind to the `thread trace dump instructions` command's JSON output (ie '-j' or '-J' flag)
Test Plan:
Confirm "controlFlowKind" is present in JSON when '-k' is provided
```
(lldb) thread trace dump instructions -J -k
[
{ [141/1952]
"id": 7755,
"loadAddress": "0x400868",
"module": "test.out",
"symbol": "main",
"mnemonic": "jmp",
"controlFlowKind": "jump",
"source": "/home/jakobjohnson/jakob-dev/test.cpp",
"line": 41,
"column": 29
},
{
"id": 7753,
"loadAddress": "0x7ffff7b54dab",
"module": "libstdc++.so.6",
"symbol": "std::ostream::flush()",
"mnemonic": "retq",
"controlFlowKind": "return"
},
{
"id": 7752,
"loadAddress": "0x7ffff7b54daa",
"module": "libstdc++.so.6",
"symbol": "std::ostream::flush()",
"mnemonic": "popq",
"controlFlowKind": "other"
},
...
]
```
Confirm "controlFlowKind" is not present when '-k' isn't provided
```
(lldb) thread trace dump instructions -J
[
{
"id": 7755,
"loadAddress": "0x400868",
"module": "test.out",
"symbol": "main",
"mnemonic": "jmp",
"source": "/home/jakobjohnson/jakob-dev/test.cpp",
"line": 41,
"column": 29
},
{
"id": 7753,
"loadAddress": "0x7ffff7b54dab",
"module": "libstdc++.so.6",
"symbol": "std::ostream::flush()",
"mnemonic": "retq"
},
{
"id": 7752,
"loadAddress": "0x7ffff7b54daa",
"module": "libstdc++.so.6",
"symbol": "std::ostream::flush()",
"mnemonic": "popq"
},
```
Differential Revision: https://reviews.llvm.org/D130607
2022-07-26 12:19:54 -07:00
|
|
|
ToOptionalString(instruction->GetMnemonic(&exe_ctx)));
|
|
|
|
if (m_options.show_control_flow_kind) {
|
|
|
|
lldb::InstructionControlFlowKind instruction_control_flow_kind =
|
|
|
|
instruction->GetControlFlowKind(&exe_ctx);
|
|
|
|
m_j.attribute("controlFlowKind",
|
|
|
|
ToOptionalString(
|
|
|
|
Instruction::GetNameForInstructionControlFlowKind(
|
|
|
|
instruction_control_flow_kind)));
|
|
|
|
}
|
[trace] Avoid a crash in the dumper when disassembling fails
In rare situations, disassemblying would fail that produce an invalid
InstructionSP object. We need to check that it's valid before using.
With this change, now the dumper doesn't crash with dumping instructions of
ioctl. In fact, it now dumps this output
{
"id": 6135,
"loadAddress": "0x7f4bfe5c7515",
"module": "libc.so.6",
"symbol": "ioctl",
"source": "glibc/2.34/src/glibc-2.34/sysdeps/unix/syscall-template.S",
"line": 120,
"column": 0
}
Anyway, we need to investigate why the diassembler failed disassembling that
instruction. From over 2B instructions I was disassembling today, just this
one failed, so this could be a bug in LLVM's core disassembler.
Differential Revision: https://reviews.llvm.org/D129588
2022-07-12 12:41:30 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
if (IsLineEntryValid(item.symbol_info->sc.line_entry)) {
|
|
|
|
m_j.attribute(
|
|
|
|
"source",
|
|
|
|
ToOptionalString(
|
|
|
|
item.symbol_info->sc.line_entry.file.GetPath().c_str()));
|
|
|
|
m_j.attribute("line", item.symbol_info->sc.line_entry.line);
|
|
|
|
m_j.attribute("column", item.symbol_info->sc.line_entry.column);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
void TraceItem(const TraceDumper::TraceItem &item) override {
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
m_j.object([&] {
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
m_j.attribute("id", item.id);
|
[trace][intel pt] Introduce wall clock time for each trace item
- Decouple TSCs from trace items
- Turn TSCs into events just like CPUs. The new name is HW clock tick, wich could be reused by other vendors.
- Add a GetWallTime that returns the wall time that the trace plug-in can infer for each trace item.
- For intel pt, we are doing the following interpolation: if an instruction takes less than 1 TSC, we use that duration, otherwise, we assume the instruction took 1 TSC. This helps us avoid having to handle context switches, changes to kernel, idle times, decoding errors, etc. We are just trying to show some approximation and not the real data. For the real data, TSCs are the way to go. Besides that, we are making sure that no two trace items will give the same interpolation value. Finally, we are using as time 0 the time at which tracing started.
Sample output:
```
(lldb) r
Process 750047 launched: '/home/wallace/a.out' (x86_64)
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
frame #0: 0x0000000000402479 a.out`main at main.cpp:29:20
26 };
27
28 int main() {
-> 29 std::vector<int> vvv;
30 for (int i = 0; i < 100; i++)
31 vvv.push_back(i);
32
(lldb) process trace start -s 64kb -t --per-cpu
(lldb) b 60
Breakpoint 2: where = a.out`main + 1689 at main.cpp:60:23, address = 0x0000000000402afe
(lldb) c
Process 750047 resuming
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 2.1
frame #0: 0x0000000000402afe a.out`main at main.cpp:60:23
57 map<int, int> m;
58 m[3] = 4;
59
-> 60 map<string, string> m2;
61 m2["5"] = "6";
62
63 std::vector<std::string> vs = {"2", "3"};
(lldb) thread trace dump instructions -t -f -e thread #1: tid = 750047
0: [379567.000 ns] (event) HW clock tick [48599428476224707]
1: [379569.000 ns] (event) CPU core changed [new CPU=2]
2: [390487.000 ns] (event) HW clock tick [48599428476246495]
3: [1602508.000 ns] (event) HW clock tick [48599428478664855]
4: [1662745.000 ns] (event) HW clock tick [48599428478785046]
libc.so.6`malloc
5: [1662746.995 ns] 0x00007ffff7176660 endbr64
6: [1662748.991 ns] 0x00007ffff7176664 movq 0x32387d(%rip), %rax ; + 408
7: [1662750.986 ns] 0x00007ffff717666b pushq %r12
8: [1662752.981 ns] 0x00007ffff717666d pushq %rbp
9: [1662754.977 ns] 0x00007ffff717666e pushq %rbx
10: [1662756.972 ns] 0x00007ffff717666f movq (%rax), %rax
11: [1662758.967 ns] 0x00007ffff7176672 testq %rax, %rax
12: [1662760.963 ns] 0x00007ffff7176675 jne 0x9c7e0 ; <+384>
13: [1662762.958 ns] 0x00007ffff717667b leaq 0x17(%rdi), %rax
14: [1662764.953 ns] 0x00007ffff717667f cmpq $0x1f, %rax
15: [1662766.949 ns] 0x00007ffff7176683 ja 0x9c730 ; <+208>
16: [1662768.944 ns] 0x00007ffff7176730 andq $-0x10, %rax
17: [1662770.939 ns] 0x00007ffff7176734 cmpq $-0x41, %rax
18: [1662772.935 ns] 0x00007ffff7176738 seta %dl
19: [1662774.930 ns] 0x00007ffff717673b jmp 0x9c690 ; <+48>
20: [1662776.925 ns] 0x00007ffff7176690 cmpq %rdi, %rax
21: [1662778.921 ns] 0x00007ffff7176693 jb 0x9c7b0 ; <+336>
22: [1662780.916 ns] 0x00007ffff7176699 testb %dl, %dl
23: [1662782.911 ns] 0x00007ffff717669b jne 0x9c7b0 ; <+336>
24: [1662784.906 ns] 0x00007ffff71766a1 movq 0x3236c0(%rip), %r12 ; + 24
(lldb) thread trace dump instructions -t -f -e -J -c 4
[
{
"id": 0,
"timestamp_ns": "379567.000000",
"event": "HW clock tick",
"hwClock": 48599428476224707
},
{
"id": 1,
"timestamp_ns": "379569.000000",
"event": "CPU core changed",
"cpuId": 2
},
{
"id": 2,
"timestamp_ns": "390487.000000",
"event": "HW clock tick",
"hwClock": 48599428476246495
},
{
"id": 3,
"timestamp_ns": "1602508.000000",
"event": "HW clock tick",
"hwClock": 48599428478664855
},
{
"id": 4,
"timestamp_ns": "1662745.000000",
"event": "HW clock tick",
"hwClock": 48599428478785046
},
{
"id": 5,
"timestamp_ns": "1662746.995324",
"loadAddress": "0x7ffff7176660",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "endbr64"
},
{
"id": 6,
"timestamp_ns": "1662748.990648",
"loadAddress": "0x7ffff7176664",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "movq"
},
{
"id": 7,
"timestamp_ns": "1662750.985972",
"loadAddress": "0x7ffff717666b",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
},
{
"id": 8,
"timestamp_ns": "1662752.981296",
"loadAddress": "0x7ffff717666d",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
}
]
```
Differential Revision: https://reviews.llvm.org/D130054
2022-07-18 16:56:01 -07:00
|
|
|
if (m_options.show_timestamps)
|
|
|
|
m_j.attribute("timestamp_ns", item.timestamp
|
2022-12-16 07:56:52 +00:00
|
|
|
? std::optional<std::string>(
|
[trace][intel pt] Introduce wall clock time for each trace item
- Decouple TSCs from trace items
- Turn TSCs into events just like CPUs. The new name is HW clock tick, wich could be reused by other vendors.
- Add a GetWallTime that returns the wall time that the trace plug-in can infer for each trace item.
- For intel pt, we are doing the following interpolation: if an instruction takes less than 1 TSC, we use that duration, otherwise, we assume the instruction took 1 TSC. This helps us avoid having to handle context switches, changes to kernel, idle times, decoding errors, etc. We are just trying to show some approximation and not the real data. For the real data, TSCs are the way to go. Besides that, we are making sure that no two trace items will give the same interpolation value. Finally, we are using as time 0 the time at which tracing started.
Sample output:
```
(lldb) r
Process 750047 launched: '/home/wallace/a.out' (x86_64)
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
frame #0: 0x0000000000402479 a.out`main at main.cpp:29:20
26 };
27
28 int main() {
-> 29 std::vector<int> vvv;
30 for (int i = 0; i < 100; i++)
31 vvv.push_back(i);
32
(lldb) process trace start -s 64kb -t --per-cpu
(lldb) b 60
Breakpoint 2: where = a.out`main + 1689 at main.cpp:60:23, address = 0x0000000000402afe
(lldb) c
Process 750047 resuming
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 2.1
frame #0: 0x0000000000402afe a.out`main at main.cpp:60:23
57 map<int, int> m;
58 m[3] = 4;
59
-> 60 map<string, string> m2;
61 m2["5"] = "6";
62
63 std::vector<std::string> vs = {"2", "3"};
(lldb) thread trace dump instructions -t -f -e thread #1: tid = 750047
0: [379567.000 ns] (event) HW clock tick [48599428476224707]
1: [379569.000 ns] (event) CPU core changed [new CPU=2]
2: [390487.000 ns] (event) HW clock tick [48599428476246495]
3: [1602508.000 ns] (event) HW clock tick [48599428478664855]
4: [1662745.000 ns] (event) HW clock tick [48599428478785046]
libc.so.6`malloc
5: [1662746.995 ns] 0x00007ffff7176660 endbr64
6: [1662748.991 ns] 0x00007ffff7176664 movq 0x32387d(%rip), %rax ; + 408
7: [1662750.986 ns] 0x00007ffff717666b pushq %r12
8: [1662752.981 ns] 0x00007ffff717666d pushq %rbp
9: [1662754.977 ns] 0x00007ffff717666e pushq %rbx
10: [1662756.972 ns] 0x00007ffff717666f movq (%rax), %rax
11: [1662758.967 ns] 0x00007ffff7176672 testq %rax, %rax
12: [1662760.963 ns] 0x00007ffff7176675 jne 0x9c7e0 ; <+384>
13: [1662762.958 ns] 0x00007ffff717667b leaq 0x17(%rdi), %rax
14: [1662764.953 ns] 0x00007ffff717667f cmpq $0x1f, %rax
15: [1662766.949 ns] 0x00007ffff7176683 ja 0x9c730 ; <+208>
16: [1662768.944 ns] 0x00007ffff7176730 andq $-0x10, %rax
17: [1662770.939 ns] 0x00007ffff7176734 cmpq $-0x41, %rax
18: [1662772.935 ns] 0x00007ffff7176738 seta %dl
19: [1662774.930 ns] 0x00007ffff717673b jmp 0x9c690 ; <+48>
20: [1662776.925 ns] 0x00007ffff7176690 cmpq %rdi, %rax
21: [1662778.921 ns] 0x00007ffff7176693 jb 0x9c7b0 ; <+336>
22: [1662780.916 ns] 0x00007ffff7176699 testb %dl, %dl
23: [1662782.911 ns] 0x00007ffff717669b jne 0x9c7b0 ; <+336>
24: [1662784.906 ns] 0x00007ffff71766a1 movq 0x3236c0(%rip), %r12 ; + 24
(lldb) thread trace dump instructions -t -f -e -J -c 4
[
{
"id": 0,
"timestamp_ns": "379567.000000",
"event": "HW clock tick",
"hwClock": 48599428476224707
},
{
"id": 1,
"timestamp_ns": "379569.000000",
"event": "CPU core changed",
"cpuId": 2
},
{
"id": 2,
"timestamp_ns": "390487.000000",
"event": "HW clock tick",
"hwClock": 48599428476246495
},
{
"id": 3,
"timestamp_ns": "1602508.000000",
"event": "HW clock tick",
"hwClock": 48599428478664855
},
{
"id": 4,
"timestamp_ns": "1662745.000000",
"event": "HW clock tick",
"hwClock": 48599428478785046
},
{
"id": 5,
"timestamp_ns": "1662746.995324",
"loadAddress": "0x7ffff7176660",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "endbr64"
},
{
"id": 6,
"timestamp_ns": "1662748.990648",
"loadAddress": "0x7ffff7176664",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "movq"
},
{
"id": 7,
"timestamp_ns": "1662750.985972",
"loadAddress": "0x7ffff717666b",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
},
{
"id": 8,
"timestamp_ns": "1662752.981296",
"loadAddress": "0x7ffff717666d",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
}
]
```
Differential Revision: https://reviews.llvm.org/D130054
2022-07-18 16:56:01 -07:00
|
|
|
std::to_string(*item.timestamp))
|
2022-12-04 16:51:25 -08:00
|
|
|
: std::nullopt);
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
if (item.event) {
|
[trace] Avoid a crash in the dumper when disassembling fails
In rare situations, disassemblying would fail that produce an invalid
InstructionSP object. We need to check that it's valid before using.
With this change, now the dumper doesn't crash with dumping instructions of
ioctl. In fact, it now dumps this output
{
"id": 6135,
"loadAddress": "0x7f4bfe5c7515",
"module": "libc.so.6",
"symbol": "ioctl",
"source": "glibc/2.34/src/glibc-2.34/sysdeps/unix/syscall-template.S",
"line": 120,
"column": 0
}
Anyway, we need to investigate why the diassembler failed disassembling that
instruction. From over 2B instructions I was disassembling today, just this
one failed, so this could be a bug in LLVM's core disassembler.
Differential Revision: https://reviews.llvm.org/D129588
2022-07-12 12:41:30 -07:00
|
|
|
DumpEvent(item);
|
[trace][intel pt] Create a CPU change event and expose it in the dumper
Thanks to fredzhou@fb.com for coming up with this feature.
When tracing in per-cpu mode, we have information of in which cpu we are execution each instruction, which comes from the context switch trace. This diff makes this information available as a `cpu changed event`, which an additional accessor in the cursor `GetCPU()`. As cpu changes are very infrequent, any consumer should listen to cpu change events instead of querying the actual cpu of a trace item. Once a cpu change event is seen, the consumer can invoke GetCPU() to get that information. Also, it's possible to invoke GetCPU() on an arbitrary instruction item, which will return the last cpu seen. However, this call is O(logn) and should be used sparingly.
Manually tested with a sample program that starts on cpu 52, then goes to 18, and then goes back to 52.
Differential Revision: https://reviews.llvm.org/D129340
2022-07-07 18:31:18 -07:00
|
|
|
} else if (item.error) {
|
|
|
|
m_j.attribute("error", *item.error);
|
|
|
|
} else {
|
|
|
|
DumpInstruction(item);
|
|
|
|
}
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
});
|
|
|
|
}
|
|
|
|
|
|
|
|
private:
|
|
|
|
Stream &m_s;
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
TraceDumperOptions m_options;
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
json::OStream m_j;
|
|
|
|
};
|
|
|
|
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
static std::unique_ptr<TraceDumper::OutputWriter>
|
|
|
|
CreateWriter(Stream &s, const TraceDumperOptions &options, Thread &thread) {
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
if (options.json)
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
return std::unique_ptr<TraceDumper::OutputWriter>(
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
new OutputWriterJSON(s, options));
|
|
|
|
else
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
return std::unique_ptr<TraceDumper::OutputWriter>(
|
|
|
|
new OutputWriterCLI(s, options, thread));
|
2021-06-29 14:03:30 -07:00
|
|
|
}
|
|
|
|
|
2022-08-01 07:34:47 -07:00
|
|
|
TraceDumper::TraceDumper(lldb::TraceCursorSP cursor_sp, Stream &s,
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
const TraceDumperOptions &options)
|
2022-08-01 07:34:47 -07:00
|
|
|
: m_cursor_sp(std::move(cursor_sp)), m_options(options),
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
m_writer_up(CreateWriter(
|
2022-08-01 07:34:47 -07:00
|
|
|
s, m_options, *m_cursor_sp->GetExecutionContextRef().GetThreadSP())) {
|
2021-06-29 14:03:30 -07:00
|
|
|
|
2022-06-24 11:19:09 -07:00
|
|
|
if (m_options.id)
|
2022-08-01 07:34:47 -07:00
|
|
|
m_cursor_sp->GoToId(*m_options.id);
|
2022-06-24 11:19:09 -07:00
|
|
|
else if (m_options.forwards)
|
2022-08-01 12:23:22 -07:00
|
|
|
m_cursor_sp->Seek(0, lldb::eTraceCursorSeekTypeBeginning);
|
2022-06-24 11:19:09 -07:00
|
|
|
else
|
2022-08-01 12:23:22 -07:00
|
|
|
m_cursor_sp->Seek(0, lldb::eTraceCursorSeekTypeEnd);
|
2021-06-29 14:03:30 -07:00
|
|
|
|
2022-08-01 07:34:47 -07:00
|
|
|
m_cursor_sp->SetForwards(m_options.forwards);
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
if (m_options.skip) {
|
2022-08-01 07:34:47 -07:00
|
|
|
m_cursor_sp->Seek((m_options.forwards ? 1 : -1) * *m_options.skip,
|
2022-08-01 12:23:22 -07:00
|
|
|
lldb::eTraceCursorSeekTypeCurrent);
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
}
|
2022-04-19 21:57:31 -07:00
|
|
|
}
|
|
|
|
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
TraceDumper::TraceItem TraceDumper::CreatRawTraceItem() {
|
[LLDB][NFC][Reliability] Fix uninitialized variables from Coverity scan. Part 2
Improve LLDB reliability by fixing the following "uninitialized variables" static code inspection warnings from
scan.coverity.com:
1476275, 1274012, 1455035, 1364789, 1454282
1467483, 1406152, 1406255, 1454837, 1454416
1467446, 1462022, 1461909, 1420566, 1327228
1367767, 1431254, 1467299, 1312678, 1431780
1454731, 1490403
Differential Revision: https://reviews.llvm.org/D130528
2022-07-25 15:52:21 -07:00
|
|
|
TraceItem item = {};
|
2022-08-01 07:34:47 -07:00
|
|
|
item.id = m_cursor_sp->GetId();
|
2022-04-19 21:57:31 -07:00
|
|
|
|
[trace][intel pt] Introduce wall clock time for each trace item
- Decouple TSCs from trace items
- Turn TSCs into events just like CPUs. The new name is HW clock tick, wich could be reused by other vendors.
- Add a GetWallTime that returns the wall time that the trace plug-in can infer for each trace item.
- For intel pt, we are doing the following interpolation: if an instruction takes less than 1 TSC, we use that duration, otherwise, we assume the instruction took 1 TSC. This helps us avoid having to handle context switches, changes to kernel, idle times, decoding errors, etc. We are just trying to show some approximation and not the real data. For the real data, TSCs are the way to go. Besides that, we are making sure that no two trace items will give the same interpolation value. Finally, we are using as time 0 the time at which tracing started.
Sample output:
```
(lldb) r
Process 750047 launched: '/home/wallace/a.out' (x86_64)
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
frame #0: 0x0000000000402479 a.out`main at main.cpp:29:20
26 };
27
28 int main() {
-> 29 std::vector<int> vvv;
30 for (int i = 0; i < 100; i++)
31 vvv.push_back(i);
32
(lldb) process trace start -s 64kb -t --per-cpu
(lldb) b 60
Breakpoint 2: where = a.out`main + 1689 at main.cpp:60:23, address = 0x0000000000402afe
(lldb) c
Process 750047 resuming
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 2.1
frame #0: 0x0000000000402afe a.out`main at main.cpp:60:23
57 map<int, int> m;
58 m[3] = 4;
59
-> 60 map<string, string> m2;
61 m2["5"] = "6";
62
63 std::vector<std::string> vs = {"2", "3"};
(lldb) thread trace dump instructions -t -f -e thread #1: tid = 750047
0: [379567.000 ns] (event) HW clock tick [48599428476224707]
1: [379569.000 ns] (event) CPU core changed [new CPU=2]
2: [390487.000 ns] (event) HW clock tick [48599428476246495]
3: [1602508.000 ns] (event) HW clock tick [48599428478664855]
4: [1662745.000 ns] (event) HW clock tick [48599428478785046]
libc.so.6`malloc
5: [1662746.995 ns] 0x00007ffff7176660 endbr64
6: [1662748.991 ns] 0x00007ffff7176664 movq 0x32387d(%rip), %rax ; + 408
7: [1662750.986 ns] 0x00007ffff717666b pushq %r12
8: [1662752.981 ns] 0x00007ffff717666d pushq %rbp
9: [1662754.977 ns] 0x00007ffff717666e pushq %rbx
10: [1662756.972 ns] 0x00007ffff717666f movq (%rax), %rax
11: [1662758.967 ns] 0x00007ffff7176672 testq %rax, %rax
12: [1662760.963 ns] 0x00007ffff7176675 jne 0x9c7e0 ; <+384>
13: [1662762.958 ns] 0x00007ffff717667b leaq 0x17(%rdi), %rax
14: [1662764.953 ns] 0x00007ffff717667f cmpq $0x1f, %rax
15: [1662766.949 ns] 0x00007ffff7176683 ja 0x9c730 ; <+208>
16: [1662768.944 ns] 0x00007ffff7176730 andq $-0x10, %rax
17: [1662770.939 ns] 0x00007ffff7176734 cmpq $-0x41, %rax
18: [1662772.935 ns] 0x00007ffff7176738 seta %dl
19: [1662774.930 ns] 0x00007ffff717673b jmp 0x9c690 ; <+48>
20: [1662776.925 ns] 0x00007ffff7176690 cmpq %rdi, %rax
21: [1662778.921 ns] 0x00007ffff7176693 jb 0x9c7b0 ; <+336>
22: [1662780.916 ns] 0x00007ffff7176699 testb %dl, %dl
23: [1662782.911 ns] 0x00007ffff717669b jne 0x9c7b0 ; <+336>
24: [1662784.906 ns] 0x00007ffff71766a1 movq 0x3236c0(%rip), %r12 ; + 24
(lldb) thread trace dump instructions -t -f -e -J -c 4
[
{
"id": 0,
"timestamp_ns": "379567.000000",
"event": "HW clock tick",
"hwClock": 48599428476224707
},
{
"id": 1,
"timestamp_ns": "379569.000000",
"event": "CPU core changed",
"cpuId": 2
},
{
"id": 2,
"timestamp_ns": "390487.000000",
"event": "HW clock tick",
"hwClock": 48599428476246495
},
{
"id": 3,
"timestamp_ns": "1602508.000000",
"event": "HW clock tick",
"hwClock": 48599428478664855
},
{
"id": 4,
"timestamp_ns": "1662745.000000",
"event": "HW clock tick",
"hwClock": 48599428478785046
},
{
"id": 5,
"timestamp_ns": "1662746.995324",
"loadAddress": "0x7ffff7176660",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "endbr64"
},
{
"id": 6,
"timestamp_ns": "1662748.990648",
"loadAddress": "0x7ffff7176664",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "movq"
},
{
"id": 7,
"timestamp_ns": "1662750.985972",
"loadAddress": "0x7ffff717666b",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
},
{
"id": 8,
"timestamp_ns": "1662752.981296",
"loadAddress": "0x7ffff717666d",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
}
]
```
Differential Revision: https://reviews.llvm.org/D130054
2022-07-18 16:56:01 -07:00
|
|
|
if (m_options.show_timestamps)
|
2022-08-01 07:34:47 -07:00
|
|
|
item.timestamp = m_cursor_sp->GetWallClockTime();
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
return item;
|
[trace][intel pt] Support events
A trace might contain events traced during the target's execution. For
example, a thread might be paused for some period of time due to context
switches or breakpoints, which actually force a context switch. Not only
that, a trace might be paused because the CPU decides to trace only a
specific part of the target, like the address filtering provided by
intel pt, which will cause pause events. Besides this case, other kinds
of events might exist.
This patch adds the method `TraceCursor::GetEvents()`` that returns the
list of events that happened right before the instruction being pointed
at by the cursor. Some refactors were done to make this change simpler.
Besides this new API, the instruction dumper now supports the -e flag
which shows pause events, like in the following example, where pauses
happened due to breakpoints.
```
thread #1: tid = 2717361
a.out`main + 20 at main.cpp:27:20
0: 0x00000000004023d9 leaq -0x1200(%rbp), %rax
[paused]
1: 0x00000000004023e0 movq %rax, %rdi
[paused]
2: 0x00000000004023e3 callq 0x403a62 ; std::vector<int, std::allocator<int> >::vector at stl_vector.h:391:7
a.out`std::vector<int, std::allocator<int> >::vector() at stl_vector.h:391:7
3: 0x0000000000403a62 pushq %rbp
4: 0x0000000000403a63 movq %rsp, %rbp
```
The `dump info` command has also been updated and now it shows the
number of instructions that have associated events.
Differential Revision: https://reviews.llvm.org/D123982
2022-04-18 09:28:10 -07:00
|
|
|
}
|
|
|
|
|
2022-04-19 21:57:31 -07:00
|
|
|
/// Find the symbol context for the given address reusing the previous
|
|
|
|
/// instruction's symbol context when possible.
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
static SymbolContext
|
|
|
|
CalculateSymbolContext(const Address &address,
|
2022-10-10 12:57:13 -07:00
|
|
|
const SymbolContext &prev_symbol_context) {
|
2023-11-11 11:21:32 -08:00
|
|
|
lldb_private::AddressRange range;
|
2022-10-10 12:57:13 -07:00
|
|
|
if (prev_symbol_context.GetAddressRange(eSymbolContextEverything, 0,
|
|
|
|
/*inline_block_range*/ true, range) &&
|
2022-04-19 21:57:31 -07:00
|
|
|
range.Contains(address))
|
2022-10-10 12:57:13 -07:00
|
|
|
return prev_symbol_context;
|
2022-04-19 21:57:31 -07:00
|
|
|
|
|
|
|
SymbolContext sc;
|
|
|
|
address.CalculateSymbolContext(&sc, eSymbolContextEverything);
|
|
|
|
return sc;
|
|
|
|
}
|
|
|
|
|
|
|
|
/// Find the disassembler for the given address reusing the previous
|
|
|
|
/// instruction's disassembler when possible.
|
|
|
|
static std::tuple<DisassemblerSP, InstructionSP>
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
CalculateDisass(const TraceDumper::SymbolInfo &symbol_info,
|
|
|
|
const TraceDumper::SymbolInfo &prev_symbol_info,
|
2022-04-19 21:57:31 -07:00
|
|
|
const ExecutionContext &exe_ctx) {
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
if (prev_symbol_info.disassembler) {
|
2022-04-19 21:57:31 -07:00
|
|
|
if (InstructionSP instruction =
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
prev_symbol_info.disassembler->GetInstructionList()
|
|
|
|
.GetInstructionAtAddress(symbol_info.address))
|
|
|
|
return std::make_tuple(prev_symbol_info.disassembler, instruction);
|
2022-04-19 21:57:31 -07:00
|
|
|
}
|
|
|
|
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
if (symbol_info.sc.function) {
|
2022-04-19 21:57:31 -07:00
|
|
|
if (DisassemblerSP disassembler =
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
symbol_info.sc.function->GetInstructions(exe_ctx, nullptr)) {
|
2022-04-19 21:57:31 -07:00
|
|
|
if (InstructionSP instruction =
|
|
|
|
disassembler->GetInstructionList().GetInstructionAtAddress(
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
symbol_info.address))
|
2022-04-19 21:57:31 -07:00
|
|
|
return std::make_tuple(disassembler, instruction);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
// We fallback to a single instruction disassembler
|
|
|
|
Target &target = exe_ctx.GetTargetRef();
|
|
|
|
const ArchSpec arch = target.GetArchitecture();
|
2023-11-11 11:21:32 -08:00
|
|
|
lldb_private::AddressRange range(symbol_info.address,
|
|
|
|
arch.GetMaximumOpcodeByteSize());
|
2022-04-19 21:57:31 -07:00
|
|
|
DisassemblerSP disassembler =
|
|
|
|
Disassembler::DisassembleRange(arch, /*plugin_name*/ nullptr,
|
|
|
|
/*flavor*/ nullptr, target, range);
|
|
|
|
return std::make_tuple(
|
|
|
|
disassembler,
|
|
|
|
disassembler ? disassembler->GetInstructionList().GetInstructionAtAddress(
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
symbol_info.address)
|
2022-04-19 21:57:31 -07:00
|
|
|
: InstructionSP());
|
|
|
|
}
|
|
|
|
|
2022-10-10 12:57:13 -07:00
|
|
|
static TraceDumper::SymbolInfo
|
|
|
|
CalculateSymbolInfo(const ExecutionContext &exe_ctx, lldb::addr_t load_address,
|
|
|
|
const TraceDumper::SymbolInfo &prev_symbol_info) {
|
|
|
|
TraceDumper::SymbolInfo symbol_info;
|
|
|
|
symbol_info.exe_ctx = exe_ctx;
|
|
|
|
symbol_info.address.SetLoadAddress(load_address, exe_ctx.GetTargetPtr());
|
|
|
|
symbol_info.sc =
|
|
|
|
CalculateSymbolContext(symbol_info.address, prev_symbol_info.sc);
|
|
|
|
std::tie(symbol_info.disassembler, symbol_info.instruction) =
|
|
|
|
CalculateDisass(symbol_info, prev_symbol_info, exe_ctx);
|
|
|
|
return symbol_info;
|
|
|
|
}
|
|
|
|
|
2023-01-07 14:18:35 -08:00
|
|
|
std::optional<lldb::user_id_t> TraceDumper::DumpInstructions(size_t count) {
|
2022-08-01 07:34:47 -07:00
|
|
|
ThreadSP thread_sp = m_cursor_sp->GetExecutionContextRef().GetThreadSP();
|
2021-06-29 14:03:30 -07:00
|
|
|
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
SymbolInfo prev_symbol_info;
|
2023-01-07 14:18:35 -08:00
|
|
|
std::optional<lldb::user_id_t> last_id;
|
2021-06-29 14:03:30 -07:00
|
|
|
|
|
|
|
ExecutionContext exe_ctx;
|
2022-04-19 21:57:31 -07:00
|
|
|
thread_sp->GetProcess()->GetTarget().CalculateExecutionContext(exe_ctx);
|
2021-06-29 14:03:30 -07:00
|
|
|
|
2022-08-01 07:34:47 -07:00
|
|
|
for (size_t insn_seen = 0; insn_seen < count && m_cursor_sp->HasValue();
|
|
|
|
m_cursor_sp->Next()) {
|
2021-06-29 14:03:30 -07:00
|
|
|
|
2022-08-01 07:34:47 -07:00
|
|
|
last_id = m_cursor_sp->GetId();
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
TraceItem item = CreatRawTraceItem();
|
|
|
|
|
[trace][intel pt] Fix per-psb packet decoding
The per-PSB packet decoding logic was wrong because it was assuming that pt_insn_get_sync_offset was being udpated after every PSB. Silly me, that is not true. It returns the offset of the PSB packet after invoking pt_insn_sync_forward regardless of how many PSBs are visited later. Instead, I'm now following the approach described in https://github.com/intel/libipt/blob/master/doc/howto_libipt.md#parallel-decode for parallel decoding, which is basically what we need.
A nasty error that happened because of this is that when we had two PSBs (A and B), the following was happening
1. PSB A was processed all the way up to the end of the trace, which includes PSB B.
2. PSB B was then processed until the end of the trace.
The instructions emitted by step 2. were also emitted as part of step 1. so our trace had duplicated chunks. This problem becomes worse when you many PSBs.
As part of making sure this diff is correct, I added some other features that are very useful.
- Added a "synchronization point" event to the TraceCursor, so we can inspect when PSBs are emitted.
- Removed the single-thread decoder. Now the per-cpu decoder and single-thread decoder use the same code paths.
- Use the query decoder to fetch PSBs and timestamps. It turns out that the pt_insn_sync_forward of the instruction decoder can move past several PSBs (this means that we could skip some TSCs). On the other hand, the pt_query_sync_forward method doesn't skip PSBs, so we can get more accurate sync events and timing information.
- Turned LibiptDecoder into PSBBlockDecoder, which decodes single PSB blocks. It is the fundamental processing unit for decoding.
- Added many comments, asserts and improved error handling for clarity.
- Improved DecodeSystemWideTraceForThread so that a TSC is emitted always before a cpu change event. This was a bug that was annoying me before.
- SplitTraceInContinuousExecutions and FindLowestTSCInTrace are now using the query decoder, which can identify precisely each PSB along with their TSCs.
- Added an "only-events" option to the trace dumper to inspect only events.
I did extensive testing and I think we should have an in-house testing CI. The LLVM buildbots are not capable of supporting testing post-mortem traces of hundreds of megabytes. I'll leave that for later, but at least for now the current tests were able to catch most of the issues I encountered when doing this task.
A sample output of a program that I was single stepping is the following. You can see that only one PSB is emitted even though stepping happened!
```
thread #1: tid = 3578223
0: (event) trace synchronization point [offset = 0x0xef0]
a.out`main + 20 at main.cpp:29:20
1: 0x0000000000402479 leaq -0x1210(%rbp), %rax
2: (event) software disabled tracing
3: 0x0000000000402480 movq %rax, %rdi
4: (event) software disabled tracing
5: (event) software disabled tracing
6: 0x0000000000402483 callq 0x403bd4 ; std::vector<int, std::allocator<int>>::vector at stl_vector.h:391:7
7: (event) software disabled tracing
a.out`std::vector<int, std::allocator<int>>::vector() at stl_vector.h:391:7
8: 0x0000000000403bd4 pushq %rbp
9: (event) software disabled tracing
10: 0x0000000000403bd5 movq %rsp, %rbp
11: (event) software disabled tracing
```
This is another trace of a long program with a few PSBs.
```
(lldb) thread trace dump instructions -E -f thread #1: tid = 3603082
0: (event) trace synchronization point [offset = 0x0x80]
47417: (event) software disabled tracing
129231: (event) trace synchronization point [offset = 0x0x800]
146747: (event) software disabled tracing
246076: (event) software disabled tracing
259068: (event) trace synchronization point [offset = 0x0xf78]
259276: (event) software disabled tracing
259278: (event) software disabled tracing
no more data
```
Differential Revision: https://reviews.llvm.org/D131630
2022-08-08 16:24:22 -07:00
|
|
|
if (m_cursor_sp->IsEvent() && m_options.show_events) {
|
2022-08-01 07:34:47 -07:00
|
|
|
item.event = m_cursor_sp->GetEventType();
|
[trace][intel pt] Introduce wall clock time for each trace item
- Decouple TSCs from trace items
- Turn TSCs into events just like CPUs. The new name is HW clock tick, wich could be reused by other vendors.
- Add a GetWallTime that returns the wall time that the trace plug-in can infer for each trace item.
- For intel pt, we are doing the following interpolation: if an instruction takes less than 1 TSC, we use that duration, otherwise, we assume the instruction took 1 TSC. This helps us avoid having to handle context switches, changes to kernel, idle times, decoding errors, etc. We are just trying to show some approximation and not the real data. For the real data, TSCs are the way to go. Besides that, we are making sure that no two trace items will give the same interpolation value. Finally, we are using as time 0 the time at which tracing started.
Sample output:
```
(lldb) r
Process 750047 launched: '/home/wallace/a.out' (x86_64)
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
frame #0: 0x0000000000402479 a.out`main at main.cpp:29:20
26 };
27
28 int main() {
-> 29 std::vector<int> vvv;
30 for (int i = 0; i < 100; i++)
31 vvv.push_back(i);
32
(lldb) process trace start -s 64kb -t --per-cpu
(lldb) b 60
Breakpoint 2: where = a.out`main + 1689 at main.cpp:60:23, address = 0x0000000000402afe
(lldb) c
Process 750047 resuming
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 2.1
frame #0: 0x0000000000402afe a.out`main at main.cpp:60:23
57 map<int, int> m;
58 m[3] = 4;
59
-> 60 map<string, string> m2;
61 m2["5"] = "6";
62
63 std::vector<std::string> vs = {"2", "3"};
(lldb) thread trace dump instructions -t -f -e thread #1: tid = 750047
0: [379567.000 ns] (event) HW clock tick [48599428476224707]
1: [379569.000 ns] (event) CPU core changed [new CPU=2]
2: [390487.000 ns] (event) HW clock tick [48599428476246495]
3: [1602508.000 ns] (event) HW clock tick [48599428478664855]
4: [1662745.000 ns] (event) HW clock tick [48599428478785046]
libc.so.6`malloc
5: [1662746.995 ns] 0x00007ffff7176660 endbr64
6: [1662748.991 ns] 0x00007ffff7176664 movq 0x32387d(%rip), %rax ; + 408
7: [1662750.986 ns] 0x00007ffff717666b pushq %r12
8: [1662752.981 ns] 0x00007ffff717666d pushq %rbp
9: [1662754.977 ns] 0x00007ffff717666e pushq %rbx
10: [1662756.972 ns] 0x00007ffff717666f movq (%rax), %rax
11: [1662758.967 ns] 0x00007ffff7176672 testq %rax, %rax
12: [1662760.963 ns] 0x00007ffff7176675 jne 0x9c7e0 ; <+384>
13: [1662762.958 ns] 0x00007ffff717667b leaq 0x17(%rdi), %rax
14: [1662764.953 ns] 0x00007ffff717667f cmpq $0x1f, %rax
15: [1662766.949 ns] 0x00007ffff7176683 ja 0x9c730 ; <+208>
16: [1662768.944 ns] 0x00007ffff7176730 andq $-0x10, %rax
17: [1662770.939 ns] 0x00007ffff7176734 cmpq $-0x41, %rax
18: [1662772.935 ns] 0x00007ffff7176738 seta %dl
19: [1662774.930 ns] 0x00007ffff717673b jmp 0x9c690 ; <+48>
20: [1662776.925 ns] 0x00007ffff7176690 cmpq %rdi, %rax
21: [1662778.921 ns] 0x00007ffff7176693 jb 0x9c7b0 ; <+336>
22: [1662780.916 ns] 0x00007ffff7176699 testb %dl, %dl
23: [1662782.911 ns] 0x00007ffff717669b jne 0x9c7b0 ; <+336>
24: [1662784.906 ns] 0x00007ffff71766a1 movq 0x3236c0(%rip), %r12 ; + 24
(lldb) thread trace dump instructions -t -f -e -J -c 4
[
{
"id": 0,
"timestamp_ns": "379567.000000",
"event": "HW clock tick",
"hwClock": 48599428476224707
},
{
"id": 1,
"timestamp_ns": "379569.000000",
"event": "CPU core changed",
"cpuId": 2
},
{
"id": 2,
"timestamp_ns": "390487.000000",
"event": "HW clock tick",
"hwClock": 48599428476246495
},
{
"id": 3,
"timestamp_ns": "1602508.000000",
"event": "HW clock tick",
"hwClock": 48599428478664855
},
{
"id": 4,
"timestamp_ns": "1662745.000000",
"event": "HW clock tick",
"hwClock": 48599428478785046
},
{
"id": 5,
"timestamp_ns": "1662746.995324",
"loadAddress": "0x7ffff7176660",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "endbr64"
},
{
"id": 6,
"timestamp_ns": "1662748.990648",
"loadAddress": "0x7ffff7176664",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "movq"
},
{
"id": 7,
"timestamp_ns": "1662750.985972",
"loadAddress": "0x7ffff717666b",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
},
{
"id": 8,
"timestamp_ns": "1662752.981296",
"loadAddress": "0x7ffff717666d",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
}
]
```
Differential Revision: https://reviews.llvm.org/D130054
2022-07-18 16:56:01 -07:00
|
|
|
switch (*item.event) {
|
|
|
|
case eTraceEventCPUChanged:
|
2022-08-01 07:34:47 -07:00
|
|
|
item.cpu_id = m_cursor_sp->GetCPU();
|
[trace][intel pt] Introduce wall clock time for each trace item
- Decouple TSCs from trace items
- Turn TSCs into events just like CPUs. The new name is HW clock tick, wich could be reused by other vendors.
- Add a GetWallTime that returns the wall time that the trace plug-in can infer for each trace item.
- For intel pt, we are doing the following interpolation: if an instruction takes less than 1 TSC, we use that duration, otherwise, we assume the instruction took 1 TSC. This helps us avoid having to handle context switches, changes to kernel, idle times, decoding errors, etc. We are just trying to show some approximation and not the real data. For the real data, TSCs are the way to go. Besides that, we are making sure that no two trace items will give the same interpolation value. Finally, we are using as time 0 the time at which tracing started.
Sample output:
```
(lldb) r
Process 750047 launched: '/home/wallace/a.out' (x86_64)
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
frame #0: 0x0000000000402479 a.out`main at main.cpp:29:20
26 };
27
28 int main() {
-> 29 std::vector<int> vvv;
30 for (int i = 0; i < 100; i++)
31 vvv.push_back(i);
32
(lldb) process trace start -s 64kb -t --per-cpu
(lldb) b 60
Breakpoint 2: where = a.out`main + 1689 at main.cpp:60:23, address = 0x0000000000402afe
(lldb) c
Process 750047 resuming
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 2.1
frame #0: 0x0000000000402afe a.out`main at main.cpp:60:23
57 map<int, int> m;
58 m[3] = 4;
59
-> 60 map<string, string> m2;
61 m2["5"] = "6";
62
63 std::vector<std::string> vs = {"2", "3"};
(lldb) thread trace dump instructions -t -f -e thread #1: tid = 750047
0: [379567.000 ns] (event) HW clock tick [48599428476224707]
1: [379569.000 ns] (event) CPU core changed [new CPU=2]
2: [390487.000 ns] (event) HW clock tick [48599428476246495]
3: [1602508.000 ns] (event) HW clock tick [48599428478664855]
4: [1662745.000 ns] (event) HW clock tick [48599428478785046]
libc.so.6`malloc
5: [1662746.995 ns] 0x00007ffff7176660 endbr64
6: [1662748.991 ns] 0x00007ffff7176664 movq 0x32387d(%rip), %rax ; + 408
7: [1662750.986 ns] 0x00007ffff717666b pushq %r12
8: [1662752.981 ns] 0x00007ffff717666d pushq %rbp
9: [1662754.977 ns] 0x00007ffff717666e pushq %rbx
10: [1662756.972 ns] 0x00007ffff717666f movq (%rax), %rax
11: [1662758.967 ns] 0x00007ffff7176672 testq %rax, %rax
12: [1662760.963 ns] 0x00007ffff7176675 jne 0x9c7e0 ; <+384>
13: [1662762.958 ns] 0x00007ffff717667b leaq 0x17(%rdi), %rax
14: [1662764.953 ns] 0x00007ffff717667f cmpq $0x1f, %rax
15: [1662766.949 ns] 0x00007ffff7176683 ja 0x9c730 ; <+208>
16: [1662768.944 ns] 0x00007ffff7176730 andq $-0x10, %rax
17: [1662770.939 ns] 0x00007ffff7176734 cmpq $-0x41, %rax
18: [1662772.935 ns] 0x00007ffff7176738 seta %dl
19: [1662774.930 ns] 0x00007ffff717673b jmp 0x9c690 ; <+48>
20: [1662776.925 ns] 0x00007ffff7176690 cmpq %rdi, %rax
21: [1662778.921 ns] 0x00007ffff7176693 jb 0x9c7b0 ; <+336>
22: [1662780.916 ns] 0x00007ffff7176699 testb %dl, %dl
23: [1662782.911 ns] 0x00007ffff717669b jne 0x9c7b0 ; <+336>
24: [1662784.906 ns] 0x00007ffff71766a1 movq 0x3236c0(%rip), %r12 ; + 24
(lldb) thread trace dump instructions -t -f -e -J -c 4
[
{
"id": 0,
"timestamp_ns": "379567.000000",
"event": "HW clock tick",
"hwClock": 48599428476224707
},
{
"id": 1,
"timestamp_ns": "379569.000000",
"event": "CPU core changed",
"cpuId": 2
},
{
"id": 2,
"timestamp_ns": "390487.000000",
"event": "HW clock tick",
"hwClock": 48599428476246495
},
{
"id": 3,
"timestamp_ns": "1602508.000000",
"event": "HW clock tick",
"hwClock": 48599428478664855
},
{
"id": 4,
"timestamp_ns": "1662745.000000",
"event": "HW clock tick",
"hwClock": 48599428478785046
},
{
"id": 5,
"timestamp_ns": "1662746.995324",
"loadAddress": "0x7ffff7176660",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "endbr64"
},
{
"id": 6,
"timestamp_ns": "1662748.990648",
"loadAddress": "0x7ffff7176664",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "movq"
},
{
"id": 7,
"timestamp_ns": "1662750.985972",
"loadAddress": "0x7ffff717666b",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
},
{
"id": 8,
"timestamp_ns": "1662752.981296",
"loadAddress": "0x7ffff717666d",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
}
]
```
Differential Revision: https://reviews.llvm.org/D130054
2022-07-18 16:56:01 -07:00
|
|
|
break;
|
|
|
|
case eTraceEventHWClockTick:
|
2022-08-01 07:34:47 -07:00
|
|
|
item.hw_clock = m_cursor_sp->GetHWClock();
|
[trace][intel pt] Introduce wall clock time for each trace item
- Decouple TSCs from trace items
- Turn TSCs into events just like CPUs. The new name is HW clock tick, wich could be reused by other vendors.
- Add a GetWallTime that returns the wall time that the trace plug-in can infer for each trace item.
- For intel pt, we are doing the following interpolation: if an instruction takes less than 1 TSC, we use that duration, otherwise, we assume the instruction took 1 TSC. This helps us avoid having to handle context switches, changes to kernel, idle times, decoding errors, etc. We are just trying to show some approximation and not the real data. For the real data, TSCs are the way to go. Besides that, we are making sure that no two trace items will give the same interpolation value. Finally, we are using as time 0 the time at which tracing started.
Sample output:
```
(lldb) r
Process 750047 launched: '/home/wallace/a.out' (x86_64)
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
frame #0: 0x0000000000402479 a.out`main at main.cpp:29:20
26 };
27
28 int main() {
-> 29 std::vector<int> vvv;
30 for (int i = 0; i < 100; i++)
31 vvv.push_back(i);
32
(lldb) process trace start -s 64kb -t --per-cpu
(lldb) b 60
Breakpoint 2: where = a.out`main + 1689 at main.cpp:60:23, address = 0x0000000000402afe
(lldb) c
Process 750047 resuming
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 2.1
frame #0: 0x0000000000402afe a.out`main at main.cpp:60:23
57 map<int, int> m;
58 m[3] = 4;
59
-> 60 map<string, string> m2;
61 m2["5"] = "6";
62
63 std::vector<std::string> vs = {"2", "3"};
(lldb) thread trace dump instructions -t -f -e thread #1: tid = 750047
0: [379567.000 ns] (event) HW clock tick [48599428476224707]
1: [379569.000 ns] (event) CPU core changed [new CPU=2]
2: [390487.000 ns] (event) HW clock tick [48599428476246495]
3: [1602508.000 ns] (event) HW clock tick [48599428478664855]
4: [1662745.000 ns] (event) HW clock tick [48599428478785046]
libc.so.6`malloc
5: [1662746.995 ns] 0x00007ffff7176660 endbr64
6: [1662748.991 ns] 0x00007ffff7176664 movq 0x32387d(%rip), %rax ; + 408
7: [1662750.986 ns] 0x00007ffff717666b pushq %r12
8: [1662752.981 ns] 0x00007ffff717666d pushq %rbp
9: [1662754.977 ns] 0x00007ffff717666e pushq %rbx
10: [1662756.972 ns] 0x00007ffff717666f movq (%rax), %rax
11: [1662758.967 ns] 0x00007ffff7176672 testq %rax, %rax
12: [1662760.963 ns] 0x00007ffff7176675 jne 0x9c7e0 ; <+384>
13: [1662762.958 ns] 0x00007ffff717667b leaq 0x17(%rdi), %rax
14: [1662764.953 ns] 0x00007ffff717667f cmpq $0x1f, %rax
15: [1662766.949 ns] 0x00007ffff7176683 ja 0x9c730 ; <+208>
16: [1662768.944 ns] 0x00007ffff7176730 andq $-0x10, %rax
17: [1662770.939 ns] 0x00007ffff7176734 cmpq $-0x41, %rax
18: [1662772.935 ns] 0x00007ffff7176738 seta %dl
19: [1662774.930 ns] 0x00007ffff717673b jmp 0x9c690 ; <+48>
20: [1662776.925 ns] 0x00007ffff7176690 cmpq %rdi, %rax
21: [1662778.921 ns] 0x00007ffff7176693 jb 0x9c7b0 ; <+336>
22: [1662780.916 ns] 0x00007ffff7176699 testb %dl, %dl
23: [1662782.911 ns] 0x00007ffff717669b jne 0x9c7b0 ; <+336>
24: [1662784.906 ns] 0x00007ffff71766a1 movq 0x3236c0(%rip), %r12 ; + 24
(lldb) thread trace dump instructions -t -f -e -J -c 4
[
{
"id": 0,
"timestamp_ns": "379567.000000",
"event": "HW clock tick",
"hwClock": 48599428476224707
},
{
"id": 1,
"timestamp_ns": "379569.000000",
"event": "CPU core changed",
"cpuId": 2
},
{
"id": 2,
"timestamp_ns": "390487.000000",
"event": "HW clock tick",
"hwClock": 48599428476246495
},
{
"id": 3,
"timestamp_ns": "1602508.000000",
"event": "HW clock tick",
"hwClock": 48599428478664855
},
{
"id": 4,
"timestamp_ns": "1662745.000000",
"event": "HW clock tick",
"hwClock": 48599428478785046
},
{
"id": 5,
"timestamp_ns": "1662746.995324",
"loadAddress": "0x7ffff7176660",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "endbr64"
},
{
"id": 6,
"timestamp_ns": "1662748.990648",
"loadAddress": "0x7ffff7176664",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "movq"
},
{
"id": 7,
"timestamp_ns": "1662750.985972",
"loadAddress": "0x7ffff717666b",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
},
{
"id": 8,
"timestamp_ns": "1662752.981296",
"loadAddress": "0x7ffff717666d",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
}
]
```
Differential Revision: https://reviews.llvm.org/D130054
2022-07-18 16:56:01 -07:00
|
|
|
break;
|
|
|
|
case eTraceEventDisabledHW:
|
|
|
|
case eTraceEventDisabledSW:
|
|
|
|
break;
|
[trace][intel pt] Fix per-psb packet decoding
The per-PSB packet decoding logic was wrong because it was assuming that pt_insn_get_sync_offset was being udpated after every PSB. Silly me, that is not true. It returns the offset of the PSB packet after invoking pt_insn_sync_forward regardless of how many PSBs are visited later. Instead, I'm now following the approach described in https://github.com/intel/libipt/blob/master/doc/howto_libipt.md#parallel-decode for parallel decoding, which is basically what we need.
A nasty error that happened because of this is that when we had two PSBs (A and B), the following was happening
1. PSB A was processed all the way up to the end of the trace, which includes PSB B.
2. PSB B was then processed until the end of the trace.
The instructions emitted by step 2. were also emitted as part of step 1. so our trace had duplicated chunks. This problem becomes worse when you many PSBs.
As part of making sure this diff is correct, I added some other features that are very useful.
- Added a "synchronization point" event to the TraceCursor, so we can inspect when PSBs are emitted.
- Removed the single-thread decoder. Now the per-cpu decoder and single-thread decoder use the same code paths.
- Use the query decoder to fetch PSBs and timestamps. It turns out that the pt_insn_sync_forward of the instruction decoder can move past several PSBs (this means that we could skip some TSCs). On the other hand, the pt_query_sync_forward method doesn't skip PSBs, so we can get more accurate sync events and timing information.
- Turned LibiptDecoder into PSBBlockDecoder, which decodes single PSB blocks. It is the fundamental processing unit for decoding.
- Added many comments, asserts and improved error handling for clarity.
- Improved DecodeSystemWideTraceForThread so that a TSC is emitted always before a cpu change event. This was a bug that was annoying me before.
- SplitTraceInContinuousExecutions and FindLowestTSCInTrace are now using the query decoder, which can identify precisely each PSB along with their TSCs.
- Added an "only-events" option to the trace dumper to inspect only events.
I did extensive testing and I think we should have an in-house testing CI. The LLVM buildbots are not capable of supporting testing post-mortem traces of hundreds of megabytes. I'll leave that for later, but at least for now the current tests were able to catch most of the issues I encountered when doing this task.
A sample output of a program that I was single stepping is the following. You can see that only one PSB is emitted even though stepping happened!
```
thread #1: tid = 3578223
0: (event) trace synchronization point [offset = 0x0xef0]
a.out`main + 20 at main.cpp:29:20
1: 0x0000000000402479 leaq -0x1210(%rbp), %rax
2: (event) software disabled tracing
3: 0x0000000000402480 movq %rax, %rdi
4: (event) software disabled tracing
5: (event) software disabled tracing
6: 0x0000000000402483 callq 0x403bd4 ; std::vector<int, std::allocator<int>>::vector at stl_vector.h:391:7
7: (event) software disabled tracing
a.out`std::vector<int, std::allocator<int>>::vector() at stl_vector.h:391:7
8: 0x0000000000403bd4 pushq %rbp
9: (event) software disabled tracing
10: 0x0000000000403bd5 movq %rsp, %rbp
11: (event) software disabled tracing
```
This is another trace of a long program with a few PSBs.
```
(lldb) thread trace dump instructions -E -f thread #1: tid = 3603082
0: (event) trace synchronization point [offset = 0x0x80]
47417: (event) software disabled tracing
129231: (event) trace synchronization point [offset = 0x0x800]
146747: (event) software disabled tracing
246076: (event) software disabled tracing
259068: (event) trace synchronization point [offset = 0x0xf78]
259276: (event) software disabled tracing
259278: (event) software disabled tracing
no more data
```
Differential Revision: https://reviews.llvm.org/D131630
2022-08-08 16:24:22 -07:00
|
|
|
case eTraceEventSyncPoint:
|
|
|
|
item.sync_point_metadata = m_cursor_sp->GetSyncPointMetadata();
|
|
|
|
break;
|
[trace][intel pt] Introduce wall clock time for each trace item
- Decouple TSCs from trace items
- Turn TSCs into events just like CPUs. The new name is HW clock tick, wich could be reused by other vendors.
- Add a GetWallTime that returns the wall time that the trace plug-in can infer for each trace item.
- For intel pt, we are doing the following interpolation: if an instruction takes less than 1 TSC, we use that duration, otherwise, we assume the instruction took 1 TSC. This helps us avoid having to handle context switches, changes to kernel, idle times, decoding errors, etc. We are just trying to show some approximation and not the real data. For the real data, TSCs are the way to go. Besides that, we are making sure that no two trace items will give the same interpolation value. Finally, we are using as time 0 the time at which tracing started.
Sample output:
```
(lldb) r
Process 750047 launched: '/home/wallace/a.out' (x86_64)
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
frame #0: 0x0000000000402479 a.out`main at main.cpp:29:20
26 };
27
28 int main() {
-> 29 std::vector<int> vvv;
30 for (int i = 0; i < 100; i++)
31 vvv.push_back(i);
32
(lldb) process trace start -s 64kb -t --per-cpu
(lldb) b 60
Breakpoint 2: where = a.out`main + 1689 at main.cpp:60:23, address = 0x0000000000402afe
(lldb) c
Process 750047 resuming
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 2.1
frame #0: 0x0000000000402afe a.out`main at main.cpp:60:23
57 map<int, int> m;
58 m[3] = 4;
59
-> 60 map<string, string> m2;
61 m2["5"] = "6";
62
63 std::vector<std::string> vs = {"2", "3"};
(lldb) thread trace dump instructions -t -f -e thread #1: tid = 750047
0: [379567.000 ns] (event) HW clock tick [48599428476224707]
1: [379569.000 ns] (event) CPU core changed [new CPU=2]
2: [390487.000 ns] (event) HW clock tick [48599428476246495]
3: [1602508.000 ns] (event) HW clock tick [48599428478664855]
4: [1662745.000 ns] (event) HW clock tick [48599428478785046]
libc.so.6`malloc
5: [1662746.995 ns] 0x00007ffff7176660 endbr64
6: [1662748.991 ns] 0x00007ffff7176664 movq 0x32387d(%rip), %rax ; + 408
7: [1662750.986 ns] 0x00007ffff717666b pushq %r12
8: [1662752.981 ns] 0x00007ffff717666d pushq %rbp
9: [1662754.977 ns] 0x00007ffff717666e pushq %rbx
10: [1662756.972 ns] 0x00007ffff717666f movq (%rax), %rax
11: [1662758.967 ns] 0x00007ffff7176672 testq %rax, %rax
12: [1662760.963 ns] 0x00007ffff7176675 jne 0x9c7e0 ; <+384>
13: [1662762.958 ns] 0x00007ffff717667b leaq 0x17(%rdi), %rax
14: [1662764.953 ns] 0x00007ffff717667f cmpq $0x1f, %rax
15: [1662766.949 ns] 0x00007ffff7176683 ja 0x9c730 ; <+208>
16: [1662768.944 ns] 0x00007ffff7176730 andq $-0x10, %rax
17: [1662770.939 ns] 0x00007ffff7176734 cmpq $-0x41, %rax
18: [1662772.935 ns] 0x00007ffff7176738 seta %dl
19: [1662774.930 ns] 0x00007ffff717673b jmp 0x9c690 ; <+48>
20: [1662776.925 ns] 0x00007ffff7176690 cmpq %rdi, %rax
21: [1662778.921 ns] 0x00007ffff7176693 jb 0x9c7b0 ; <+336>
22: [1662780.916 ns] 0x00007ffff7176699 testb %dl, %dl
23: [1662782.911 ns] 0x00007ffff717669b jne 0x9c7b0 ; <+336>
24: [1662784.906 ns] 0x00007ffff71766a1 movq 0x3236c0(%rip), %r12 ; + 24
(lldb) thread trace dump instructions -t -f -e -J -c 4
[
{
"id": 0,
"timestamp_ns": "379567.000000",
"event": "HW clock tick",
"hwClock": 48599428476224707
},
{
"id": 1,
"timestamp_ns": "379569.000000",
"event": "CPU core changed",
"cpuId": 2
},
{
"id": 2,
"timestamp_ns": "390487.000000",
"event": "HW clock tick",
"hwClock": 48599428476246495
},
{
"id": 3,
"timestamp_ns": "1602508.000000",
"event": "HW clock tick",
"hwClock": 48599428478664855
},
{
"id": 4,
"timestamp_ns": "1662745.000000",
"event": "HW clock tick",
"hwClock": 48599428478785046
},
{
"id": 5,
"timestamp_ns": "1662746.995324",
"loadAddress": "0x7ffff7176660",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "endbr64"
},
{
"id": 6,
"timestamp_ns": "1662748.990648",
"loadAddress": "0x7ffff7176664",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "movq"
},
{
"id": 7,
"timestamp_ns": "1662750.985972",
"loadAddress": "0x7ffff717666b",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
},
{
"id": 8,
"timestamp_ns": "1662752.981296",
"loadAddress": "0x7ffff717666d",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
}
]
```
Differential Revision: https://reviews.llvm.org/D130054
2022-07-18 16:56:01 -07:00
|
|
|
}
|
[trace][intel pt] Fix per-psb packet decoding
The per-PSB packet decoding logic was wrong because it was assuming that pt_insn_get_sync_offset was being udpated after every PSB. Silly me, that is not true. It returns the offset of the PSB packet after invoking pt_insn_sync_forward regardless of how many PSBs are visited later. Instead, I'm now following the approach described in https://github.com/intel/libipt/blob/master/doc/howto_libipt.md#parallel-decode for parallel decoding, which is basically what we need.
A nasty error that happened because of this is that when we had two PSBs (A and B), the following was happening
1. PSB A was processed all the way up to the end of the trace, which includes PSB B.
2. PSB B was then processed until the end of the trace.
The instructions emitted by step 2. were also emitted as part of step 1. so our trace had duplicated chunks. This problem becomes worse when you many PSBs.
As part of making sure this diff is correct, I added some other features that are very useful.
- Added a "synchronization point" event to the TraceCursor, so we can inspect when PSBs are emitted.
- Removed the single-thread decoder. Now the per-cpu decoder and single-thread decoder use the same code paths.
- Use the query decoder to fetch PSBs and timestamps. It turns out that the pt_insn_sync_forward of the instruction decoder can move past several PSBs (this means that we could skip some TSCs). On the other hand, the pt_query_sync_forward method doesn't skip PSBs, so we can get more accurate sync events and timing information.
- Turned LibiptDecoder into PSBBlockDecoder, which decodes single PSB blocks. It is the fundamental processing unit for decoding.
- Added many comments, asserts and improved error handling for clarity.
- Improved DecodeSystemWideTraceForThread so that a TSC is emitted always before a cpu change event. This was a bug that was annoying me before.
- SplitTraceInContinuousExecutions and FindLowestTSCInTrace are now using the query decoder, which can identify precisely each PSB along with their TSCs.
- Added an "only-events" option to the trace dumper to inspect only events.
I did extensive testing and I think we should have an in-house testing CI. The LLVM buildbots are not capable of supporting testing post-mortem traces of hundreds of megabytes. I'll leave that for later, but at least for now the current tests were able to catch most of the issues I encountered when doing this task.
A sample output of a program that I was single stepping is the following. You can see that only one PSB is emitted even though stepping happened!
```
thread #1: tid = 3578223
0: (event) trace synchronization point [offset = 0x0xef0]
a.out`main + 20 at main.cpp:29:20
1: 0x0000000000402479 leaq -0x1210(%rbp), %rax
2: (event) software disabled tracing
3: 0x0000000000402480 movq %rax, %rdi
4: (event) software disabled tracing
5: (event) software disabled tracing
6: 0x0000000000402483 callq 0x403bd4 ; std::vector<int, std::allocator<int>>::vector at stl_vector.h:391:7
7: (event) software disabled tracing
a.out`std::vector<int, std::allocator<int>>::vector() at stl_vector.h:391:7
8: 0x0000000000403bd4 pushq %rbp
9: (event) software disabled tracing
10: 0x0000000000403bd5 movq %rsp, %rbp
11: (event) software disabled tracing
```
This is another trace of a long program with a few PSBs.
```
(lldb) thread trace dump instructions -E -f thread #1: tid = 3603082
0: (event) trace synchronization point [offset = 0x0x80]
47417: (event) software disabled tracing
129231: (event) trace synchronization point [offset = 0x0x800]
146747: (event) software disabled tracing
246076: (event) software disabled tracing
259068: (event) trace synchronization point [offset = 0x0xf78]
259276: (event) software disabled tracing
259278: (event) software disabled tracing
no more data
```
Differential Revision: https://reviews.llvm.org/D131630
2022-08-08 16:24:22 -07:00
|
|
|
m_writer_up->TraceItem(item);
|
2022-08-01 07:34:47 -07:00
|
|
|
} else if (m_cursor_sp->IsError()) {
|
|
|
|
item.error = m_cursor_sp->GetError();
|
[trace][intel pt] Fix per-psb packet decoding
The per-PSB packet decoding logic was wrong because it was assuming that pt_insn_get_sync_offset was being udpated after every PSB. Silly me, that is not true. It returns the offset of the PSB packet after invoking pt_insn_sync_forward regardless of how many PSBs are visited later. Instead, I'm now following the approach described in https://github.com/intel/libipt/blob/master/doc/howto_libipt.md#parallel-decode for parallel decoding, which is basically what we need.
A nasty error that happened because of this is that when we had two PSBs (A and B), the following was happening
1. PSB A was processed all the way up to the end of the trace, which includes PSB B.
2. PSB B was then processed until the end of the trace.
The instructions emitted by step 2. were also emitted as part of step 1. so our trace had duplicated chunks. This problem becomes worse when you many PSBs.
As part of making sure this diff is correct, I added some other features that are very useful.
- Added a "synchronization point" event to the TraceCursor, so we can inspect when PSBs are emitted.
- Removed the single-thread decoder. Now the per-cpu decoder and single-thread decoder use the same code paths.
- Use the query decoder to fetch PSBs and timestamps. It turns out that the pt_insn_sync_forward of the instruction decoder can move past several PSBs (this means that we could skip some TSCs). On the other hand, the pt_query_sync_forward method doesn't skip PSBs, so we can get more accurate sync events and timing information.
- Turned LibiptDecoder into PSBBlockDecoder, which decodes single PSB blocks. It is the fundamental processing unit for decoding.
- Added many comments, asserts and improved error handling for clarity.
- Improved DecodeSystemWideTraceForThread so that a TSC is emitted always before a cpu change event. This was a bug that was annoying me before.
- SplitTraceInContinuousExecutions and FindLowestTSCInTrace are now using the query decoder, which can identify precisely each PSB along with their TSCs.
- Added an "only-events" option to the trace dumper to inspect only events.
I did extensive testing and I think we should have an in-house testing CI. The LLVM buildbots are not capable of supporting testing post-mortem traces of hundreds of megabytes. I'll leave that for later, but at least for now the current tests were able to catch most of the issues I encountered when doing this task.
A sample output of a program that I was single stepping is the following. You can see that only one PSB is emitted even though stepping happened!
```
thread #1: tid = 3578223
0: (event) trace synchronization point [offset = 0x0xef0]
a.out`main + 20 at main.cpp:29:20
1: 0x0000000000402479 leaq -0x1210(%rbp), %rax
2: (event) software disabled tracing
3: 0x0000000000402480 movq %rax, %rdi
4: (event) software disabled tracing
5: (event) software disabled tracing
6: 0x0000000000402483 callq 0x403bd4 ; std::vector<int, std::allocator<int>>::vector at stl_vector.h:391:7
7: (event) software disabled tracing
a.out`std::vector<int, std::allocator<int>>::vector() at stl_vector.h:391:7
8: 0x0000000000403bd4 pushq %rbp
9: (event) software disabled tracing
10: 0x0000000000403bd5 movq %rsp, %rbp
11: (event) software disabled tracing
```
This is another trace of a long program with a few PSBs.
```
(lldb) thread trace dump instructions -E -f thread #1: tid = 3603082
0: (event) trace synchronization point [offset = 0x0x80]
47417: (event) software disabled tracing
129231: (event) trace synchronization point [offset = 0x0x800]
146747: (event) software disabled tracing
246076: (event) software disabled tracing
259068: (event) trace synchronization point [offset = 0x0xf78]
259276: (event) software disabled tracing
259278: (event) software disabled tracing
no more data
```
Differential Revision: https://reviews.llvm.org/D131630
2022-08-08 16:24:22 -07:00
|
|
|
m_writer_up->TraceItem(item);
|
|
|
|
} else if (m_cursor_sp->IsInstruction() && !m_options.only_events) {
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
insn_seen++;
|
2022-08-01 07:34:47 -07:00
|
|
|
item.load_address = m_cursor_sp->GetLoadAddress();
|
2021-06-29 14:03:30 -07:00
|
|
|
|
2022-03-22 09:18:53 -07:00
|
|
|
if (!m_options.raw) {
|
2022-10-10 12:57:13 -07:00
|
|
|
SymbolInfo symbol_info =
|
|
|
|
CalculateSymbolInfo(exe_ctx, item.load_address, prev_symbol_info);
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
item.prev_symbol_info = prev_symbol_info;
|
|
|
|
item.symbol_info = symbol_info;
|
[trace] Add an option to dump instructions in json and to a file
In order to provide simple scripting support on top of instruction traces, a simple solution is to enhance the `dump instructions` command and allow printing in json and directly to a file. The format is verbose and not space efficient, but it's not supposed to be used for really large traces, in which case the TraceCursor API is the way to go.
- add a -j option for printing the dump in json
- add a -J option for pretty printing the json output
- add a -F option for specifying an output file
- add a -a option for dumping all the instructions available starting at the initial point configured with the other flags
- add tests for all cases
- refactored the instruction dumper and abstracted the actual "printing" logic. There are two writer implementations: CLI and JSON. This made the dumper itself much more readable and maintanable
sample output:
```
(lldb) thread trace dump instructions -t -a --id 100 -J
[
{
"id": 100,
"tsc": "43591204528448966"
"loadAddress": "0x407a91",
"module": "a.out",
"symbol": "void std::deque<Foo, std::allocator<Foo>>::_M_push_back_aux<Foo>(Foo&&)",
"mnemonic": "movq",
"source": "/usr/include/c++/8/bits/deque.tcc",
"line": 492,
"column": 30
},
...
```
Differential Revision: https://reviews.llvm.org/D128316
2022-06-18 15:44:37 -07:00
|
|
|
prev_symbol_info = symbol_info;
|
2021-06-29 14:03:30 -07:00
|
|
|
}
|
[trace][intel pt] Fix per-psb packet decoding
The per-PSB packet decoding logic was wrong because it was assuming that pt_insn_get_sync_offset was being udpated after every PSB. Silly me, that is not true. It returns the offset of the PSB packet after invoking pt_insn_sync_forward regardless of how many PSBs are visited later. Instead, I'm now following the approach described in https://github.com/intel/libipt/blob/master/doc/howto_libipt.md#parallel-decode for parallel decoding, which is basically what we need.
A nasty error that happened because of this is that when we had two PSBs (A and B), the following was happening
1. PSB A was processed all the way up to the end of the trace, which includes PSB B.
2. PSB B was then processed until the end of the trace.
The instructions emitted by step 2. were also emitted as part of step 1. so our trace had duplicated chunks. This problem becomes worse when you many PSBs.
As part of making sure this diff is correct, I added some other features that are very useful.
- Added a "synchronization point" event to the TraceCursor, so we can inspect when PSBs are emitted.
- Removed the single-thread decoder. Now the per-cpu decoder and single-thread decoder use the same code paths.
- Use the query decoder to fetch PSBs and timestamps. It turns out that the pt_insn_sync_forward of the instruction decoder can move past several PSBs (this means that we could skip some TSCs). On the other hand, the pt_query_sync_forward method doesn't skip PSBs, so we can get more accurate sync events and timing information.
- Turned LibiptDecoder into PSBBlockDecoder, which decodes single PSB blocks. It is the fundamental processing unit for decoding.
- Added many comments, asserts and improved error handling for clarity.
- Improved DecodeSystemWideTraceForThread so that a TSC is emitted always before a cpu change event. This was a bug that was annoying me before.
- SplitTraceInContinuousExecutions and FindLowestTSCInTrace are now using the query decoder, which can identify precisely each PSB along with their TSCs.
- Added an "only-events" option to the trace dumper to inspect only events.
I did extensive testing and I think we should have an in-house testing CI. The LLVM buildbots are not capable of supporting testing post-mortem traces of hundreds of megabytes. I'll leave that for later, but at least for now the current tests were able to catch most of the issues I encountered when doing this task.
A sample output of a program that I was single stepping is the following. You can see that only one PSB is emitted even though stepping happened!
```
thread #1: tid = 3578223
0: (event) trace synchronization point [offset = 0x0xef0]
a.out`main + 20 at main.cpp:29:20
1: 0x0000000000402479 leaq -0x1210(%rbp), %rax
2: (event) software disabled tracing
3: 0x0000000000402480 movq %rax, %rdi
4: (event) software disabled tracing
5: (event) software disabled tracing
6: 0x0000000000402483 callq 0x403bd4 ; std::vector<int, std::allocator<int>>::vector at stl_vector.h:391:7
7: (event) software disabled tracing
a.out`std::vector<int, std::allocator<int>>::vector() at stl_vector.h:391:7
8: 0x0000000000403bd4 pushq %rbp
9: (event) software disabled tracing
10: 0x0000000000403bd5 movq %rsp, %rbp
11: (event) software disabled tracing
```
This is another trace of a long program with a few PSBs.
```
(lldb) thread trace dump instructions -E -f thread #1: tid = 3603082
0: (event) trace synchronization point [offset = 0x0x80]
47417: (event) software disabled tracing
129231: (event) trace synchronization point [offset = 0x0x800]
146747: (event) software disabled tracing
246076: (event) software disabled tracing
259068: (event) trace synchronization point [offset = 0x0xf78]
259276: (event) software disabled tracing
259278: (event) software disabled tracing
no more data
```
Differential Revision: https://reviews.llvm.org/D131630
2022-08-08 16:24:22 -07:00
|
|
|
m_writer_up->TraceItem(item);
|
[trace][intel pt] Support events
A trace might contain events traced during the target's execution. For
example, a thread might be paused for some period of time due to context
switches or breakpoints, which actually force a context switch. Not only
that, a trace might be paused because the CPU decides to trace only a
specific part of the target, like the address filtering provided by
intel pt, which will cause pause events. Besides this case, other kinds
of events might exist.
This patch adds the method `TraceCursor::GetEvents()`` that returns the
list of events that happened right before the instruction being pointed
at by the cursor. Some refactors were done to make this change simpler.
Besides this new API, the instruction dumper now supports the -e flag
which shows pause events, like in the following example, where pauses
happened due to breakpoints.
```
thread #1: tid = 2717361
a.out`main + 20 at main.cpp:27:20
0: 0x00000000004023d9 leaq -0x1200(%rbp), %rax
[paused]
1: 0x00000000004023e0 movq %rax, %rdi
[paused]
2: 0x00000000004023e3 callq 0x403a62 ; std::vector<int, std::allocator<int> >::vector at stl_vector.h:391:7
a.out`std::vector<int, std::allocator<int> >::vector() at stl_vector.h:391:7
3: 0x0000000000403a62 pushq %rbp
4: 0x0000000000403a63 movq %rsp, %rbp
```
The `dump info` command has also been updated and now it shows the
number of instructions that have associated events.
Differential Revision: https://reviews.llvm.org/D123982
2022-04-18 09:28:10 -07:00
|
|
|
}
|
2021-06-29 14:03:30 -07:00
|
|
|
}
|
2022-08-01 07:34:47 -07:00
|
|
|
if (!m_cursor_sp->HasValue())
|
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
2022-06-24 14:02:47 -07:00
|
|
|
m_writer_up->NoMoreData();
|
2022-03-22 09:18:53 -07:00
|
|
|
return last_id;
|
2021-06-29 14:03:30 -07:00
|
|
|
}
|
2022-10-10 12:57:13 -07:00
|
|
|
|
|
|
|
void TraceDumper::FunctionCall::TracedSegment::AppendInsn(
|
|
|
|
const TraceCursorSP &cursor_sp,
|
|
|
|
const TraceDumper::SymbolInfo &symbol_info) {
|
|
|
|
m_last_insn_id = cursor_sp->GetId();
|
|
|
|
m_last_symbol_info = symbol_info;
|
|
|
|
}
|
|
|
|
|
|
|
|
lldb::user_id_t
|
|
|
|
TraceDumper::FunctionCall::TracedSegment::GetFirstInstructionID() const {
|
|
|
|
return m_first_insn_id;
|
|
|
|
}
|
|
|
|
|
|
|
|
lldb::user_id_t
|
|
|
|
TraceDumper::FunctionCall::TracedSegment::GetLastInstructionID() const {
|
|
|
|
return m_last_insn_id;
|
|
|
|
}
|
|
|
|
|
|
|
|
void TraceDumper::FunctionCall::TracedSegment::IfNestedCall(
|
|
|
|
std::function<void(const FunctionCall &function_call)> callback) const {
|
|
|
|
if (m_nested_call)
|
|
|
|
callback(*m_nested_call);
|
|
|
|
}
|
|
|
|
|
|
|
|
const TraceDumper::FunctionCall &
|
|
|
|
TraceDumper::FunctionCall::TracedSegment::GetOwningCall() const {
|
|
|
|
return m_owning_call;
|
|
|
|
}
|
|
|
|
|
|
|
|
TraceDumper::FunctionCall &
|
|
|
|
TraceDumper::FunctionCall::TracedSegment::CreateNestedCall(
|
|
|
|
const TraceCursorSP &cursor_sp,
|
|
|
|
const TraceDumper::SymbolInfo &symbol_info) {
|
|
|
|
m_nested_call = std::make_unique<FunctionCall>(cursor_sp, symbol_info);
|
|
|
|
m_nested_call->SetParentCall(m_owning_call);
|
|
|
|
return *m_nested_call;
|
|
|
|
}
|
|
|
|
|
|
|
|
const TraceDumper::SymbolInfo &
|
|
|
|
TraceDumper::FunctionCall::TracedSegment::GetFirstInstructionSymbolInfo()
|
|
|
|
const {
|
|
|
|
return m_first_symbol_info;
|
|
|
|
}
|
|
|
|
|
|
|
|
const TraceDumper::SymbolInfo &
|
|
|
|
TraceDumper::FunctionCall::TracedSegment::GetLastInstructionSymbolInfo() const {
|
|
|
|
return m_last_symbol_info;
|
|
|
|
}
|
|
|
|
|
|
|
|
const TraceDumper::FunctionCall &
|
2022-10-15 18:52:22 -07:00
|
|
|
TraceDumper::FunctionCall::UntracedPrefixSegment::GetNestedCall() const {
|
2022-10-10 12:57:13 -07:00
|
|
|
return *m_nested_call;
|
|
|
|
}
|
|
|
|
|
|
|
|
TraceDumper::FunctionCall::FunctionCall(
|
|
|
|
const TraceCursorSP &cursor_sp,
|
|
|
|
const TraceDumper::SymbolInfo &symbol_info) {
|
|
|
|
m_is_error = cursor_sp->IsError();
|
|
|
|
AppendSegment(cursor_sp, symbol_info);
|
|
|
|
}
|
|
|
|
|
|
|
|
void TraceDumper::FunctionCall::AppendSegment(
|
|
|
|
const TraceCursorSP &cursor_sp,
|
|
|
|
const TraceDumper::SymbolInfo &symbol_info) {
|
|
|
|
m_traced_segments.emplace_back(cursor_sp, symbol_info, *this);
|
|
|
|
}
|
|
|
|
|
|
|
|
const TraceDumper::SymbolInfo &
|
|
|
|
TraceDumper::FunctionCall::GetSymbolInfo() const {
|
|
|
|
return m_traced_segments.back().GetLastInstructionSymbolInfo();
|
|
|
|
}
|
|
|
|
|
|
|
|
bool TraceDumper::FunctionCall::IsError() const { return m_is_error; }
|
|
|
|
|
|
|
|
const std::deque<TraceDumper::FunctionCall::TracedSegment> &
|
|
|
|
TraceDumper::FunctionCall::GetTracedSegments() const {
|
|
|
|
return m_traced_segments;
|
|
|
|
}
|
|
|
|
|
|
|
|
TraceDumper::FunctionCall::TracedSegment &
|
|
|
|
TraceDumper::FunctionCall::GetLastTracedSegment() {
|
|
|
|
return m_traced_segments.back();
|
|
|
|
}
|
|
|
|
|
2023-01-07 14:18:35 -08:00
|
|
|
const std::optional<TraceDumper::FunctionCall::UntracedPrefixSegment> &
|
2022-10-15 18:52:22 -07:00
|
|
|
TraceDumper::FunctionCall::GetUntracedPrefixSegment() const {
|
|
|
|
return m_untraced_prefix_segment;
|
2022-10-10 12:57:13 -07:00
|
|
|
}
|
|
|
|
|
2022-10-15 18:52:22 -07:00
|
|
|
void TraceDumper::FunctionCall::SetUntracedPrefixSegment(
|
2022-10-10 12:57:13 -07:00
|
|
|
TraceDumper::FunctionCallUP &&nested_call) {
|
2022-10-15 18:52:22 -07:00
|
|
|
m_untraced_prefix_segment.emplace(std::move(nested_call));
|
2022-10-10 12:57:13 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
TraceDumper::FunctionCall *TraceDumper::FunctionCall::GetParentCall() const {
|
|
|
|
return m_parent_call;
|
|
|
|
}
|
|
|
|
|
|
|
|
void TraceDumper::FunctionCall::SetParentCall(
|
|
|
|
TraceDumper::FunctionCall &parent_call) {
|
|
|
|
m_parent_call = &parent_call;
|
|
|
|
}
|
|
|
|
|
|
|
|
/// Given an instruction that happens after a return, find the ancestor function
|
|
|
|
/// call that owns it. If this ancestor doesn't exist, create a new ancestor and
|
|
|
|
/// make it the root of the tree.
|
|
|
|
///
|
|
|
|
/// \param[in] last_function_call
|
|
|
|
/// The function call that performs the return.
|
|
|
|
///
|
|
|
|
/// \param[in] symbol_info
|
|
|
|
/// The symbol information of the instruction after the return.
|
|
|
|
///
|
|
|
|
/// \param[in] cursor_sp
|
|
|
|
/// The cursor pointing to the instruction after the return.
|
|
|
|
///
|
|
|
|
/// \param[in,out] roots
|
|
|
|
/// The object owning the roots. It might be modified if a new root needs to
|
|
|
|
/// be created.
|
|
|
|
///
|
|
|
|
/// \return
|
|
|
|
/// A reference to the function call that owns the new instruction
|
|
|
|
static TraceDumper::FunctionCall &AppendReturnedInstructionToFunctionCallForest(
|
|
|
|
TraceDumper::FunctionCall &last_function_call,
|
|
|
|
const TraceDumper::SymbolInfo &symbol_info, const TraceCursorSP &cursor_sp,
|
|
|
|
std::vector<TraceDumper::FunctionCallUP> &roots) {
|
|
|
|
|
|
|
|
// We omit the current node because we can't return to itself.
|
|
|
|
TraceDumper::FunctionCall *ancestor = last_function_call.GetParentCall();
|
|
|
|
|
|
|
|
for (; ancestor; ancestor = ancestor->GetParentCall()) {
|
|
|
|
// This loop traverses the tree until it finds a call that we can return to.
|
|
|
|
if (IsSameInstructionSymbolContext(ancestor->GetSymbolInfo(), symbol_info,
|
|
|
|
/*check_source_line_info=*/false)) {
|
|
|
|
// We returned to this symbol, so we are assuming we are returning there
|
|
|
|
// Note: If this is not robust enough, we should actually check if we
|
|
|
|
// returning to the instruction that follows the last instruction from
|
|
|
|
// that call, as that's the behavior of CALL instructions.
|
|
|
|
ancestor->AppendSegment(cursor_sp, symbol_info);
|
|
|
|
return *ancestor;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
// We didn't find the call we were looking for, so we now create a synthetic
|
|
|
|
// one that will contain the new instruction in its first traced segment.
|
|
|
|
TraceDumper::FunctionCallUP new_root =
|
|
|
|
std::make_unique<TraceDumper::FunctionCall>(cursor_sp, symbol_info);
|
2022-10-15 18:52:22 -07:00
|
|
|
// This new root will own the previous root through an untraced prefix segment.
|
|
|
|
new_root->SetUntracedPrefixSegment(std::move(roots.back()));
|
2022-10-10 12:57:13 -07:00
|
|
|
roots.pop_back();
|
|
|
|
// We update the roots container to point to the new root
|
|
|
|
roots.emplace_back(std::move(new_root));
|
|
|
|
return *roots.back();
|
|
|
|
}
|
|
|
|
|
|
|
|
/// Append an instruction to a function call forest. The new instruction might
|
|
|
|
/// be appended to the current segment, to a new nest call, or return to an
|
|
|
|
/// ancestor call.
|
|
|
|
///
|
|
|
|
/// \param[in] exe_ctx
|
|
|
|
/// The exeuction context of the traced thread.
|
|
|
|
///
|
|
|
|
/// \param[in] last_function_call
|
|
|
|
/// The chronologically most recent function call before the new instruction.
|
|
|
|
///
|
|
|
|
/// \param[in] prev_symbol_info
|
|
|
|
/// The symbol information of the previous instruction in the trace.
|
|
|
|
///
|
|
|
|
/// \param[in] symbol_info
|
|
|
|
/// The symbol information of the new instruction.
|
|
|
|
///
|
|
|
|
/// \param[in] cursor_sp
|
|
|
|
/// The cursor pointing to the new instruction.
|
|
|
|
///
|
|
|
|
/// \param[in,out] roots
|
|
|
|
/// The object owning the roots. It might be modified if a new root needs to
|
|
|
|
/// be created.
|
|
|
|
///
|
|
|
|
/// \return
|
|
|
|
/// A reference to the function call that owns the new instruction.
|
|
|
|
static TraceDumper::FunctionCall &AppendInstructionToFunctionCallForest(
|
|
|
|
const ExecutionContext &exe_ctx,
|
|
|
|
TraceDumper::FunctionCall *last_function_call,
|
|
|
|
const TraceDumper::SymbolInfo &prev_symbol_info,
|
|
|
|
const TraceDumper::SymbolInfo &symbol_info, const TraceCursorSP &cursor_sp,
|
|
|
|
std::vector<TraceDumper::FunctionCallUP> &roots) {
|
|
|
|
if (!last_function_call || last_function_call->IsError()) {
|
|
|
|
// We create a brand new root
|
|
|
|
roots.emplace_back(
|
|
|
|
std::make_unique<TraceDumper::FunctionCall>(cursor_sp, symbol_info));
|
|
|
|
return *roots.back();
|
|
|
|
}
|
|
|
|
|
2023-11-11 11:21:32 -08:00
|
|
|
lldb_private::AddressRange range;
|
2022-10-10 12:57:13 -07:00
|
|
|
if (symbol_info.sc.GetAddressRange(
|
|
|
|
eSymbolContextBlock | eSymbolContextFunction | eSymbolContextSymbol,
|
|
|
|
0, /*inline_block_range*/ true, range)) {
|
|
|
|
if (range.GetBaseAddress() == symbol_info.address) {
|
|
|
|
// Our instruction is the first instruction of a function. This has
|
|
|
|
// to be a call. This should also identify if a trampoline or the linker
|
|
|
|
// is making a call using a non-CALL instruction.
|
|
|
|
return last_function_call->GetLastTracedSegment().CreateNestedCall(
|
|
|
|
cursor_sp, symbol_info);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (IsSameInstructionSymbolContext(prev_symbol_info, symbol_info,
|
|
|
|
/*check_source_line_info=*/false)) {
|
|
|
|
// We are still in the same function. This can't be a call because otherwise
|
|
|
|
// we would be in the first instruction of the symbol.
|
|
|
|
last_function_call->GetLastTracedSegment().AppendInsn(cursor_sp,
|
|
|
|
symbol_info);
|
|
|
|
return *last_function_call;
|
|
|
|
}
|
|
|
|
// Now we are in a different symbol. Let's see if this is a return or a
|
|
|
|
// call
|
2022-10-19 00:18:01 -07:00
|
|
|
const InstructionSP &insn = last_function_call->GetLastTracedSegment()
|
|
|
|
.GetLastInstructionSymbolInfo()
|
|
|
|
.instruction;
|
|
|
|
InstructionControlFlowKind insn_kind =
|
|
|
|
insn ? insn->GetControlFlowKind(&exe_ctx)
|
|
|
|
: eInstructionControlFlowKindOther;
|
|
|
|
|
|
|
|
switch (insn_kind) {
|
2022-10-10 12:57:13 -07:00
|
|
|
case lldb::eInstructionControlFlowKindCall:
|
|
|
|
case lldb::eInstructionControlFlowKindFarCall: {
|
|
|
|
// This is a regular call
|
|
|
|
return last_function_call->GetLastTracedSegment().CreateNestedCall(
|
|
|
|
cursor_sp, symbol_info);
|
|
|
|
}
|
|
|
|
case lldb::eInstructionControlFlowKindFarReturn:
|
|
|
|
case lldb::eInstructionControlFlowKindReturn: {
|
|
|
|
// We should have caught most trampolines and linker functions earlier, so
|
|
|
|
// let's assume this is a regular return.
|
|
|
|
return AppendReturnedInstructionToFunctionCallForest(
|
|
|
|
*last_function_call, symbol_info, cursor_sp, roots);
|
|
|
|
}
|
|
|
|
default:
|
|
|
|
// we changed symbols not using a call or return and we are not in the
|
|
|
|
// beginning of a symbol, so this should be something very artificial
|
|
|
|
// or maybe a jump to some label in the middle of it section.
|
|
|
|
|
|
|
|
// We first check if it's a return from an inline method
|
|
|
|
if (prev_symbol_info.sc.block &&
|
|
|
|
prev_symbol_info.sc.block->GetContainingInlinedBlock()) {
|
|
|
|
return AppendReturnedInstructionToFunctionCallForest(
|
|
|
|
*last_function_call, symbol_info, cursor_sp, roots);
|
|
|
|
}
|
|
|
|
// Now We assume it's a call. We should revisit this in the future.
|
|
|
|
// Ideally we should be able to decide whether to create a new tree,
|
|
|
|
// or go deeper or higher in the stack.
|
|
|
|
return last_function_call->GetLastTracedSegment().CreateNestedCall(
|
|
|
|
cursor_sp, symbol_info);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/// Append an error to a function call forest. The new error might be appended
|
|
|
|
/// to the current segment if it contains errors or will create a new root.
|
|
|
|
///
|
|
|
|
/// \param[in] last_function_call
|
|
|
|
/// The chronologically most recent function call before the new error.
|
|
|
|
///
|
|
|
|
/// \param[in] cursor_sp
|
|
|
|
/// The cursor pointing to the new error.
|
|
|
|
///
|
|
|
|
/// \param[in,out] roots
|
|
|
|
/// The object owning the roots. It might be modified if a new root needs to
|
|
|
|
/// be created.
|
|
|
|
///
|
|
|
|
/// \return
|
|
|
|
/// A reference to the function call that owns the new error.
|
|
|
|
TraceDumper::FunctionCall &AppendErrorToFunctionCallForest(
|
|
|
|
TraceDumper::FunctionCall *last_function_call, TraceCursorSP &cursor_sp,
|
|
|
|
std::vector<TraceDumper::FunctionCallUP> &roots) {
|
|
|
|
if (last_function_call && last_function_call->IsError()) {
|
|
|
|
last_function_call->GetLastTracedSegment().AppendInsn(
|
|
|
|
cursor_sp, TraceDumper::SymbolInfo{});
|
|
|
|
return *last_function_call;
|
|
|
|
} else {
|
|
|
|
roots.emplace_back(std::make_unique<TraceDumper::FunctionCall>(
|
|
|
|
cursor_sp, TraceDumper::SymbolInfo{}));
|
|
|
|
return *roots.back();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static std::vector<TraceDumper::FunctionCallUP>
|
|
|
|
CreateFunctionCallForest(TraceCursorSP &cursor_sp,
|
|
|
|
const ExecutionContext &exe_ctx) {
|
|
|
|
|
|
|
|
std::vector<TraceDumper::FunctionCallUP> roots;
|
|
|
|
TraceDumper::SymbolInfo prev_symbol_info;
|
|
|
|
|
|
|
|
TraceDumper::FunctionCall *last_function_call = nullptr;
|
|
|
|
|
|
|
|
for (; cursor_sp->HasValue(); cursor_sp->Next()) {
|
|
|
|
if (cursor_sp->IsError()) {
|
|
|
|
last_function_call = &AppendErrorToFunctionCallForest(last_function_call,
|
|
|
|
cursor_sp, roots);
|
|
|
|
prev_symbol_info = {};
|
|
|
|
} else if (cursor_sp->IsInstruction()) {
|
|
|
|
TraceDumper::SymbolInfo symbol_info = CalculateSymbolInfo(
|
|
|
|
exe_ctx, cursor_sp->GetLoadAddress(), prev_symbol_info);
|
|
|
|
|
|
|
|
last_function_call = &AppendInstructionToFunctionCallForest(
|
|
|
|
exe_ctx, last_function_call, prev_symbol_info, symbol_info, cursor_sp,
|
|
|
|
roots);
|
|
|
|
prev_symbol_info = symbol_info;
|
|
|
|
} else if (cursor_sp->GetEventType() == eTraceEventCPUChanged) {
|
|
|
|
// TODO: In case of a CPU change, we create a new root because we haven't
|
|
|
|
// investigated yet if a call tree can safely continue or if interrupts
|
|
|
|
// could have polluted the original call tree.
|
|
|
|
last_function_call = nullptr;
|
|
|
|
prev_symbol_info = {};
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return roots;
|
|
|
|
}
|
|
|
|
|
|
|
|
void TraceDumper::DumpFunctionCalls() {
|
|
|
|
ThreadSP thread_sp = m_cursor_sp->GetExecutionContextRef().GetThreadSP();
|
|
|
|
ExecutionContext exe_ctx;
|
|
|
|
thread_sp->GetProcess()->GetTarget().CalculateExecutionContext(exe_ctx);
|
|
|
|
|
|
|
|
m_writer_up->FunctionCallForest(
|
|
|
|
CreateFunctionCallForest(m_cursor_sp, exe_ctx));
|
|
|
|
}
|