This patch adds MemProfReader::takeMemProfData, a function to return
the complete MemProf profile from the reader. We can directly pass
its return value to InstrProfWriter::addMemProfData without having to
deal with the indivual components of the MemProf profile. The new
function is named "take", but it doesn't do std::move yet because of
type differences (DenseMap v.s. MapVector).
The end state I'm trying to get to is roughly as follows:
- MemProfReader accepts IndexedMemProfData as a parameter as opposed
to the three individual components (frames, call stacks, and
records).
- MemProfReader keeps IndexedMemProfData as a class member without
decomposing it into its individual components.
- MemProfReader returns IndexedMemProfData like:
IndexedMemProfData takeMemProfData() {
return std::move(MemProfData);
}
While llvm-exegesis has explicit support for setting EFLAGS which
contains DF, it can be nice sometimes to explicitly set DF, especially
given that it is modeled as a separate register within LLVM. This patch
adds the ability to do that by lowering setting the value to 0 or 1 to
cld and std respectively.
LazyObjectLinkingLayer can be used to add object files that will not be linked
into the executor unless some function that they define is called at runtime.
(References to data members defined by these objects will still trigger
immediate linking)
To implement lazy linking, LazyObjectLinkingLayer uses the lazyReexports
utility to construct stubs for each function in a given object file, and an
ObjectLinkingLayer::Plugin to rename the function bodies at link-time. (Data
symbols are not renamed)
The llvm-jitlink utility is extended with a -lazy option that can be
passed before input files or archives to add them using the lazy linking
layer rather than the base ObjectLinkingLayer.
This patch removes MemProf format Version 0 now that version 2 and 3
seem to be working well.
I'm not touching version 1 for now because some tests still rely on
version 1.
Note that Version 0 is identical to Version 1 except that the MemProf
section of the indexed format has a MemProf version field.
* Have clang always append & pass System/Library/SubFrameworks when determining default sdk search paths.
* Teach clang-installapi to traverse there for framework input.
* Teach llvm-readtapi that the library files (TBD or binary) in there should be considered private.
resolves: rdar://137457006
Following discussions in #110443, and the following earlier discussions
in https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html,
https://reviews.llvm.org/D38482, https://reviews.llvm.org/D38489, this
PR attempts to overhaul the `TargetMachine` and `LLVMTargetMachine`
interface classes. More specifically:
1. Makes `TargetMachine` the only class implemented under
`TargetMachine.h` in the `Target` library.
2. `TargetMachine` contains target-specific interface functions that
relate to IR/CodeGen/MC constructs, whereas before (at least on paper)
it was supposed to have only IR/MC constructs. Any Target that doesn't
want to use the independent code generator simply does not implement
them, and returns either `false` or `nullptr`.
3. Renames `LLVMTargetMachine` to `CodeGenCommonTMImpl`. This renaming
aims to make the purpose of `LLVMTargetMachine` clearer. Its interface
was moved under the CodeGen library, to further emphasis its usage in
Targets that use CodeGen directly.
4. Makes `TargetMachine` the only interface used across LLVM and its
projects. With these changes, `CodeGenCommonTMImpl` is simply a set of
shared function implementations of `TargetMachine`, and CodeGen users
don't need to static cast to `LLVMTargetMachine` every time they need a
CodeGen-specific feature of the `TargetMachine`.
5. More importantly, does not change any requirements regarding library
linking.
cc @arsenm @aeubanks
This implements a global function merging pass. Unlike traditional
function merging passes that use IR comparators, this pass employs a
structurally stable hash to identify similar functions while ignoring
certain constant operands. These ignored constants are tracked and
encoded into a stable function summary. When merging, instead of
explicitly folding similar functions and their call sites, we form a
merging instance by supplying different parameters via thunks. The
actual size reduction occurs when identically created merging instances
are folded by the linker.
Currently, this pass is wired to a pre-codegen pass, enabled by the
`-enable-global-merge-func` flag.
In a local merging mode, the analysis and merging steps occur
sequentially within a module:
- `analyze`: Collects stable function hashes and tracks locations of
ignored constant operands.
- `finalize`: Identifies merge candidates with matching hashes and
computes the set of parameters that point to different constants.
- `merge`: Uses the stable function map to optimistically create a
merged function.
We can enable a global merging mode similar to the global function
outliner
(https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753/),
which will perform the above steps separately.
- `-codegen-data-generate`: During the first round of code generation,
we analyze local merging instances and publish their summaries.
- Offline using `llvm-cgdata` or at link-time, we can finalize all these
merging summaries that are combined to determine parameters.
- `-codegen-data-use`: During the second round of code generation, we
optimistically create merging instances within each module, and finally,
the linker folds identically created merging instances.
Depends on #112664
This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.
These symbols need to be exported for llvm-pdbutil when using windows
shared library builds.
Exclude the YAML traits declared in llvm-pdbutil so there not declared
as dllimported which will causing missing symbol errors for windows
shared library builds.
This is part of the work to enable LLVM_BUILD_LLVM_DYLIB and plugins on
window.
Use LLVM_ALWAYS_EXPORT for __jit_debug_descriptor and
__jit_debug_register_code so there exported even if LLVM is not built as
a shared library.
This is part of the work to enable LLVM_BUILD_LLVM_DYLIB and plugins on
windows #109483.
Claiming AArch64 support for llvm-exegesis is a bit of a stretch in my
opinion as only a couple of opcodes with GPR64 operands will work for
snippet benchmarking, so I propose to clarify that AArch64 support is
very experimental. Also added some clarifications about its libpfm4
dependency.
This reverts commit c31014322c0b5ae596da129cbb844fb2198b4ef4.
Based on the discussions in #112772, this pass is not needed after the
introduction of `llvm.threadlocal.address` intrinsic.
Fixes https://github.com/llvm/llvm-project/issues/112771.
This introduces a new cgdata format for stable function maps. The raw
data is embedded in the __llvm_merge section during compile time. This
data can be read and merged using the llvm-cgdata tool, into an indexed
cgdata file. Consequently, the tool is now capable of handling either
outlined hash trees, stable function maps, or both, as they are
orthogonal.
Depends on #112662.
This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
This patch makes X86 llvm-exegesis unconditionally use older
instructions to load the lower vector registers, rather than trying to
use AVX512 for everything when available. This fixes a case where we
would try and load AVX512 registers using the older instructions if such
a snippet was constructed while -mcpu was set to something that did not
support AVX512. This would lead to a machine code verification error
rather than resulting in incomplete snippet setup, which seems to be the
intention of how this should work.
Fixes#114691.
The early simplication pipeline is used in non-LTO and (Thin/Full)LTO
pre-link
stage. There are some passes that we want them in non-LTO mode, but not
at LTO
pre-link stage. The control is missing currently. This PR adds the
support. To
demonstrate the use, we only enable the internalization pass in non-LTO
mode for
AMDGPU because having it run in pre-link stage causes some issues.
Add support for generating random hotness in the memprof profile writer,
to be used for testing. The random seed is printed to stderr, and an
additional option enables providing a specific seed in order to
reproduce a particular random profile.
llvm-cxxfilt can demangle names of data symbols, in addition to function
names.
$ llvm-cxxfilt _ZN6garden5gnomeE
garden::gnome
And type names too, on request:
$ llvm-cxxfilt -t i
int
Update some overly specific the wording in the --help and documentation
that suggests otherwise.
The Intel C++ Compiler (ICX) passes linker flags through the driver
unlike MSVC and clang-cl, and therefore needs them to be prefixed with
`/Qoption,link` (the equivalent of `-Wl,` for gcc on *nix).
Use `LINKER:` prefix wherever supported by cmake, when that's not
possible fall-back to `${CMAKE_CXX_LINKER_WRAPPER_FLAG}`. CMake replaces
these with `/Qoption,link` for ICX and with the empty string for MSVC
and clang-cl.
For `target_link_libraries` neither `LINKER:` (not supported prior to
CMake 3.32) nor `${CMAKE_CXX_LINKER_WRAPPER_FLAG}` (does not begin with
`-` would be taken as a library name) works, use `-Qoption,link`
directly within a conditional generator expression that we're linking
with ICX.
For MSVC and clang-cl no functional change is intended.
Tested by compiling with ICX and setting
`CMAKE_(EXE|SHARED|STATIC|MODULE)_LINKER_FLAGS_INIT` to
`-Werror=unknown-argument`.
RFC:
https://discourse.llvm.org/t/rfc-cmake-linker-flags-need-wl-equivalent-for-intel-c-icx-on-windows/82446
This is one of the many PRs to fix errors with LLVM_ENABLE_WERROR=on.
Built by GCC 11.
Refactor the code to avoid the false warning
llvm-project/llvm/tools/llvm-isel-fuzzer/llvm-isel-fuzzer.cpp
llvm-project/llvm/tools/llvm-isel-fuzzer/llvm-isel-fuzzer.cpp: In
function ‘int LLVMFuzzerInitialize(int*, char***)’:
llvm-project/llvm/tools/llvm-isel-fuzzer/llvm-isel-fuzzer.cpp:141:43:
error: ISO C++ forbids zero-size array ‘argv’ [-Werror=pedantic]
141 | ExitOnError ExitOnErr(std::string(*argv[0]) + ": error:");
|
Provide a option (--no-object-timestamp) to ignore object file timestamp
mismatches. We already have a similar option for Swift modules
(--no-swiftmodule-timestamp).
rdar://123975869
I (re)discovered that dsymutil was instantiating two BinaryHolders: one
for parsing the debug map and one for linking. That really defeats the
purpose of the BinaryHolder as it serves as a cache. Fix the issue and
remove an old FIXME.
This is useful when looking at LLVM/MLIR assembly produced from C++
sources. For example
cir.call @_ZN3aie4tileILi1ELi4EE7programIZ4mainE3$_0EEvOT_(%2, %7) :
will be translated to
cir.call @"void aie::tile<1, 4>::program<main::$_0>(main::$_0&&)"(%2,
%7) : which can be parsed as valid MLIR by the right mlir-lsp-server.
If a symbol is already quoted, do not quote it more.
---------
Co-authored-by: James Henderson <jh7370@my.bristol.ac.uk>
Currently both True/False counts were folded. It lost the information,
"It is True or False before folding." It prevented recalling branch
counts in merging template instantiations.
In `llvm-cov`, a folded branch is shown as:
- `[True: n, Folded]`
- `[Folded, False n]`
In the case If `n` is zero, a branch is reported as "uncovered". This is
distinguished from "folded" branch. When folded branches are merged,
`Folded` may be dissolved.
In the coverage map, either `Counter` is `Zero`. Currently both were
`Zero`.
Since "partial fold" has been introduced, either case in `switch` is
omitted as `Folded`.
Each `case:` in `switch` is reported as `[True: n, Folded]`, since
`False` count doesn't show meaningful value.
When `switch` doesn't have `default:`, `switch (Cond)` is reported as
`[Folded, False: n]`, since `True` count was just the sum of `case`(s).
`switch` with `default` can be considered as "the statement that doesn't
have any `False`(s)".
On AIX, for undefined functions, only the dotnamed symbols (the address
of the function) are generated after linking (i.e., no named function
symbol is generated).
Currently, all alias symbols are added as defined data symbols when
parsing symbols in LTOModule (the Link Time Optimization library used by
linker to optimization code at link time). On AIX, if the function alias
is used in the native object, and only its dotnamed symbol is generated,
the linker will have problem to match the dotnamed symbol from the
native object and the defined symbol marked as data from the bitcode at
LTO linktime.
This patch is to add function alias as function instead of data.
This patch adds support for forced loading of archive members, similar to the
behavior of the -all_load and -ObjC options in ld64. To enable this, the
StaticLibraryDefinitionGenerator class constructors are extended with a
VisitMember callback that is called on each member file in the archive at
generator construction time. This callback can be used to unconditionally add
the member file to a JITDylib at that point.
To test this the llvm-jitlink utility is extended with -all_load (all platforms)
and -ObjC (darwin only) options. Since we can't refer to symbols in the test
objects directly (these would always cause the member to be linked in, even
without the new flags) we instead test side-effects of force loading: execution
of constructors and registration of Objective-C metadata.
rdar://134446111
We can get a reference to the `ExecutionSession` from the
`ObjectLinkingLayer` argument, so there's no need to pass it in
separately.
This mirrors recent changes to `ElfNixPlatform` and `MachOPlatform` by
@lhames in
3dba4ca155
and
cc20dd285a.