DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch replaces:
Dest.insert(Src.begin(), Src.end());
with:
Dest.insert_range(Src);
This patch does not touch custom begin like succ_begin for now.
Summary:
These tools `amdhsa-loader` and `nvptx-loader` are used to execute unit
tests directly on the GPU. We use this for `libc` and `libcxx` unit
tests as well as general GPU experimentation. It looks like this.
```console
> clang++ main.cpp --target=amdgcn-amd-amdhsa -mcpu=native -flto -lc ./lib/amdgcn-amd-amdhsa/crt1.o
> llvm-gpu-loader a.out
Hello World!
```
Currently these are a part of the `libc` project, but this creates
issues as `libc` itself depends on them to run tests. Right now we get
around this by force-including the `libc` project prior to running the
runtimes build so that this dependency can be built first. We should
instead just make this a simple LLVM tool so it's always available.
This has the effect of installing these by default now instead of just
when `libc` was enabled, but they should be relatively small. Right now
this only supports a 'static' configuration. That is, we locate the CUDA
and HSA dependencies at LLVM compile time. In the future we should be
able to provide this by default using `dlopen` and some API.
I don't know if it's required to reformat all of these names since they
used the `libc` naming convention so I just left it for now.
Before this patch, whole program devirtualization is suppressed on a
class if any superclass is visible to regular object files, by recording
the class GUID in `VisibleToRegularObjSymbols`.
This patch suppresses whole program devirtualization on a class if the
LTO unit doesn't have the prevailing definition of this class (e.g., the
prevailing definition is in a shared library)
Implementation summaries:
1. In llvm/lib/LTO/LTO.cpp, `IsVisibleToRegularObj` is updated to look
at the global resolution's `IsPrevailing` bit for ThinLTO and
regularLTO.
2. In llvm/tools/llvm-lto2/llvm-lto2.cpp,
- three command line options are added so `llvm-lto2` can override
`Conf.HasWholeProgramVisibility`, `Conf.ValidateAllVtablesHaveTypeInfos`
and `Conf.AllVtablesHaveTypeInfos`.
The test case is reduced from a small C++ program (main.cc, lib.cc/h
pasted below in [1]). To reproduce the program failure without this
patch, compile lib.cc into a shared library, and provide it to a ThinLTO
build of main.cc (commands are pasted in [2]).
[1]
* lib.h
```
#include <cstdio>
class Derived {
public:
void dispatch();
virtual void print();
virtual void sum();
};
void Derived::dispatch() {
static_cast<Derived*>(this)->print();
static_cast<Derived*>(this)->sum();
}
void Derived::sum() {
printf("Derived::sum\n");
}
__attribute__((noinline)) void* create(int i);
__attribute__((noinline)) void* getPtr(int i);
```
* lib.cc
```
#include "lib.h"
#include <cstdio>
#include <iostream>
class Derived2 : public Derived {
public:
void print() override {
printf("DerivedSharedLib\n");
}
void sum() override {
printf("DerivedSharedLib::sum\n");
}
};
void Derived::print() {
printf("Derived\n");
}
__attribute__((noinline)) void* create(int i) {
if (i & 1)
return new Derived2();
return new Derived();
}
```
* main.cc
```
cat main.cc
#include "lib.h"
class DerivedN : public Derived {
public:
};
__attribute__((noinline)) void* getPtr(int x) {
return new DerivedN();
}
int main() {
Derived*b = static_cast<Derived*>(create(201));
b->dispatch();
delete b;
Derived* a = static_cast<Derived*>(getPtr(202));
a->dispatch();
delete a;
return 0;
}
```
[2]
```
# compile lib.o in a shared library.
$ ./bin/clang++ -O2 -fPIC -c lib.cc -o lib.o
$ ./bin/clang++ -shared -o libdata.so lib.o
# Provide the shared library in `-ldata`
$ ./bin/clang++ -v -g -ldata --save-temps -fno-discard-value-names -Wl,-mllvm,-print-before=wholeprogramdevirt -Wl,-mllvm,-wholeprogramdevirt-check=trap -Rpass=wholeprogramdevirt -Wl,--lto-whole-program-visibility -Wl,--lto-validate-all-vtables-have-type-infos -mllvm -disable-icp=true -Wl,-mllvm,-disable-icp=false -flto=thin -fwhole-program-vtables -fno-split-lto-unit -fuse-ld=lld main.cc -L . -o main >/tmp/wholeprogramdevirt.ir 2>&1
# Run the program hits a segmentation fault with `-Wl,-mllvm,-wholeprogramdevirt-check=trap`
$ LD_LIBRARY_PATH=. ./main
DerivedSharedLib
Trace/breakpoint trap (core dumped)
```
Rather than requiring users to pass `-m64` to the `llvm-ml` driver to
get 64-bit behavior, we add the `llvm-ml64` alias, matching the behavior
of `ML.EXE` and `ML64.EXE`.
The original flavor/bitness flags still work, but the alias should make
some workflows easier.
NOTE: The logic for this already existed in the code; we're just finally
adding the build/install instructions to match.
RPaths are basically search paths for how to load dependent libraries.
The order they appear is the order the linker will search, we should
preserve that order in tbd files.
* Additionally add this level of detection to llvm-readtapi.
resolves: rdar://145603347
Currently, `DIContext::getLineInfoForAddress` and
`DIContext::getLineInfoForDataAddress` returns empty DILineInfo when the
debug info is missing for the given address. This is not differentiable
with the case when debug info is found for the given address but the
debug info is default value (filename:linenum is <invalid>:0).
This change wraps the return types of `DIContext::getLineInfoForAddress`
and `DIContext::getLineInfoForDataAddress` with `std::optional`.
Until recently checkFeature was quite slow. #130936
I was curious where we use checkFeature and noticed these. I thought we
could use hasFeature instead of going through strings.
These date back to when the non-intrinsic format of variable locations
was still being tested and was behind a compile-time flag, so not all
builds / bots would correctly run them. The solution at the time, to get
at least some test coverage, was to have tests opt-in to non-intrinsic
debug-info if it was built into LLVM.
Nowadays, non-intrinsic format is the default and has been on for more
than a year, there's no need for this flag to exist.
(I've downgraded the flag from "try" to explicitly requesting
non-intrinsic format in some places, so that we can deal with tests that
are explicitly about non-intrinsic format in their own commit).
This is meant as a preparation for PR #130988 "[AMDGPU] Implement IR
expansion for frem instruction" which implements the expansion of
another instruction in this pass. The more general name seems more
appropriate given this change and quite reasonable even without it.
Current implementation (for AArch64) only supports the GRP32, GPR64,
FPR64/128, PPR16 and ZPR128 register classes. This adds support for
the other floating point register classes to initialize registers and avoid
the "setReg is not implemented" warning for these cases.
This patch adds SYCL Module splitting - the necessary step in the SYCL
compilation pipeline. Only 2 splitting modes are being added in this
patch: by kernel and by source.
Refactor readobj to integrate AArch64 Build Attributes under
ELFAttributeParser. ELFAttributeParser now serves as a base class for:
- ELFCompactAttrParser, handling Arm-style attributes with a single
build attribute subsection.
- ELFExtendedAttrParser, handling AArch64-style attributes with multiple
build attribute subsections. This improves code organization and better
aligns with the attribute parsing model.
Add support for parsing AArch64 Build Attributes.
This reverts commit b0baa1d8bd68a2ce2f7c5f2b62333e410e9122a1.
Reverting follow-up commit to ce9e1d3c15ed6290f1cb07b482939976fa8115cd since the original commit test is flaky.
…Stream.
CachedFileStream has previously performed the commit step in its
destructor, but this means its only recourse for error handling is
report_fatal_error. Modify this to add an explicit commit() method, and
call this in the appropriate places with appropriate error handling for
the location.
Currently the destructor of CacheStream gives an assert failure in Debug
builds if commit() was not called. This will help track down any
remaining uses of the API that assume the old destructior behaviour. In
Release builds we fall back to the previous behaviour and call
report_fatal_error if the commit fails.
This change means that llvm-strip no longer exits immediately upon
encountering an error when modifying a file and will instead continue
modifying the other inputs. Fixes#129412
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.
For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibilty purses, and instead
write out needed Triple()s or str()s explicitly. This is because I think
a decent number of them should be changed to work on Triple as well, to
avoid unnecessary conversions back and forth.
The only interesting part in this patch is that the default triple is
Triple("") instead of Triple() to preserve existing behavior. The former
defaults to using the ELF object format instead of unknown object
format. We should fix that as well.
Current implementation (for Aarch64) in llvm-exegesis only supports
GRP32 and GPR64 bit register class, thus for opcodes variants which used
FPR64/128, PPR16 and ZPR128, llvm-exegesis throws warning "setReg is not
implemented". This code will handle the above register class and
initialize the registers using appropriate base instruction class.
This re-applies f905bf3e1ef860c4d6fe67fb64901b6bbe698a91, which was reverted in
c861c1a046eb8c1e546a8767e0010904a3c8c385 due to compiler errors, with a fix for
MLIR.
The profile format has now a separate section called "Contexts" - there will be a corresponding one for flat profiles. The root has a separate tag because, in addition to not having a callsite ID as all the other context nodes have under it, it will have additional fields in subsequent patches.
The rest of this patch amounts to a bit of refactorings in the reader/writer (for better reuse later) and tests fixups.
Previously, when an inlined library existed in TBD file A but not in file B, all of the inlined library's attributes were printed. This is noisy since the important detail is the complete contents are missing. Instead, only print the install name of the inlined library and the marker for which the input file exists in.
This commit adds support for WebAssembly's custom-page-sizes proposal to
`wasm-ld`. An overview of the proposal can be found
[here](https://github.com/WebAssembly/custom-page-sizes/blob/main/proposals/custom-page-sizes/Overview.md).
In a sentence, it allows customizing a Wasm memory's page size, enabling
Wasm to target environments with less than 64KiB of memory (the default
Wasm page size) available for Wasm memories.
This commit contains the following:
* Adds a `--page-size=N` CLI flag to `wasm-ld` for configuring the
linked Wasm binary's linear memory's page size.
* When the page size is configured to a non-default value, then the
final Wasm binary will use the encodings defined in the
custom-page-sizes proposal to declare the linear memory's page size.
* Defines a `__wasm_first_page_end` symbol, whose address points to the
first page in the Wasm linear memory, a.k.a. is the Wasm memory's page
size. This allows writing code that is compatible with any page size,
and doesn't require re-compiling its object code. At the same time,
because it just lowers to a constant rather than a memory access or
something, it enables link-time optimization.
* Adds tests for these new features.
r? @sbc100
cc @sunfishcode
This fix helps to map operand memory to destination registers. If
instruction is load, we can self-alias it in case when instruction
overrides whole address register. For that we use provided scratch
memory.