This adds support for the AArch64 soft-float ABI. The specification for
this ABI was added by https://github.com/ARM-software/abi-aa/pull/232.
Because all existing AArch64 hardware has floating-point hardware, we
expect this to be a niche option, only used for embedded systems on
R-profile systems. We are going to document that SysV-like systems
should only ever use the base (hard-float) PCS variant:
https://github.com/ARM-software/abi-aa/pull/233. For that reason, I've
not added an option to select the ABI independently of the FPU hardware,
instead the new ABI is enabled iff the target architecture does not have
an FPU.
For testing, I have run this through an ABI fuzzer, but since this is
the first implementation it can only test for internal consistency
(callers and callees agree on the PCS), not for conformance to the ABI
spec.
Also, Let `NumConditions` `uint16_t`.
It is smarter to handle the ID as signed.
Narrowing to `int16_t` will reduce costs of handling byvalue. (See also
#81221 and #81227)
External behavior doesn't change. They below handle values as internal
values plus 1.
* `-dump-coverage-mapping`
* `CoverageMappingReader.cpp`
* `CoverageMappingWriter.cpp`
We recently implemented a new option allowing relinking of bitcode
modules via the "-mllvm -relink-builtin-bitcode-postop"
option.
This implementation relied on llvm::CloneModule() in order to pass
copies to modules and preserve the original modules for later relinking.
However, cloning modules has been found to be prohibitively expensive,
significantly increasing compilation time for large bitcode libraries.
In this patch, we shift the relink option implementation to instead link
the original modules initially, and reload modules from the file system
if relinking is requested. This approach results in significantly
reduced overhead.
We accomplish this by creating a new ReloadModules() routine that can be
called from a BackendConsumer class, to mimic the behavior of
ASTConsumer's loadLinkModules(), but without access to the
CompilerInstance.
Because loading the bitcodes from the filesystem requires access to the
FileManager class, we also forward a reference to the CompilerInstance
class to the BackendConsumer. This mirrors what is already done for
several CompilerInstance members, such as TargetOptions and
CodeGenOptions.
Finally, we needed to add a const specifier to the
FileManager::getBufferForFile() routine to allow it to be called using
the const reference returned from CompilerInstance::getFileManager()
This patch looses the cast check (`canLosslesslyBitCastTo`) and leaves
it to the
one inside `CreateBitCast`. It seems too conservative for the use case
here.
In an LTO build, we don't set the ELF attributes to indicate what
extensions were compiled with. The target CPU/Attrs in
RISCVTargetMachine do not get set for an LTO build. Each function gets a
target-cpu/feature attribute, but this isn't usable to set ELF attributs
since we wouldn't know what function to use. We can't just once since it
might have been compiler with an attribute likes target_verson.
This patch adds the ISA as Module metadata so we can retrieve it in the
backend. Individual translation units can still be compiled with
different strings so we need to collect the unique set when Modules are
merged.
The backend will need to combine the unique ISA strings to produce a
single value for the ELF attributes. This will be done in a separate
patch.
Instead of only handling vscale x 16 x i1 predicate vectors, handle any
scalable i1 vector where the known minimum is divisible by 8.
This is used on RISC-V where we have multiple sizes of predicate
types.
Summary:
This patch adds a new intrinsic and builtin function mirroring the
existing `__builtin_readcyclecounter`. The difference is that this
implementation targets a separate counter that some targets have which
returns a fixed frequency clock that can be used to determine elapsed
time, this is different compared to the cycle counter which often has
variable frequency.
This patch only adds support for the NVPTX and AMDGPU targets.
This is done as a new and separate builtin rather than an argument to
`readcyclecounter` to avoid needing to change existing code and to make
the separation more explicit.
'serial', 'parallel', and 'kernel' constructs are all considered
'Compute' constructs. This patch creates the AST type, plus the required
infrastructure for such a type, plus some base types that will be useful
in the future for breaking this up.
The only difference between the three is the 'kind'( plus some minor
clause legalization rules, but those can be differentiated easily
enough), so rather than representing them as separate AST nodes, it
seems
to make sense to make them the same.
Additionally, no clause AST functionality is being implemented yet, as
that fits better in a separate patch, and this is enough to get the
'naked' constructs implemented.
This is otherwise an 'NFC' patch, as it doesn't alter execution at all,
so there aren't any tests. I did this to break up the review workload
and to get feedback on the layout.
Introduce `mcdc::DecisionParameters` and `mcdc::BranchParameters` and make
sure them not initialized as zero.
FIXME: Could we make `CoverageMappingRegion` as a smart tagged union?
They can be also used in `clang`.
Introduce the lightweight header instead of `CoverageMapping.h`.
This includes for now:
* `mcdc::ConditionID`
* `mcdc::Parameters`
The performance of cold functions shouldn't matter too much, so if we
care about binary sizes, add an option to mark cold functions as
optsize/minsize for binary size, or optnone for compile times [1]. Clang
patch will be in a future patch.
This is intended to replace `shouldOptimizeForSize(Function&, ...)`.
We've seen multiple cases where calls to this expensive function, if not
careful, can blow up compile times. I will clean up users of that
function in a followup patch.
Initial version: https://reviews.llvm.org/D149800
[1]
https://discourse.llvm.org/t/rfc-new-feature-proposal-de-optimizing-cold-functions-using-pgo-info/56388
In the beginning, Clang only emitted atomic IR for operations it knew
the
underlying microarch had instructions for, meaning it required
significant
knowledge of the target. Later, the backend acquired the ability to
lower
IR to libcalls. To avoid duplicating logic and improve logic locality,
we'd like to move as much as possible to the backend.
There are many ways to describe this change. For example, this change
reduces the variables Clang uses to decide whether to emit libcalls or
IR, down to only the atomic's size.
This patch provides more information to the
`PPCallbacks::InclusionDirective()` hook. We now always pass the
suggested module, regardless of whether it was actually imported or not.
The extra `bool ModuleImported` parameter then denotes whether the
header `#include` will be automatically translated into import the the
module.
The main change is in `clang/lib/Lex/PPDirectives.cpp`, where we take
care to not modify `SuggestedModule` after it's been populated by
`LookupHeaderIncludeOrImport()`. We now exclusively use the `SM`
(`ModuleToImport`) variable instead, which has been equivalent to
`SuggestedModule` until now. This allows us to use the original
non-modified `SuggestedModule` for the callback itself.
(This patch turns out to be necessary for
https://github.com/apple/llvm-project/pull/8011).
Testing the shift-exponent check with small width _BitInt values exposed
a bug in ScalarExprEmitter::GetWidthMinusOneValue when using the result
to determine valid exponent sizes. False positives were reported for
some left shifts when width(LHS)-1 > range(RHS) and false negatives were
reported for right shifts when value(RHS) > range(LHS). This patch caps
the maximum value of GetWidthMinusOneValue to fit within range(RHS) to
fix the issue with left shifts and fixes a code generation in EmitShr to
fix the issue with right shifts and renames the function to
GetMaximumShiftAmount to better reflect the new behaviour.
Fixes#80135.
Co-authored-by: Adam Magier <adam.magier@ericsson.com>
The new experimental calling convention preserve_none is the opposite
side of existing preserve_all. It tries to preserve as few general
registers as possible. So all general registers are caller saved
registers. It can also uses more general registers to pass arguments.
This attribute doesn't impact floating-point registers. Floating-point
registers still follow the c calling convention.
Currently preserve_none is supported on X86-64 only. It changes the c
calling convention in following fields:
* RSP and RBP are the only preserved general registers, all other
general registers are caller saved registers.
* We can use [RDI, RSI, RDX, RCX, R8, R9, R11, R12, R13, R14, R15, RAX]
to pass arguments.
It can improve the performance of hot tailcall chain, because many
callee saved registers' save/restore instructions can be removed if the
tail functions are using preserve_none. In my experiment in protocol
buffer, the parsing functions are improved by 3% to 10%.
Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same
as V5 except a new "generic version" flag can be present in EFLAGS. This
is related to new generic targets that'll be added in a follow-up patch.
It's also likely V6 will have new changes (possibly new metadata
entries) added later.
Docs change are part of the follow-up patch #76955
According to the C++ standard, `dynamic_cast` of pointers either returns
a pointer (7.6.1.7) or results in undefined behavior (11.9.5). This
patch marks `__dynamic_cast` as `willreturn` to remove unused calls.
Fixes#77606.
In 2155195131a57f2f01e7cfabb85bb027518c2dc6, the
"system-headers-coverage" option has been added but not used in all
necessary places.
This is the recommit since it has been reverted in
faef68bca852d08511ea0311d8a0d221cb202e73
Potential reviewers: @gulfemsavrun @petrhosek
Co-authored-by: Manuel Kalettka <manuel.kalettka@kernkonzept.com>
Today `-split-machine-functions` and `-fbasic-block-sections={all,list}`
cannot be combined with `-basic-block-sections=labels` (the labels
option will be ignored).
The inconsistency comes from the way basic block address map -- the
underlying mechanism for basic block labels -- encodes basic block
addresses
(https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html).
Specifically, basic block offsets are computed relative to the function
begin symbol. This relies on functions being contiguous which is not the
case for MFS and basic block section binaries. This means Propeller
cannot use binary profiles collected from these binaries, which limits
the applicability of Propeller for iterative optimization.
To make the `SHT_LLVM_BB_ADDR_MAP` feature work with basic block section
binaries, we propose modifying the encoding of this section as follows.
First let us review the current encoding which emits the address of each
function and its number of basic blocks, followed by basic block entries
for each basic block.
| | |
|--|--|
| Address of the function | Function Address |
| Number of basic blocks in this function | NumBlocks |
| BB entry 1
| BB entry 2
| ...
| BB entry #NumBlocks
To make this work for basic block sections, we treat each basic block
section similar to a function, except that basic block sections of the
same function must be encapsulated in the same structure so we can map
all of them to their single function.
We modify the encoding to first emit the number of basic block sections
(BB ranges) in the function. Then we emit the address map of each basic
block section section as before: the base address of the section, its
number of blocks, and BB entries for its basic block. The first section
in the BB address map is always the function entry section.
| | |
|--|--|
| Number of sections for this function | NumBBRanges |
| Section 1 begin address | BaseAddress[1] |
| Number of basic blocks in section 1 | NumBlocks[1] |
| BB entries for Section 1
|..................|
| Section #NumBBRanges begin address | BaseAddress[NumBBRanges] |
| Number of basic blocks in section #NumBBRanges |
NumBlocks[NumBBRanges] |
| BB entries for Section #NumBBRanges
The encoding of basic block entries remains as before with the minor
change that each basic block offset is now computed relative to the
begin symbol of its containing BB section.
This patch adds a new boolean codegen option `-basic-block-address-map`.
Correspondingly, the front-end flag `-fbasic-block-address-map` and LLD
flag `--lto-basic-block-address-map` are introduced.
Analogously, we add a new TargetOption field `BBAddrMap`. This means BB
address maps are either generated for all functions in the compiling
unit, or for none (depending on `TargetOptions::BBAddrMap`).
This patch keeps the functionality of the old
`-fbasic-block-sections=labels` option but does not remove it. A
subsequent patch will remove the obsolete option.
We refactor the `BasicBlockSections` pass by separating the BB address
map and BB sections handing to their own functions (named
`handleBBAddrMap` and `handleBBSections`). `handleBBSections` renumbers
basic blocks and places them in their assigned sections.
`handleBBAddrMap` is invoked after `handleBBSections` (if requested) and
only renumbers the blocks.
- New tests added:
- Two tests basic-block-address-map-with-basic-block-sections.ll and
basic-block-address-map-with-mfs.ll to exercise the combination of
`-basic-block-address-map` with `-basic-block-sections=list` and
'-split-machine-functions`.
- A driver sanity test for the `-fbasic-block-address-map` option
(basic-block-address-map.c).
- An LLD test for testing the `--lto-basic-block-address-map` option.
This reuses the LLVM IR from `lld/test/ELF/lto/basic-block-sections.ll`.
- Renamed and modified the two existing codegen tests for basic block
address map (`basic-block-sections-labels-functions-sections.ll` and
`basic-block-sections-labels.ll`)
- Removed `SHT_LLVM_BB_ADDR_MAP_V0` tests. Full deprecation of
`SHT_LLVM_BB_ADDR_MAP_V0` and `SHT_LLVM_BB_ADDR_MAP` version less than 2
will happen in a separate PR in a few months.
Since https://github.com/ARM-software/acle/pull/276 the ACLE
defines attributes to better describe the use of a given SME state.
Previously the attributes merely described the possibility of it being
'shared' or 'preserved', whereas the new attributes have more semantics
and also describe how the data flows through the program.
For ZT0 we already had to add new LLVM IR attributes:
* aarch64_new_zt0
* aarch64_in_zt0
* aarch64_out_zt0
* aarch64_inout_zt0
* aarch64_preserves_zt0
We have now done the same for ZA, such that we add:
* aarch64_new_za (previously `aarch64_pstate_za_new`)
* aarch64_in_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_out_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_inout_za (more specific variation of
`aarch64_pstate_za_shared`)
* aarch64_preserves_za (previously `aarch64_pstate_za_shared,
aarch64_pstate_za_preserved`)
This explicitly removes 'pstate' from the name, because with SME2 and
the new ACLE attributes there is a difference between "sharing ZA"
(sharing
the ZA matrix register with the caller) and "sharing PSTATE.ZA" (sharing
either the ZA or ZT0 register, both part of PSTATE.ZA with the caller).
This is a support for " #pragma omp atomic compare weak". It has Parser
& AST support for now.
---------
Authored-by: Sunil Kuravinakop <kuravina@pe28vega.us.cray.com>
This re-applies 30155fc0 with a fix for clangd.
### Description
clang don't evaluate the object argument of `static operator()` and
`static operator[]` currently, for example:
```cpp
#include <iostream>
struct Foo {
static int operator()(int x, int y) {
std::cout << "Foo::operator()" << std::endl;
return x + y;
}
static int operator[](int x, int y) {
std::cout << "Foo::operator[]" << std::endl;
return x + y;
}
};
Foo getFoo() {
std::cout << "getFoo()" << std::endl;
return {};
}
int main() {
std::cout << getFoo()(1, 2) << std::endl;
std::cout << getFoo()[1, 2] << std::endl;
}
```
`getFoo()` is expected to be called, but clang don't call it currently
(17.0.6). This PR fixes this issue.
Fixes#67976, reland #68485.
### Walkthrough
- **clang/lib/Sema/SemaOverload.cpp**
- **`Sema::CreateOverloadedArraySubscriptExpr` &
`Sema::BuildCallToObjectOfClassType`**
Previously clang generate `CallExpr` for static operators, ignoring the
object argument. In this PR `CXXOperatorCallExpr` is generated for
static operators instead, with the object argument as the first
argument.
- **`TryObjectArgumentInitialization`**
`const` / `volatile` objects are allowed for static methods, so that we
can call static operators on them.
- **clang/lib/CodeGen/CGExpr.cpp**
- **`CodeGenFunction::EmitCall`**
CodeGen changes for `CXXOperatorCallExpr` with static operators: emit
and ignore the object argument first, then emit the operator call.
- **clang/lib/AST/ExprConstant.cpp**
- **`ExprEvaluatorBase::handleCallExpr`**
Evaluation of static operators in constexpr also need some small changes
to work, so that the arguments won't be out of position.
- **clang/lib/Sema/SemaChecking.cpp**
- **`Sema::CheckFunctionCall`**
Code for argument checking also need to be modify, or it will fail the
test `clang/test/SemaCXX/overloaded-operator-decl.cpp`.
- **clang-tools-extra/clangd/InlayHints.cpp**
- **`InlayHintVisitor::VisitCallExpr`**
Now that the `CXXOperatorCallExpr` for static operators also have object
argument, we should also take care of this situation in clangd.
### Tests
- **Added:**
- **clang/test/AST/ast-dump-static-operators.cpp**
Verify the AST generated for static operators.
- **clang/test/SemaCXX/cxx2b-static-operator.cpp**
Static operators should be able to be called on const / volatile
objects.
- **Modified:**
- **clang/test/CodeGenCXX/cxx2b-static-call-operator.cpp**
- **clang/test/CodeGenCXX/cxx2b-static-subscript-operator.cpp**
Matching the new CodeGen.
### Documentation
- **clang/docs/ReleaseNotes.rst**
Update release notes.
---------
Co-authored-by: Shafik Yaghmour <shafik@users.noreply.github.com>
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
### Description
clang don't evaluate the object argument of `static operator()` and
`static operator[]` currently, for example:
```cpp
#include <iostream>
struct Foo {
static int operator()(int x, int y) {
std::cout << "Foo::operator()" << std::endl;
return x + y;
}
static int operator[](int x, int y) {
std::cout << "Foo::operator[]" << std::endl;
return x + y;
}
};
Foo getFoo() {
std::cout << "getFoo()" << std::endl;
return {};
}
int main() {
std::cout << getFoo()(1, 2) << std::endl;
std::cout << getFoo()[1, 2] << std::endl;
}
```
`getFoo()` is expected to be called, but clang don't call it currently
(17.0.2). This PR fixes this issue.
Fixes#67976.
### Walkthrough
- **clang/lib/Sema/SemaOverload.cpp**
- **`Sema::CreateOverloadedArraySubscriptExpr` &
`Sema::BuildCallToObjectOfClassType`**
Previously clang generate `CallExpr` for static operators, ignoring the
object argument. In this PR `CXXOperatorCallExpr` is generated for
static operators instead, with the object argument as the first
argument.
- **`TryObjectArgumentInitialization`**
`const` / `volatile` objects are allowed for static methods, so that we
can call static operators on them.
- **clang/lib/CodeGen/CGExpr.cpp**
- **`CodeGenFunction::EmitCall`**
CodeGen changes for `CXXOperatorCallExpr` with static operators: emit
and ignore the object argument first, then emit the operator call.
- **clang/lib/AST/ExprConstant.cpp**
- **`ExprEvaluatorBase::handleCallExpr`**
Evaluation of static operators in constexpr also need some small changes
to work, so that the arguments won't be out of position.
- **clang/lib/Sema/SemaChecking.cpp**
- **`Sema::CheckFunctionCall`**
Code for argument checking also need to be modify, or it will fail the
test `clang/test/SemaCXX/overloaded-operator-decl.cpp`.
### Tests
- **Added:**
- **clang/test/AST/ast-dump-static-operators.cpp**
Verify the AST generated for static operators.
- **clang/test/SemaCXX/cxx2b-static-operator.cpp**
Static operators should be able to be called on const / volatile
objects.
- **Modified:**
- **clang/test/CodeGenCXX/cxx2b-static-call-operator.cpp**
- **clang/test/CodeGenCXX/cxx2b-static-subscript-operator.cpp**
Matching the new CodeGen.
### Documentation
- **clang/docs/ReleaseNotes.rst**
Update release notes.
---------
Co-authored-by: Shafik Yaghmour <shafik@users.noreply.github.com>
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
When this option is passed to clang, external (and/or weak) symbols
are not assumed to have the minimum ABI alignment normally required.
Symbols defined locally that are not weak are however still given the
minimum alignment.
This is implemented by passing a new parameter to getMinGlobalAlign()
named HasNonWeakDef that is used to return the right alignment value.
This is needed when external symbols created from a linker script may
not get the ABI minimum alignment and must therefore be treated as
unaligned by the compiler.
Implements https://isocpp.org/files/papers/P2662R3.pdf
The feature is exposed as an extension in older language modes.
Mangling is not yet supported and that is something we will have to do before release.
GCC supports -mtls-dialect= for several architectures to select TLSDESC.
This patch supports the following values
* x86: "gnu". "gnu2" (TLSDESC) is not supported yet.
* RISC-V: "trad" (general dynamic), "desc" (TLSDESC, see #66915)
AArch64 toolchains seem to support TLSDESC from the beginning, and the
general dynamic model has poor support. Nobody seems to use the option
-mtls-dialect= at all, so we don't bother with it.
There also seems very little interest in AArch32's TLSDESC support.
TLSDESC does not change IR, but affects object file generation. Without
a backend option the option is a no-op for in-process ThinLTO.
There seems no motivation to have fine-grained control mixing trad/desc
for TLS, so we just pass -mllvm, and don't bother with a modules flag
metadata or function attribute.
Co-authored-by: Paul Kirth <paulkirth@google.com>