Fix issues in list.rst, addapt add_new_check.py to new
format of that file, and run gen-static-analyzer-docs.py
to generate missing documentation for clang-analyzer.
Before this change, when `file.profdata` have vtable profiles but `--enable-vtable-value-profiling` is not on for optimized build, warnings from this line [1] will show up. They are benign for performance but confusing.
It's better to automatically annotate vtable profiles if `file.profdata` has them. This PR implements it in profile use pass.
* If `-icp-max-num-vtables` is zero (default value is 6), vtable profiles won't be annotated.
[1] 464d321ee8/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (L1762-L1768)
As in title, code which checks eligibility of exceptions with pointer
types to be handled by exception handler of type `void *` disallowed
this case. It was working like this:
```c++
if (isStandardPointerConvertible(ExceptionCanTy, HandlerCanTy) &&
isUnambiguousPublicBaseClass(
ExceptionCanTy->getTypePtr()->getPointeeType().getTypePtr(),
HandlerCanTy->getTypePtr()->getPointeeType().getTypePtr())) {
```
but in `isUnambiguousPublicBaseClass` there was code which looked for
definitions:
```c++
bool isUnambiguousPublicBaseClass(const Type *DerivedType,
const Type *BaseType) {
const auto *DerivedClass =
DerivedType->getCanonicalTypeUnqualified()->getAsCXXRecordDecl();
const auto *BaseClass =
BaseType->getCanonicalTypeUnqualified()->getAsCXXRecordDecl();
if (!DerivedClass || !BaseClass)
return false;
```
This code disallowed usage of `void *` type which was already correctly
detected in `isStandardPointerConvertible`.
AFAIK this seems like misinterpretation of specification:
> 14.4 Handling an exception
> a standard [pointer conversion](https://eel.is/c++draft/conv.ptr) not
involving conversions to pointers to private or protected or ambiguous
classes
(https://eel.is/c++draft/except.handle#3.3.1)
and
> 7.3.12 Pointer conversions
> ... If B is an inaccessible
([[class.access]](https://eel.is/c++draft/class.access)) or ambiguous
([[class.member.lookup]](https://eel.is/c++draft/class.member.lookup))
base class of D, a program that necessitates this conversion is
ill-formed[.](https://eel.is/c++draft/conv.ptr#3.sentence-2) ...
(https://eel.is/c++draft/conv.ptr#3)
14.4 is carving out private, protected, and ambiguous base classes, but
they are already carved out in 7.3.12 and implemented in
`isStandardPointerConvertible`
---------
Co-authored-by: Piotr Zegar <me@piotrzegar.pl>
On linux, the start address doesn't necessarily have a symbol attached
to it.
This is why this patch replaces `DynamicLoader::GetStartSymbol` with
`DynamicLoader::GetStartAddress` instead to make it more generic.
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
The regular expressions match functions that aren't anchored in the
global namespace. For example `::remove` matches any object with a
`removeXyz` method. This change is to remove these false positives
[There is
work](https://discourse.llvm.org/t/rfc-upgrading-llvms-minimum-required-python-version/67571)
to make Python 3.8 the minimum Python version for LLVM.
I edited this script because I wanted some indicator of progress while
going through files.
It now outputs `[XX/YYY]` with the number of processed and total files
after each completion.
The current version of this script is compatible downto Python 3.6 (this
is PyYAML's minimum version).
It would probably work with older Python 3 versions with an older PyYAML
or when YAML is disabled.
With the updates here, it is compatible downto Python 3.7. Python 3.7
was released June 2018.
https://github.com/llvm/llvm-project/pull/89302 is also touching this
file, I don't mind rebasing on top of that work if needed.
### Summary
- Add type annotations.
- Replace `threading` + `queue` with `asyncio`.
- **Add indicator of processed files over total files**. This is what I
set out to do initially.
- Only print the filename after completion, not the entire Clang-Tidy
invocation command. I find this neater but the behavior can easily be
restored.
The second argument passed to these builtins is used to validate whether
the object's alignment is sufficient for atomic operations of the given
size.
Currently, the builtins can be folded at compile time only when the
argument is 0/nullptr, or if the _type_ of the pointer guarantees
appropriate alignment.
This change allows the compiler to also evaluate non-null constant
pointers, which enables callers to check a specified alignment, instead
of only the type or an exact object. E.g.:
`__atomic_is_lock_free(sizeof(T), (void*)4)`
can be potentially evaluated to true at compile time, instead of
generating a libcall. This is also supported by GCC, and used by
libstdc++, and is also useful for libc++'s atomic_ref.
Also helps with (but doesn't fix) issue #75081.
This also fixes a crash bug, when the second argument was a non-pointer
implicitly convertible to a pointer (such as an array, or a function).
When `pauthtest` is either passed as environment part of AArch64 Linux
triple
or passed via `-mabi=`, enable the following ptrauth flags:
- `intrinsics`;
- `calls`;
- `returns`;
- `auth-traps`;
- `vtable-pointer-address-discrimination`;
- `vtable-pointer-type-discrimination`;
- `init-fini`.
Some related stuff is still subject to change, and the ABI itself might
be changed, so end users are not expected to use this and the ABI name
has 'test' suffix.
If `-mabi=pauthtest` option is used, it's normalized to effective
triple.
When the environment part of the effective triple is `pauthtest`, try
to use `aarch64-linux-pauthtest` as multilib directory.
The following is not supported:
- combination of `pauthtest` ABI with any branch protection scheme
except BTI;
- explicit set of environment part of the triple to a value different
from `pauthtest` in combination with `-mabi=pauthtest`;
- usage on non-Linux OS.
---------
Co-authored-by: Anatoly Trosinenko <atrosinenko@accesssoftek.com>
CoroSplit expects the second parameter of `llvm.coro.id` to be the
promise alloca. Applying Asan on a pre-split coroutine breaks this
assumption and causes split to fail. This should be NFC because asan
pass happens late in the pipeline where all coroutines are split. This
is to prevent crash in case the order of passes are switched.
Also `NoopCoro.Frame.Const` is a special coroutine frame that does
nothing when resumed or destroyed. There is no point to do
instrumentation on it.
The `ExternalNameConversion` will add an _ at the end of the external
functions. We extract the real function name to use in the debug info.
The convention is to use the real name of function in the `name` field
and mangled name with extra _ at the end in the `linkageName` field.
Fixes#92391.
Compared to the generic scalar code, using Arm NEON instructions yields
a ~11x speedup: 31 vs 339.5 ms to hash 1 GiB of random data on the Apple
M1.
This follows the upstream implementation closely, with some
simplifications made:
- Removed workarounds for suboptimal codegen on older GCC
- Removed instruction reordering barriers which seem to have a
negligible impact according to my measurements
- We do not support WebAssembly's mostly NEON-compatible API
- There is no configurable mixing of SIMD and scalar code; according to
the upstream comments, this is only relevant for smaller Cortex cores
which can dispatch relatively few NEON micro-ops per cycle.
This commit intends to use only standard ACLE intrinsics and datatypes,
so it should build with all supported versions of GCC, Clang and MSVC.
This feature is enabled by default when targeting AArch64, but the
`LLVM_XXH_USE_NEON=0` macro can be set to explicitly disable it.
XXH3 is used for ICF, string deduplication and computing the UUID in
ld64.lld; this commit results in a -1.77% +/- 0.59% speed improvement
for a `--threads=8` link of Chromium.framework.
We need to use the minimum size of the scalable type and the correct
stack ID.
The code in the PR is still invalid because the instruction used doesn't
have a pointer operand. This is diagnosed later when the assembler
parses it.
Fixes#99782
This patch removes the recursion in fcntl introduced by PR #99675 as it is not required and may be dangerous in some cases: some toolchains define F_GETLK == F_GETLK64 causing infinite recursion.
## Introduction
This patch implements LWG3618: Unnecessary `iter_move` for
`transform_view::iterator`.
`transform_view`'s iterator currently specifies a customization point
for `iter_move`. This customization point does the same thing that the
default implementation would do, but its sole purpose is to ensure the
appropriate conditional `noexcept` specification.
## Reference
-
[[range.transform.iterator]](https://eel.is/c++draft/range.transform.iterator)
- [LWG3618](https://cplusplus.github.io/LWG/issue3618)
Summary:
This value is a uint32_t but is printed as a uint64_t, leading to
invalid offsets when done on AMDGPU due to its packed format extending
past the buffer.
In comparison to non-instrumented binaries, PGO instrumentation binaries
can be significantly slower. For highly threaded programs, this slowdown
can
reach 10x due to data races or false sharing within counters.
This patch incorporates sampling into the PGO instrumentation process to
enhance the speed of instrumentation binaries. The fundamental concept
is similar to the one proposed in https://reviews.llvm.org/D63949.
Three sampling modes are introduced:
1. Simple Sampling: When '-sampled-instr-bust-duration' is set to 1.
2. Fast Burst Sampling: When not using simple sampling, and
'-sampled-instr-period' is set to 65535. This is the default mode of
sampling.
3. Full Burst Sampling: When neither simple nor fast burst sampling is
used.
Utilizing this sampled instrumentation significantly improves the
binary's
execution speed. Measurements show up to 5x speedup with default
settings. Fast burst sampling now results in only around 20% to 30%
slowdown (compared to 8 to 10x slowdown without sampling).
Out tests show that profile quality remains good with sampling,
with edge counts typically showing more than 90% overlap.
For applications whose behavior changes due to binary speed,
sampling instrumentation can enhance performance.
Observations have shown some apps experiencing up to
a ~2% improvement in PGO.
A potential drawback of this patch is the increased binary size
and compilation time. The Sampling method in this patch does
not improve single threaded program instrumentation binary
speed.
Upstream the intrinsics `llvm.amdgcn.raw.atomic.buffer.load`
and `llvm.amdgcn.raw.atomic.ptr.buffer.load`.
These additional intrinsics mark atomic buffer loads
as atomic to LLVM by removing the `IntrReadMem`
attribute. Otherwise, it could hoist these
intrinsics out of loops in cases where LLVM marks
them as invariant. That can cause issues such as
infinite loops.
Continuation of https://reviews.llvm.org/D138786
with the additional use in the fat buffer lowering,
more test cases and the additional ptr versions
of these intrinsics.
---------
Co-authored-by: rtayl <>
Co-authored-by: Jay Foad <jay.foad@amd.com>
Co-authored-by: Mariusz Sikora <mariusz.sikora@amd.com>
Add `createLocalSymbol` to create a local, non-temporary symbol.
Different from `createRenamableSymbol`, the `Used` bit is ignored,
therefore multiple local symbols might share the same name.
Utilizing `createLocalSymbol` in AArch64 allows for efficient mapping
symbol creation with non-unique names, saving .strtab space.
The behavior matches GNU assembler.
Pull Request: https://github.com/llvm/llvm-project/pull/99836
This patch handles dependencies specified by the `depend` clause on an
OpenMP target construct. It does this much the same way clang does it by
materializing an OpenMP `task` that is tagged with the dependencies.
The following functions are relevant to this patch -
1) `createTarget` - This function itself is largely unchanged except
that it now accepts a vector of `DependData` objects that it simply
forwards to `emitTargetCall`
2) `emitTargetCall` - This function has changed now to check if an outer
target-task needs to be materialized (i.e if `target` construct has
`nowait` or has `depend` clause). If yes, it calls `emitTargetTask` to
do all the heavy lifting for creating and dispatching the task.
3) `emitTargetTask` - Bulk of the change is here. See the large comment
explaining what it does at the beginning of this function
Makefile-based builds did not have proper dependencies to built the
FortranRuntime target after Flang new is available. This PR introduces a
dependency to ensure that this is the case. Relates to PR #95388.
---------
Co-authored-by: Michael Kruse <github@meinersbur.de>
FileCheck has special handline for the `-EMPTY` suffix, that should
match empty lines. Overloading the suffix can be a source of confusion
when reading tests. Additionally, the current implementation seems to
match the following expressions, which appears to be a bug in FileCheck.
Just like for regular IR we need to treat SELECT as conditionally
blocking poison in SelectionDAG. So (unless the condition itself is
poison) the result is only poison if the selected true/false value is
poison.
Thus, when doing DAG combines that turn SELECT into arithmetic/logical
operations (e.g. AND/OR) we need to make sure that the new operations
aren't more poisonous. One way to do that is to use FREEZE to make
sure the operands aren't posion.
This patch aims at fixing the kind of miscompiles reported in
https://github.com/llvm/llvm-project/issues/84653
and
https://github.com/llvm/llvm-project/issues/85190
Solution is to make sure that we insert FREEZE, if needed to make
the fold sound, when using the foldBoolSelectToLogic and
foldVSelectToSignBitSplatMask DAG combines.
In some systems like rv32, only fcntl64 is available and it employs a different structure for file locking and the correspoding F_GETLK64, F_SETLK64, and F_SETLKW64 commands.
So if we use fcntl64, the F_GETLK, F_SETLK, and F_SETLKW commands need to be changed to their 64 versions. This patch adds new cases to the swich(cmd) in our implementation of fcntl to do that.
The default case was moved to outside the switch, so we don't need to change anything, the F_GETLK, F_SETLK, and F_SETLKW commands will just go through the old implementation.
In 32-bit systems with 64-bit offsets, both fsfilcnt_t and fsblkcnt_t are 64-bit long, just like 64-bit systems. This patch changes both types to be 64-bit long for all platforms and follows the reasoning used to change off_t: the standard only requires it to be an unsigned int, so making it 64-bit long doesn't violate this property.
It should be NFC for 64-bit systems.
Don't use `copyHostAssociateVar` for allocatable variables. It isn't
clear to me whether or not this should be addressed in
`copyHostAssociateVar` instead of inside OpenMP. I opted for OpenMP
to minimise how many things I effected. `copyHostAssociateVar` will
not update the destination variable if the destination variable
was unallocated. This is incorrect because assignment inside of the
openmp block can cause the allocation status of the variable to
change. Furthermore, `copyHostAssociateVar` seems to only copy the
variable address not other metadata like the size of the allocation.
Reallocation by assignment could cause this to change.