f9c2a341b9
causes regressions when we have a slice with integer vector type that is
the same size as the partition, and a ptr load/store slice that is not
the size of the element type.
Ref `vector-promotion.ll:ptrLoadStoreTys`.
Before the patch, we would only consider `<4 x i32>` as a candidate type
for vector promotion, and would find that it is a viable type for all
the slices.
After the patch, we now add `<2 x ptr>` as a candidate type due to slice
with user `store ptr %val0, ptr %obj, align 8` -- and flag that we
`HaveVecPtrTy`. The pre-existing behavior of this flag results in
removing the viable `<4 x i32>` and keeping only the unviable `<2 x
ptr>`, which results in a failure to promote.
The end result is failing to promote an alloca that was previously
promoted -- this does not appear to be the intent of that patch, which
has the goal of increasing promotions by providing more promotion
opportunities.
This PR preserves this behavior via a simple reorganization of the
implemention: try first the slice types with same size as the partition,
then, if there is no promotable type, try the `LoadStoreTys.`
Refer to RISCV, we will fix up the alignment if linker relaxation
changes code size and breaks alignment. Insert enough Nops and emit
R_LARCH_ALIGN relocation type so that linker could satisfy the alignment
by removing Nops.
It does so only in sections with the SHF_EXECINSTR flag.
In LoongArch psABI v2.30, R_LARCH_ALIGN requires symbol index. The
lowest 8 bits of addend represent alignment and the other bits of addend
represent the maximum number of bytes to emit.
The libclang python binding test CI job currently doesn't have any
restrictions on what branches it will run on when something is pushed
and also isn't restricted to the monorepo. This patch adds a branch
restriction for the push event, only running the CI job when something
is pushed to the main branch (and the path filter is met), and also adds
a filter to ensure that the job comes from a PR against the monorepo or
a push to a branch in the monorepo.
Now that the work embedding PGO information in SHT_LLVM_BB_ADDR_MAP ELF
sections has landed, there is no longer a need to keep around the
mbb-profile-dump flag.
This patch adds basic TLSDESC support in the RISC-V backend.
Specifically, we add new relocation types for TLSDESC, as prescribed in
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/373, and add a
new pseudo instruction to simplify code generation.
This patch does not try to optimize the local dynamic case, which can be
improved in separate patches.
Linker side changes will also be handled separately.
The current implementation is only enabled when passing the new
`-enable-tlsdesc` codegen flag.
As I worked through changes to another PR
(https://github.com/llvm/llvm-project/pull/74912), I couldn't help but
rewrite a few methods for readability, maintainability, and possibly
some behavior correctness too.
1. Exiting early instead of nested `if`-statements, which:
- Reduces indentation levels for all subsequent lines
- Treats missing pre-conditions similar to an error
- Clearly indicates that the full length of the method is the "happy
path".
2. Explicitly return empty Value Object shared pointers for those error
(like) situations, which
- Reduces the time it takes a maintainer to figure out what the method
actually returns based on those conditions.
3. Converting a mix of `if` and `if`-`else`-statements around an enum
into one `switch` statement, which:
- Consolidates the former branching logic
- Lets the compiler warn you of a (future) missing enum case
- This one may actually change behavior slightly, because what was an
early test for one enum case, now happens later on in the `switch`.
4. Consolidating near-identical, "copy-pasta" logic into one place,
which:
- Separates the common code to the diverging paths.
- Highlights the differences between the code paths.
rdar://119833526
This is a high level description and FAQ for what we're doing in
RemoveDIs, and how old code should be behave with new debug-info
(exactly the same 99% of the time).
Here's a raft of minor fixes for the RemoveDIs project that's replacing
dbg.value intrinsics with DPValue objects, all IMO trivial:
* When inserting functions or blocks and calling setIsNewDbgInfoFormat,
do that after setting the Parent pointer, just in case conversion from
(or to) dbg.value mode is triggered.
* When transferring DPValues from an empty range in a splice call, don't
transfer if there are no DPValues attached to the source block at all.
* stripNonLineTableDebugInfo should drop DPValues.
* In insertBefore, don't try to transfer DPValues if there aren't any.
Many machine-learning applications (and most software written at AMD)
expect the operation that truncates floats to 8-bit floats to be
saturatinng. That is, they expect `truncf 256.0 : f32 to f8E4M3FNUZ` to
yield `240.0`, not `NaN`, and similarly for negative numbers. However,
the underlying hardware instruction that can be used for this truncation
implements overflow-to-NaN semantics.
To enable handling this usecase, we add the saturate-fp8-truncf option
to ArithToAMDGPU (off by default), which causes the requisite clamping
code to be emitted. Said clamping code ensures that Inf and NaN are
passed through exactly (and thus trancate to NaN).
Per review feedback, this commit efactors
createScalarOrSplatConstant() to the Arith dialect utilities and uses
it in this code. It also fixes naming of existing patterns and
switches from vector.extractelement/insertelement to
vector.extract/insert.
SANITIZER_WEAK_IMPORT is useful for any call that needs to be
conditionally linked in. This is currently used for the
tsan_dispatch_interceptors, but can be used for other calls introduced
in newer versions of MacOS. (such as `aligned_alloc` in this PR
https://github.com/llvm/llvm-project/pull/79198).
This PR moves the definition to a higher level so it can be used in
other sanitizers.
We can directly call `clang -c -x cl -target amdgcn -mcpu=gfx90a test.cl
-o test.o`
to compile an OpenCL kernel file. However, when `--save-temps` is
enabled, it doesn't
work because the preprocessed file (`.i` file) is taken as C source file
when it
is fed to the front end, thus causing compilation error because those
OpenCL keywords
can't be recognized. This patch fixes the issue.
We were previously allowlisting awaitable types returned by
`await_transform` instead of the type of the operand of the `co_await`
expression.
This previously used to give false positives and not respect the
`AllowedAwaitablesList` flag when `await_transform` is used. See added
test cases for such examples.
Currently, the UnifiedLTO pipeline seems to have trouble with several
LTO features, like SplitLTO units, which means we cannot use important
optimizations like Whole Program Devirtualization or security hardening
instrumentation like CFI.
This patch reverts FatLTO to using distinct pipelines for Full LTO and
ThinLTO. It still avoids module cloning, since that was error prone.
This patch adds support for translating dense_resource attributes to
LLVMIR Target.
The support added is similar to how DenseElementsAttr is handled, except
we
don't need to handle splats.
Another possible way of doing this is adding iteration on
dense_resource, but that is
non-trivial as DenseResourceAttr is not meant to be something you should
directly
access. It has subclasses which you are supposed to use to iterate on
it.
`--function=<regex>` Include functions matching regex in the output
`--no-function=<regex>` Exclude functions matching regex from the output
If both are specified, `--no-function` has a higher precedence if a
function name matches both filters
Use the LazyDetector also for the remaining toolchains to avoid unnecessarily checking for the Cuda and Rocm installations.
This fixes rdar://121397534.
There are matching store opcodes (stfd, stxsiwx) for the load opcodes
that make 32-bit and 64-bit vector operations cheap with VSX, so stores
should also be cheap.
The code cleanup in #74794 accidentally broke detection of languages by
reading file content from stdin, e.g. via `clang-format -dump-config - <
/path/to/filename`.
This PR adds unit and integration tests to reproduce the issue and adds
a fix.
Fixes: #79023
Port COFF's https://reviews.llvm.org/D78221 and
https://reviews.llvm.org/D137217 to ELF. For the in-process ThinLTO
link, `ld.lld --save-temps a.o d/b.o -o out` will create
ELF relocatable files `out.lto.a.o`/`d/out.lto.b.o` instead of
`out1.lto.o`/`out2.lto.o`. Deriving the LTO-generated relocatable file
name from bitcode file names helps debugging.
The relocatable file name from the first regular LTO partition does not
change: `out.lto.o`. The second, if present due to `--lto-partition=`,
changes from `out1.lto.o` to `lto.1.o`.
For an archive member, e.g. `d/a.a(coll.o at 8)`,
the relocatable file is `d/out.lto.a.a(coll.o at 8).o`.
`--lto-emit-asm` file names are changed similarly. `--lto-emit-asm -o
out` now creates `out.lto.s` instead of `out`, therefore the
`--lto-emit-asm -o -` idiom no longer works. However, I think this new
behavior (which matches COFF) is better since keeping or removing
`--lto-emit-asm` will dump different files, instead of overwriting the
`-o` output file from an executable/shared object to an assembly file.
Reviewers: rnk, igorkudrin, xur-llvm, teresajohnson, ZequanWu
Reviewed By: teresajohnson
Pull Request: https://github.com/llvm/llvm-project/pull/78835
This is the first step towards an alternate shuffle lowering design for
the general two vector argument case. The goal is to leverage the
existing lowering for single vector permutes to avoid as many of the
vrgathers as required - even if we do need the other.
This patch handles only the first argument, and is arguably a slightly
weird half-step. However, the test changes from the full two argument
recurse patch are a lot harder to reason about. Taking this half step
gives much more easily reviewable changes, and is thus worthwhile. I
intend to post the patch for the second argument once this has landed.
When marking symbols as having their address taken, we can have the
sitaution where we have the address taken of a weak symbol. If there's
no strong definition of the symbol, the symbol ends up as an absolute
symbol with the value null. In those cases, we don't have any Chunk.
Skip such symbols from the cfguard tables.
This fixes https://github.com/llvm/llvm-project/issues/78619.
If we have a shuffle which is larger than m1, we may be able to split it
into a series of individual m1 shuffles. This patch starts with the
subcase where the mask allows a 1-to-1 mapping from source register to
destination register - each with a possible permutation of their own. We
can potentially extend this later, thought in practice this seems to
already catch a number of the most interesting cases.
Originally merged here: https://github.com/llvm/llvm-project/pull/75882
Reverted here: https://github.com/llvm/llvm-project/pull/78627
Reverted due to failing buildbots. The problem was not caused by the
annotations code, but by code in the `UniqueFunctionBase` class and in
the `JSON.h` file. That code caused the program to write to memory that
was already being used by string objects, which resulted in an ASan
error.
Fixes are implemented in:
- https://github.com/llvm/llvm-project/pull/79065
- https://github.com/llvm/llvm-project/pull/79066
Problematic code from `UniqueFunctionBase` for example:
```cpp
#ifndef NDEBUG
// In debug builds, we also scribble across the rest of the storage.
memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize);
#endif
```
---
Original description:
This commit turns on ASan annotations in `std::basic_string` for short
stings (SSO case).
Originally suggested here: https://reviews.llvm.org/D147680
String annotations added here:
https://github.com/llvm/llvm-project/pull/72677
Requires to pass CI without fails:
- https://github.com/llvm/llvm-project/pull/75845
- https://github.com/llvm/llvm-project/pull/75858
Annotating `std::basic_string` with default allocator is implemented in
https://github.com/llvm/llvm-project/pull/72677 but annotations for
short strings (SSO - Short String Optimization) are turned off there.
This commit turns them on. This also removes
`_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to
support turning on and off short string annotations.
Support in ASan API exists since
dd1b7b797a.
You can turn off annotations for a specific allocator based on changes
from
2fa1bec7a2.
This PR is a part of a series of patches extending AddressSanitizer C++
container overflow detection capabilities by adding annotations, similar
to those existing in `std::vector` and `std::deque` collections. These
enhancements empower ASan to effectively detect instances where the
instrumented program attempts to access memory within a collection's
internal allocation that remains unused. This includes cases where
access occurs before or after the stored elements in `std::deque`, or
between the `std::basic_string`'s size (including the null terminator)
and capacity bounds.
The introduction of these annotations was spurred by a real-world
software bug discovered by Trail of Bits, involving an out-of-bounds
memory access during the comparison of two strings using the
`std::equals` function. This function was taking iterators
(`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison,
using a custom comparison function. When the `iter1` object exceeded the
length of `iter2`, an out-of-bounds read could occur on the `iter2`
object. Container sanitization, upon enabling these annotations, would
effectively identify and flag this potential vulnerability.
If you have any questions, please email:
advenam.tacet@trailofbits.comdisconnect3d@trailofbits.com