This reverts commit 2b129dacdde667137b5012d52f1d96e0ab26c749.
parseSymbolVersion can be combined with computeIsPreemptible,
making hasVersionSyms unneeded.
This reapplies 570ecdcf8b4, which was reverted in 6073dd923b8 due to bot
failures.
The test failures on Linux were fixed by:
1. Removing an overly restrictive assertion (query dependence on a symbol no
longer implies a MaterializingInfo for that symbol).
2. Adding reentry and resolver files to the ORC runtime CMakeLists.txt for
Linux.
3. Adding the __orc_rt_reentry -> __orc_rt_sysv_reentry alias to ELFNixPlatform.
This calls into the existing constant folding for `llvm.sin` and
`llvm.cos`, which currently does not fold for any non-finite values, so
most tests are negative tests at the moment.
Note: The constant folding does not consider the `afn` fast-math flag
and will produce the same result regardless of whether the flag is set.
We can scan just the objectFiles and sharedFiles that have versioned
symbols and skip scanning the global symtab. While we won't suggest
__wrap_foo for an undefined __wrap_foo@v1 when --wrap=foo@v1 is specified
(internalFile isn't scanned), this edge-case difference is acceptable.
This patch implements the following intrinsics:
8-bit floating-point convert to half-precision or BFloat16 (in-order).
``` c
// Variant is also available for: _bf16[_mf8]_x2
svfloat16x2_t svcvt1_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
svfloat16x2_t svcvt2_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
```
In accordance with https://github.com/ARM-software/acle/pull/323.
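For illustration, a minimal usage sketch (assuming a toolchain that exposes
these declarations via `arm_sme.h` and targets SME2 with FP8; the wrapper
function name is made up):
```c
// Minimal sketch, not part of the patch: the toolchain is assumed to
// provide the declarations above through <arm_sme.h>.
#include <arm_sme.h>

svfloat16x2_t widen_fp8_to_f16(svmfloat8_t zn, fpm_t fpm) __arm_streaming {
  // Widen the 8-bit floating-point elements to half precision; the FP8
  // format information is taken from the fpm operand.
  return svcvt1_f16_mf8_x2_fpm(zn, fpm);
}
```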
Co-authored-by: Marin Lukac <marian.lukac@arm.com>
Co-authored-by: Caroline Concatto <caroline.concatto@arm.com>
It is possible that the number of hidden arguments selected for
preloading in AMDGPULowerKernelArguments and in isel can differ. This
isn't an issue with explicit arguments, since isel can lower the argument
correctly either way, but with hidden arguments we may have alignment
issues if we try to load hidden arguments that were added to the
kernel signature.
The reason for the mismatch is that isel reserves an extra synthetic
user SGPR for module LDS.
Instead of teaching lowerFormalArguments how to handle these properly,
it makes more sense and is less expensive to fix the mismatch and assert
if we ever run into this issue again. We should never be trying to lower
these in the normal way.
In a future change we probably want to revise how we track "synthetic"
user SGPRs and unify the handling in GCNUserSGPRUsageInfo. Sometimes
synthetic SGPRs are considered user SGPRs and sometimes they are not.
Until then, this patch resolves the inconsistency, fixes the bug, and is
otherwise an NFC.
In certain cases (e.g. long double on x86), the bit width we get from the
floating-point semantics is different from the type size we compute for
the BitCast instruction. Pass this along to DoBitCast, so that in there
we can check only the relevant bits for being initialized.
This also fixes a weirdness we still had in DoBitCast.
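A concrete illustration of the mismatch (a sketch for x86-64, not taken from
the patch): the x87 extended format described by the floating-point semantics
is 80 bits wide, while the stored type is padded to 16 bytes.
```c
// Sketch for x86-64 Linux: the semantic width of long double (80-bit x87
// extended precision) is smaller than the storage size of the type
// (16 bytes including padding), so a bitcast should only check the 80
// value bits for being initialized.
#include <assert.h>
#include <float.h>

int main(void) {
  assert(LDBL_MANT_DIG == 64);        /* 64-bit significand => 80-bit format */
  assert(sizeof(long double) == 16);  /* padded to 16 bytes on x86-64 */
  return 0;
}
```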
When computing whether a defined symbol is exported, we set
`exportDynamic` in Defined's and CommonSymbol's ctors and merge the bit
during symbol resolution. The complexity is due to the LTO special case
canBeOmittedFromSymbolTable, which can be simplified by introducing a
new bit.
We might simplify the state by caching includeInDynsym in exportDynamic
in the future.
When we call IndexedMemProfRecord::toMemProfRecord, we care about
getting the original (that is, non-indexed) MemProfRecord back, so we
should just verify that, not the hash values, which are
intermediaries.
There is a remote possibility of hash collisions where call stack
{F1, F2} might come back as {F1, F1} if F1.hash() == F2.hash() for
example. However, since FrameId uses BLAKE, the hash values should be
consistent across architectures. That is, if this test case works on
one architecture, it should work on others as well.
This PR adds translation support for task detach. Essentially, if the
`detach` clause is present on a task, emit a call to
`__kmpc_task_allow_completion_event` for it, and store its return value
(of type `kmp_event_t *`) into the `event_handle`.
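For context, a C-level sketch of the construct being handled (variable and
function names are illustrative; it relies on the OpenMP 5.0 detach clause
and the `omp_fulfill_event` API):
```c
// Source-level sketch of a detachable task (names are illustrative). The
// event handle named in the detach clause is what ends up holding the
// kmp_event_t* returned by __kmpc_task_allow_completion_event.
#include <omp.h>

void submit_work(void) {
  omp_event_handle_t ev;
  #pragma omp task detach(ev)
  {
    /* runs as a detachable task; it only completes once the event
       is fulfilled */
  }
  omp_fulfill_event(ev); /* fulfill immediately, just for illustration */
}
```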
Since the sanitizer merge in GCC commit r15-5164-gfa321004f3f628, which
entails LLVM commit 61a6439f35b6de28ff4aff4450d6fca970292fd5, GCC's
bootstrap is broken on s390 -m31. This is due to commit
ec68dc1ca4d967b599f1202855917d5ec9cae52f, which introduces stricter type
checking, so GCC's bootstrap fails with
```
In file included from /gcc/src/libsanitizer/interception/interception.h:18,
from /gcc/src/libsanitizer/interception/interception_type_test.cpp:14:
/gcc/src/libsanitizer/interception/interception_type_test.cpp:30:61: error: static assertion failed
30 | COMPILER_CHECK((__sanitizer::is_same<::SSIZE_T, ::ssize_t>::value));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
/gcc/src/libsanitizer/sanitizer_common/sanitizer_internal_defs.h:363:44: note: in definition of macro 'COMPILER_CHECK'
363 | #define COMPILER_CHECK(pred) static_assert(pred, "")
| ^~~~
make[8]: *** [Makefile:469: interception_type_test.lo] Error 1
```
The culprit seems to be that we no longer check for equality of type
sizes but rather whether the types are indeed the same. On s390 -m31,
`sizeof(int) == sizeof(long)` holds, which is why the checks previously
succeeded. They fail now because
```
size_t => unsigned long
ssize_t => long
ptrdiff_t => int
::SSIZE_T => __sanitizer::sptr => int
::PTRDIFF_T => __sanitizer::sptr => int
```
This is fixed by mapping `SSIZE_T` to `long` in the end:
```
#if defined(__s390__) && !defined(__s390x__)
typedef long ssize;
#else
typedef sptr ssize;
#endif
#define SSIZE_T __sanitizer::ssize
```
This patch adds YAML read/write support to llvm-profdata. The primary
intent is to accommodate MemProf profiles in test cases, thereby
avoiding the binary format.
The read support is via llvm-profdata merge. This is useful when we
want to verify that the compiler does the right thing on a given .ll
file and a MemProf profile in a test case. In the test case, we would
convert the MemProf profile in YAML to an indexed profile and invoke
the compiler on the .ll file along with the indexed profile.
The write support is via llvm-profdata show --memory. This is useful
when we wish to convert an indexed MemProf profile to YAML while
writing tests. We would compile a test case in C++, run it to obtain an
indexed MemProf profile, and then convert it to the text format.
For some targets, uptr is mapped to unsigned int and size_t to unsigned
long, and sizeof(int) == sizeof(long) holds. Still, these are distinct
types, and type checking may fail. Therefore, replace uptr with
usize/SIZE_T wherever a size_t is expected.
Part of #116957
This avoids the need to dynamically relocate each pointer in the table.
To make this work, this PR also moves the binary search of intrinsic
names to an internal function with an adjusted signature, and switches
the unit tests to test against actual intrinsics.
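The underlying technique, as a rough sketch with made-up names (not the
actual TableGen-generated tables): keep one concatenated string plus integer
offsets instead of an array of pointers, so nothing in the table needs a
load-time relocation.
```c
// Rough sketch of the idea (hypothetical names and contents): a single
// string blob plus integer offsets replaces an array of "const char *",
// so the table contains no pointers that would require dynamic relocations.
static const char IntrinsicNameBlob[] = "llvm.foo\0llvm.bar.baz\0";
static const unsigned IntrinsicNameOffsets[] = {0, 9};

static inline const char *getIntrinsicName(unsigned Idx) {
  return IntrinsicNameBlob + IntrinsicNameOffsets[Idx];
}
```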
This fixes the case where a single hot inlined callsite prevents checks
for all the others. It helps reduce the number of removed checks by up
to 50% (depending on the `cutoff-hot` value).
`ScalarOptimizerLateEPCallback` ran during the CGSCC walk, after
inlining into each function, but before that function was itself inlined
into its callers, so lowering happened too early for the inlined copies.
Example, with the order of events shown in comments:
```
static void overflow() {
// 1. Inline get/set if possible
// 2. Simplify
// 3. LowerAllowCheckPass
set(get() + get());
}
void test() {
// 4. Inline
// 5. Nothing for LowerAllowCheckPass
overflow();
}
```
With this patch it will look like:
```
static void overflow() {
// 1. Inline get/set if possible
// 2. Simplify
set(get() + get());
}
void test() {
// 3. Inline
// 4. Simplify
overflow();
}
// Later, after inliner CGSCC walk complete:
// 5. LowerAllowCheckPass for `overflow`
// 6. LowerAllowCheckPass for `test`
```