This PR XFAILS `llvm/test/DebugInfo/attr-btf_type_tag.ll` on AIX since
we we don’t have `.debug_addr` section.
Co-authored-by: Nivetha Kuruparan <nivetha@comp810.rtp.raleigh.ibm.com>
There are multiple ways to define a remainder operation. Depending on
the definition, the result could be either always positive or have the
sign of the dividend.
The pattern lowering `arith.remf` to LLVM assumes that the semantics
match `llvm.frem`, which seems to be reasonable. The folder, however, is
implemented via `APFloat::remainder()` which has different semantics.
This patch matches the folding behaviour to lowering behavior by using
`APFloat::mod()`, which matches the behavior of `llvm.frem` and libm's
`fmod()`. It also updates the documentation of `arith.remf` to explain
this behavior: The sign of the result of the remainder operation always
matches the sign of the dividend (LHS operand).
frem documentation: https://llvm.org/docs/LangRef.html#frem-instruction
Fix https://github.com/llvm/llvm-project/issues/94431
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
This PR attempts to fix common block mapping for regular mapping of
these types as well as when they have been marked as "declare target
link". This PR should allow correct mapping of both the members of a
common block and the full common block via its block symbol.
The main changes were some adjustments to the Fortran OpenMP lowering to
HLFIR/FIR, the lowering of the LLVM+OpenMP dialect to LLVM-IR and
adjustments to the way the we handle target kernel map argument
rebinding inside of the OMPIRBuilder.
For the Fortran OpenMP lowering were two changes, one to prevent the
implicit capture of common block members when the common block symbol
itself has been marked and the other creates intermediate member access
inside of the target region to be used in-place of those external to the
target region, this prevents external usages breaking the
IsolatedFromAbove pact.
In the latter case, there was an adjustment to the size calculation for
types to better handle cases where we pass an array as the type of a map
(as opposed to the bounds and the type of the element), which occurs in
the case of common blocks. There is also some adjustment to how
handleDeclareTargetMapVar handles renaming of declare target symbols in
the module to the reference pointer, now it will only apply to those
within the kernel that is currently being generated and we also perform
a modification to replace constants with instructions as necessary as we
cannot replace these with our reference pointer (non-constant and
constants do not mix nicely).
In the case of the OpenMPIRBuilder some changes were made to defer
global symbol rebinding to kernel arguments until all other arguments
have been rebound. This makes sure we do not replace uses that may refer
to the global (e.g. a GEP) but are themselves actually a separate
argument that needs bound.
Currently "declare target to" still needs some work, but this may be the
case for all types in conjunction with "declare target to" at the
moment.
Found while skimming this code. Don't have a reproducible test case for
this but the nullptr check should clearly occur before we try to
dereference `location_sp`.
This paper was a rewording of WG14 N1485, correcting terminology and
bringing the C11 feature slightly closer in line with the C++11
feature. There is nothing additional to be done or test to conform to
what was specified by WG14 N1537, so we'll remove the entry and lean on
N1485 to track status for atomics.
This PR adds -mtune as a valid flang flag and passes the information
through to LLVM IR as an attribute on all functions. No specific
architecture optimizations are added at this time.
Reland: https://github.com/llvm/llvm-project/pull/95456
This patch improves the ROCDL gpu serialization API by:
- Introducing the enum `AMDGCNLibraries` for specifying the AMD GCN
device code libraries to use during linking.
- Removing `getCommonBitcodeLibs` in favor of `AMDGCNLibraries`.
Previously `getCommonBitcodeLibs` would try to load all AMD GCN bitcode
librariesm now it will only load the requested libraries.
- Exposing the `compileToBinary` method and making it virtual, allowing
downstream users to re-use this method.
- Exposing `moduleToObjectImpl`, this method provides a prototype flow
for compiling to binary, allowing downstream users to re-use this
method.
- It also avoids constructing the control variables if no device
libraries are being used.
- Changes the style of the error messages to be composable, ie no full
stops.
- Adds an error message for when the ROCm toolkit can't be found but it
was required.
r7 is reserved in thumb2 (typically for the frame pointer, as opposed to r11 in
ARM mode), so assigning to a variable with explicit register storage in r7 will
produce an error.
But r7 is where the Linux kernel expects the syscall number to be placed. We
can use a temporary to get the register allocator to pick a temporary, which we
save+restore the previous value of r7 in.
Fixes: #93738
According to https://reviews.llvm.org/D114250
this was to handle Mac specific issue, however
the test is Linux only.
The test effectively prevents to lock main allocator
on fork, but we do that on Linux for other
sanitizers for years, and need to do the same
for TSAN to avoid deadlocks.
We use REAL() calls in interceptors, but
DEFINE_REAL_PTHREAD_FUNCTIONS has nothing to do
with them and only used for internal maintenance
threads.
This is done to avoid confusion like in #96456.
N2843 was subsumed by N2882; we could probably consider removing
subsumed entries, but I've been leaving them to help folks looking at
the editor's report from various working drafts and wondering about the
changes.
The `getDroppedDims` utility function does not follow the convention of
dropping outermost unit dimensions first when inferring a rank reduction
mask for a slice. This PR updates the implementation to match this
convention.
Many state of the art models and quantization operations are now
directly working on vector.contract on integers.
This commit enables generalizes ext-contraction folding S.T we can emit
more performant vector.contracts on codegen pipelines.
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Alastair Robertson reported a huge compilation time increase without -g
for bpf target when comparing to x86 ([1]). In my setup, with '-O0', for
x86, a large basic block compilation takes 0.19s while bpf target takes
2.46s. The top function which contributes to the compile time is
eliminateFrameIndex().
Such long compilation time without -g is caused by commit
05de2e481811 ("[bpf] error when BPF stack size exceeds 512 bytes")
The compiler tries to get some debug loc by iterating all insns in the
basic block which will be used when compiler warns larger-than-512 stack
size. Even without -g, such iterating also happens which cause
unnecessary compile time increase.
To fix the issue, let us move the related code when the compiler is
about to warn stack limit violation. This fixed the compile time
regression, and on my system, the compile time is reduced from 2.46s to
0.35s.
[1] https://github.com/bpftrace/bpftrace/issues/3257
Co-authored-by: Yonghong Song <yonghong.song@linux.dev>
New fixes:
- properly init the `std::optional<std::vector>` to an empty vector as
opposed to `{}` (which was effectively `std::nullopt`).
---------
Co-authored-by: Vy Nguyen <oontvoo@users.noreply.github.com>
The builtin causes the program to stop its execution abnormally and
shows a human-readable description of the reason for the termination
when a debugger is attached or in a symbolicated crash log.
The motivation for the builtin is explained in the following RFC:
https://discourse.llvm.org/t/rfc-adding-builtin-verbose-trap-string-literal/75845
clang's CodeGen lowers the builtin to `llvm.trap` and emits debugging
information that represents an artificial inline frame whose name
encodes the category and reason strings passed to the builtin.
The GCC build has gotten to the point where it's often hard to find the
actual error in the build log. We should look into enabling these
warnings again in the future, but it looks like a lot of them are
bogous.
This patch implements an improvement introduced in P3029R1 that was
missed in #87873. It adds a deduction of static extents if
integral_constant-like constants are passed to `std::extents`.
To enable function multi-versioning (FMV), current checks which rely on
cmd line options or global macros to see if target feature is present
need to be removed. This patch removes those for NEON and also
implements changes to NEON header file as proposed in
[ACLE](https://github.com/ARM-software/acle/pull/321).
Sometime tsan runtimes calls, like
`__tsan_mutex_create ()`, need to store a stack
in the StackDepot, and the Depot may need to start
and maintenance thread.
Example:
```
__sanitizer::FutexWait ()
__sanitizer::Semaphore::Wait ()
__sanitizer::Mutex::Lock ()
__tsan::SlotLock ()
__tsan::SlotLocker::SlotLocker ()
__tsan::Acquire ()
__tsan::CallUserSignalHandler ()
__tsan::ProcessPendingSignalsImpl ()
__tsan::ProcessPendingSignals ()
__tsan::ScopedInterceptor::~ScopedInterceptor ()
___interceptor_mmap ()
pthread_create ()
__sanitizer::internal_start_thread ()
__sanitizer::(anonymous namespace)::CompressThread::NewWorkNotify ()
__sanitizer::StackDepotNode::store ()
__sanitizer::StackDepotBase<__sanitizer::StackDepotNode, 1, 20>::Put ()
__tsan::CurrentStackId ()
__tsan::MutexCreate ()
__tsan_mutex_create ()
```
pthread_create() implementation may hit other
interceptors recursively, which may invoke
ProcessPendingSignals, which deadlocks.
Alternative solution could be block interceptors
closer to TSAN runtime API function, like
`__tsan_mutex_create`, or just before
`StackDepotPut``, but it's not needed for most
calls, only when new thread is created using
`real_pthread_create`.
I don't see a reasonable way to create a
regression test.
There is code duplication in all containers that static_assert the
allocator matches the allocator requirements in the spec. This check can
be moved into a more centralised place.
This PR addresses a problem that headers may not be able to be found if
`#include` is used with std modules.
Consider the following file:
#include <boost/json.hpp>
import std;
int main(int, const char **) { }
Boost will include something from libc++, but we are using -nostdinc++
at [1] so the compiler can not find any default std header. Therefore
the locally built header needs to be public.
[1]: 15fdd47c4b/libcxx/modules/CMakeLists.txt.in (L52)