1013fe3c0cfd7582e94ef2d4bfd79da7ea1a1289 used to implement LWG2770, but
cb0d4df97490ec2d2b1cdf7574d26b1bc4063599 made LWG2770 unimplemented
again because of CWG2386.
This patch re-implements LWG2770, while keeping the libc++-specific
implementation strategy (which is controversial as noted in LWG4040).
Drive-by:
- Make the test coverage for the controversial part noted in LWG4040
libc++-only.
- Add the previously missed entry for LWG2770 to the documentation.
In the caching part of the secondary path, when about to try to release
memory to the OS, we always wait while acquiring the lock. However, if
multiple threads are attempting this at the same time, all other threads
will likely do nothing when the release call is made. Change the
algorithm to skip the release if there is another release in process.
Also, pull the lock outside the releaseOlderThan function. This is so
that in the store path, we use the tryLock and skip if another thread is
releasing. But in the path where a forced release call is being made,
that call will wait for release to complete which guarantees that all
entries are released when requested.
The rounding of the result when using an FMA instruction for hyperbolic
sin/cos on float16 was off by 1 bit for a few cases. This patch adds
extra exceptional cases to handle these.
Introduced a cmake option that is disabled by default that suppresses
searching via the PATH variable for a symbolizer. The option will be
enabled for downstream builds where the user will need to specify the
symbolizer path more explicitly, e.g., by using ASAN_SYMBOLIZER_PATH.
Alloca operations were being emitted into the entry block of the current
function unconditionally, even if the variable they represented was
declared in a different scope. This change upstreams the code for
handling
insertion of the alloca into the proper lexcial scope. It also adds a
CIR-to-CIR transformation to hoist allocas to the function entry block,
which is necessary to produce the expected LLVM IR during lowering.
When CIR binop support was commited, it accidentally included a test for
functionality that was removed during the review process
(cir.binop.overflow). This test, of course, fails.
This change removes the failing test. It will be re-added when the
corresponding op is added.
All downstream users are migrated, so we no longer need to produce
"public"/"release" cc_library target for each libc_function macro
invocation. Instead, we only create internal target (for testing), and
some filegroups, which will be picked up by the libc_release_library
invocation.
This allows us to get rid of "weak" argument to libc_function - this
decision is also postponed to libc_release_library configuration.
Fixes#130327.
This patch adds upstreams support for BinOp including lvalue
assignments. Note that this does not include ternary ops,
BinOpOverflowOp, pointer arithmetic, ShiftOp and SelectOp which are
required for logical binary operators.
---------
Co-authored-by: Morris Hafner <mhafner@nvidia.com>
Co-authored-by: Andy Kaylor <akaylor@nvidia.com>
This reverts commit ff3e2ba9eb94217f3ad3525dc18b0c7b684e0abf.
The recommmitted version limits to transform to cases where no
interleaving is taking place, to avoid a mis-compile when interleaving.
Original commit message:
This patch adds a new narrowInterleaveGroups transfrom, which tries
convert a plan with interleave groups with VF elements to a plan that
instead replaces the interleave groups with wide loads and stores
processing VF elements.
This effectively is a very simple form of loop-aware SLP, where we
use interleave groups to identify candidates.
This initial version is quite restricted and hopefully serves as a
starting point for how to best model those kinds of transforms.
Depends on https://github.com/llvm/llvm-project/pull/106431.
Fixes https://github.com/llvm/llvm-project/issues/82936.
PR: https://github.com/llvm/llvm-project/pull/106441
Previously isAlocatable was updated to allow inheritance from any
superclass for a generated register class, but other properties are
still inherited from its nearest superclass. This could cause
a generated regclass inherit undesired properties, e.g., tsflags, from
an unallocatable superclass due to the topological inheritance order.
This change updates to inherit properties from the nearest allocatable
superclass if possible and includes a test to demonstrate a potential
incorrect inheritance of tsflags.
Improve the modeling of the memory effects and instruction cost of
inline assembly.
- MemoryEffects: The CUDA spec states that inline assembly is not
assumed to have any side-effects or read or write to memory. An inline
assembly may be treated as NoModRef unless it is explictly marked as
having side effects or has an explicit memory clobber.
https://docs.nvidia.com/cuda/inline-ptx-assembly/index.html#incorrect-optimization
> Normally any memory that is written to will be specified as an out
operand, but if there is a hidden read or write on user memory (for
example, indirect access of a memory location via an operand), or if you
want to stop any memory optimizations around the asm() statement
performed during generation of PTX, you can add a “memory” clobbers
specification after a 3rd colon.
- InstructionCost: This change implements very rough string parsing
system to count the number of instructions in an inline-asm. There are
corner cases it will not handle well, but in general this is an
improvement over the current cost of the number of arguments plus one.
Debug value/declare operations imported before landing pad operations at
the bb start break invoke op verification:
```
error: first operation in unwind destination should be a llvm.landingpad operation
```
This this issue by making the placement slightly more smart.
This type can be exposed from C APIs, where instantiations of this type
are not expected to mutate after creation. To support this, mark the
lazy computation of build arguments mutable, as that is not intended to
otherwise mutate the state of these objects.
This was reviewed separately by @jansvoboda11
To reland https://github.com/llvm/llvm-project/pull/124846, we need to
make symbolizer consistent with the case when line number is 0. Always
using filename and line from debug info even if the line number is 0
sounds like the reasonable path to go.
This patch fixes:
clang-tools-extra/modularize/ModularizeUtilities.cpp:293:15: error:
no member named 'parseModuleMapFile' in 'clang::ModuleMap'; did you
mean 'loadModuleMapFile'?
Now that we have ModuleMapFile.cpp which parses module maps, it's
confusing what ModuleMap::parseModuleMapFile actually does. HeaderSearch
already called this loading a module map, so consistently use that term
in ModuleMap too.
An upcoming patch will allow just parsing a module map without loading
the modules from it.
This makes it so that `CompilerInvocation` can be the only entity that
manages ownership of `HeaderSearchOptions`, making it possible to
implement copy-on-write semantics.
When building Flang with Clang, we need to do the same quadmath.h
wrapping as we do for flang-rt. I extracted the CMake code
into FlangCommon.cmake, and cleaned up the arguments passing
to execute_process (note that `-###` was treated as `-` in the original
code, because `#` starts a comment). I believe the Clang command
does not require the input source file, so I removed it as well.
In preparation for implementing support for detection of non-protected
call instructions, refine the definition of state which is computed for
each register by data-flow analysis.
Explicitly marking the registers which are known to be trusted at
function entry is crucial for finding non-protected calls. In addition,
it fixes less-common false negatives for pac-ret, such as `ret x1` in
`f_nonx30_ret_non_auted` test case.
Fix error
llvm\clang\tools\amdgpu-arch\AMDGPUArchByHIP.cpp(102,29): error: result
of comparison of constant 18446744073709551615 with expression of type
'unsigned int' is always false
[-Werror,-Wtautological-constant-out-of-range-compare]
102 | StringRef VerStr = (Pos == StringRef::npos) ? S : S.substr(Pos +
1);
This is a cleaner design than using identifier and an optional `Selector`. It also allows rename of Objective-C method names if no declaration is at hand and thus no `Selector` instance can be formed. For example, when finding the ranges to rename based on an index that’s not clangd’s built-in index.
Previously, const and ref qualification on an operation would cause
__desugars_to to report false, which would lead to unnecessary
pessimizations. The same holds for reference_wrapper.
In practice, const and ref qualifications on the operation itself are
not relevant to determining whether an operation desugars to something
else or not, so can be ignored.
We are not stripping volatile qualifiers from operations in this patch
because we feel that this requires additional discussion.
Fixes#129312
Summary:
C++20 introduced an atomic reference type, which more easily wraps
around the standard way of dealing with atomics. Instead of a dedicated
type, it allows you to treat an existing allocation as atomic.
This has no users yet, but I'm hoping to use it when I start finalizing
my GPU allocation interface, as it will need to handle atomic values
in-place that can't be done with placement new. Hopefully this is small
enough that we can just keep it in-tree until it's needed, but I'll
accept holding it here until it has a user.
I added one extension to allow implicit conversion and CTAD.
This corrects the behaviour for getCommonSugaredType with regards to
array top level qualifiers: remove differing top level qualifiers, as
they must be redundant with element qualifiers.
Fixes https://github.com/llvm/llvm-project/issues/97005
This is consistent with what's done on GISel. This allows the register
coalescer to remove the redundant intermediate `s_mov_b32` instructions
by using `m0` directly as the result register.
Implement GNU extension intrinsic HOSTNM, both function and subroutine
forms. Add HOSTNM documentation to `flang/docs/Intrinsics.md`. Add
lowering and semantic unit tests.
(This change is modeled after GETCWD implementation.)
Since the inner wrapper call might have removed one of the entries from
the global dict that the outer wrapper ALSO was going to delete, make
sure that we check that the key is still in the global dict before
trying to act on it.
GCC, correctly, doesn't vectorize in this case. Absence of direct
instructions to convert larger fixed point to lower floating point
precision inadvertently causes rounding leading to subtle differences
across ISAs.
https://godbolt.org/z/ssEchMWrE
Co-authored by: @echristo