This PR makes verification of .debug_names acceleration table
multithreaded. In local testing it improves verification of clang
.debug_names from four minutes to under a minute.
This PR relies on a current mechanism of extracting DIEs into a vector.
Future improvements can include creating an API to extract one DIE at a
time, or grouping Entries into buckets by CU and extracting before the
parallel step.
Single Thread
4:12.37 real, 246.88 user, 3.54 sys, 0 amem, 10232004 mmem
Multi Thread
0:49.40 real, 612.84 user, 515.73 sys, 0 amem, 11226292 mmem
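To put the two measurements side by side, here is a small sketch that converts the wall-clock strings to seconds and computes the ratios. The helper names `ToSeconds` and `Ratio` are made up for illustration; only the numbers come from the runs above.

```cpp
#include <cassert>
#include <string>

// Converts a "minutes:seconds" wall-clock string such as "4:12.37" to seconds.
// A string without a colon is treated as plain seconds.
double ToSeconds(const std::string &minSec) {
  const std::size_t colon = minSec.find(':');
  if (colon == std::string::npos)
    return std::stod(minSec);
  const double minutes = std::stod(minSec.substr(0, colon));
  const double seconds = std::stod(minSec.substr(colon + 1));
  return minutes * 60.0 + seconds;
}

// Ratio of the single-threaded figure to the multi-threaded one.
double Ratio(double single, double multi) {
  assert(multi > 0.0 && "multi-threaded figure must be positive");
  return single / multi;
}
```

With the figures above, `Ratio(ToSeconds("4:12.37"), ToSeconds("0:49.40"))` is roughly 5.1, while total CPU time (user + sys) grows from 250.42 s to 1128.57 s, about 4.5 times more.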
When you run lldb without colors (`-X`), the status line looks weird
because it doesn't have a background. You end up with what appears to be
floating text at the bottom of your terminal.
This patch changes the statusline to use the reverse video effect, even
when colors are off. The effect doesn't introduce any new colors and
just inverts the foreground and background color.
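For reference, a minimal sketch of the reverse video effect using the standard SGR escape sequences. `ReverseVideo` is a hypothetical helper for illustration, not the function the patch touches.

```cpp
#include <string>

// Hypothetical helper: wraps text in the SGR "reverse video" attribute.
// ESC[7m swaps foreground and background; ESC[0m resets all attributes.
// No color codes are emitted, so this stays within the spirit of `-X`.
std::string ReverseVideo(const std::string &text) {
  return "\033[7m" + text + "\033[0m";
}
```

Printing `ReverseVideo(" status ")` to a terminal that understands ANSI escapes shows the text on an inverted background.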
I considered an alternative approach which changes the behavior of the
`-X` option, so that turning off colors doesn't prevent emitting
non-color related control characters such as bold, underline, and
reverse video. I decided to go with this more targeted fix as (1) nobody
is asking for this more general change and (2) it introduces significant
complexity to plumb this through using a setting and driver flag so that
it can be disabled when running the tests.
Fixes #134112.
If ScalarPH has predecessors, we may need to update its reduction resume
values. If there is a middle block, it must be the first predecessor.
Note that the first predecessor may not be the middle block, if the
middle block doesn't branch to the scalar preheader. In that case,
fixReductionScalarResumeWhenVectorizingEpilog will be a no-op.
In preparation for https://github.com/llvm/llvm-project/pull/106748.
Block a context root from being imported by its callers.
Suppose that happened. Its caller - usually a message pump - inlines its copy of the root. Then it (the root) and whatever it calls will be the non-contextually optimized callee versions.
## Problem
When the build ids of the profile and binary do not match, the error
reported by llvm-profdata is `no entries in callstack map after
symbolization`, but the root cause of this problem is the **build id
mismatch**.
## Trigger scenario
For example, when performing `memprof` optimization on `clang`,
`rawprofile` is collected through `ninja clang`. In addition to running
clang, some other programs will also be executed, and these programs
will also generate rawprofile. When `no entries in callstack map after
symbolization` appears during `llvm-profdata merge`, users may
mistakenly conclude that the **instrumentation failed or that something
else went wrong**, and will **not directly realize that the binary and
profile do not match**.
## Changed
Currently, when the build id does not match, an assertion fires only in
debug builds. This patch instead returns an error directly when the
build id does not match.
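The shape of the change can be sketched as follows. This is an illustrative stand-in, not the llvm-profdata code: `CheckBuildId` and the message wording are assumptions, and the real implementation uses LLVM's own error types rather than `std::optional`.

```cpp
#include <optional>
#include <string>

// Hypothetical check: returns an error message on mismatch and
// std::nullopt on success. Unlike an assert, this also runs in
// release builds, so the user sees the real cause.
std::optional<std::string> CheckBuildId(const std::string &profileId,
                                        const std::string &binaryId) {
  if (profileId != binaryId)
    return "build id mismatch: profile has '" + profileId +
           "', binary has '" + binaryId + "'";
  return std::nullopt;
}
```

A caller would stop and report the returned message before attempting symbolization, instead of continuing and failing later with the less specific callstack-map error.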
The string used for the intrinsic was incorrect:
"llvm.nvvm.match.any.sync.i32p" has an extra `p` at the end.
Use the NVVM operation instead so we don't duplicate it.
This introduces a new class 'UnsignedOrNone', which models a lite
version of `std::optional<unsigned>`, but has the same size as
'unsigned'.
This replaces most uses of `std::optional<unsigned>`, and similar
schemes utilizing 'int' and '-1' as sentinel.
Besides the smaller size advantage, this is simpler to serialize, as its
internal representation is a single unsigned int as well.
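A minimal sketch of the idea, assuming one bit pattern is reserved for "none". The class name `CompactOptionalUnsigned` and the value-plus-one encoding are illustrative assumptions, not the interface added by this patch.

```cpp
#include <cassert>
#include <limits>

// Hypothetical sketch of an optional-like unsigned with no extra storage.
// The value is stored as V + 1, so a representation of 0 means "none".
// The largest unsigned value is therefore not representable.
class CompactOptionalUnsigned {
  unsigned Rep = 0;

public:
  CompactOptionalUnsigned() = default;
  explicit CompactOptionalUnsigned(unsigned V) : Rep(V + 1) {
    assert(V != std::numeric_limits<unsigned>::max() && "value out of range");
  }

  bool has_value() const { return Rep != 0; }

  unsigned operator*() const {
    assert(has_value() && "dereferencing an empty value");
    return Rep - 1;
  }
};

static_assert(sizeof(CompactOptionalUnsigned) == sizeof(unsigned),
              "must not be larger than a plain unsigned");
```

Because the whole state is one `unsigned`, serializing it amounts to writing that single integer, which is the simplification the description refers to.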
ISD::ADDC, ISD::ADDE, ISD::SUBC and ISD::SUBE are being deprecated in
favor of ISD::UADDO_CARRY and ISD::USUBO_CARRY. This patch lowers
UADDO, UADDO_CARRY, USUBO and USUBO_CARRY.
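As a reminder of what the carry nodes compute, here is a plain C++ model of an add with carry-in and carry-out, chained to build a 64-bit add from 32-bit halves. This models the arithmetic only, as I understand the node semantics; `AddWithCarry` and `Add64` are illustrative names and none of this is the backend lowering code.

```cpp
#include <cstdint>

struct AddResult {
  uint32_t sum; // low 32 bits of a + b + carryIn
  bool carry;   // carry out of bit 31
};

// Unsigned 32-bit add with carry-in and carry-out.
AddResult AddWithCarry(uint32_t a, uint32_t b, bool carryIn) {
  const uint64_t wide = uint64_t(a) + uint64_t(b) + (carryIn ? 1u : 0u);
  return {uint32_t(wide), (wide >> 32) != 0};
}

// 64-bit add built from two 32-bit steps: the low half has no carry-in
// (like UADDO), the high half consumes the carry (like UADDO_CARRY).
uint64_t Add64(uint64_t x, uint64_t y) {
  const AddResult lo = AddWithCarry(uint32_t(x), uint32_t(y), false);
  const AddResult hi =
      AddWithCarry(uint32_t(x >> 32), uint32_t(y >> 32), lo.carry);
  return (uint64_t(hi.sum) << 32) | lo.sum;
}
```

The subtract variants follow the same pattern with a borrow in place of the carry.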
This adds support for all the surface read and write calls to clang. It
extends the pattern used for textures to surfaces too.
I tested this by generating all the various permutations of the calls
and argument types in a python script, compiling them with both clang
and nvcc, and comparing the generated ptx for equivalence. They all
agree, ignoring register allocation, and some places where Clang picks
different memory write instructions. An example kernel is:
```
__global__ void testKernel(cudaSurfaceObject_t surfObj, int x, float2* result) {
*result = surf1Dread<float2>(surfObj, x, cudaBoundaryModeZero);
}
```
---------
Signed-off-by: Austin Schuh <austin.linux@gmail.com>
This reapplies #132522.
Previously casts of scalable m_ImmConstant splats weren't being folded
by ConstantFoldCastOperand, triggering the "Constant-fold of ImmConstant
should not fail" assertion.
There are no changes to the code in this PR, instead we just needed
#133207 to land first.
A test has been added for the assertion in
llvm/test/Transforms/InstSimplify/vec-icmp-of-cast.ll
@icmp_ult_sext_scalable_splat_is_true.
<hr/>
#118806 fixed an infinite loop in FoldShiftByConstant that could occur
when the shift amount was a ConstantExpr.
However this meant that FoldShiftByConstant no longer kicked in for
scalable vectors because scalable splats are represented by
ConstantExprs.
This fixes it by allowing scalable splats of non-ConstantExprs in
m_ImmConstant, which also fixes a few other test cases where scalable
splats were being missed.
But I'm also hoping that UseConstantIntForScalableSplat will eventually
remove the need for this.
I noticed this when trying to reverse a combine on RISC-V in #132245,
and saw that the resulting vector and scalar forms were different.
If a block with a single predecessor also had its address taken,
it was getting deleted in this post-inline cleanup step. This would
result in the blockaddress in the resulting function getting deleted
and replaced with inttoptr 1.
This fixes one bug required to permit inlining of functions with blockaddress
uses.
At the moment this is not testable (at least without an annoyingly complex
unit test), and is a pre-bug fix for future patches. Functions with
blockaddress uses are rejected in isInlineViable, so we don't get this far
with the current InlineFunction uses (some of the existing cases seem to
reproduce this part of the rejection logic, like PartialInliner). This
will be tested in a pending llvm-reduce change.
Prerequisite for #38908
`tensor.insert_slice` needs to have read semantics on its destination
operand. Since it has a return value, its semantics are
- Copy dest to result
- Copy source to subview of destination.
`tensor.parallel_insert_slice` though has no result. So it does not need
to have read semantics. The op description
[here](a3ac318e5f/mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td (L1524))
also says that it is expected to lower to a `memref.subview`, that does
not have read semantics on the destination (it's just a view).
This patch drops the read semantics for destination of
`tensor.parallel_insert_slice` but also makes the `shared_outs` operands
of `scf.forall` have read semantics. Earlier it would rely indirectly on
read semantics of destination operand of `tensor.parallel_insert_slice`
to propagate the read semantics for `shared_outs`. Now that is specified
more directly.
Fixes #133964
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
This moves all the common settings of the launch and attach operations
into the `lldb_dap::protocol::Configuration`. These common settings
can be in both `launch` and `attach` requests and allows us to isolate
the DAP configuration operations into a single common location.
This is split out from #133624.
Reverts llvm/llvm-project#134124
The build is failing again due to a linking error:
[here](https://github.com/llvm/llvm-project/pull/134124#issuecomment-2776370486).
Again the error was not present locally or any of the pre-merge builds
and must have been transitively linked in these build environments...
With AVX512VL targets, use 128/256-bit VPERMV/VPERMV3 nodes when we only need the lower elements.
Reapplied version of #133923 with fix for typo in the VPERMV3 mask adjustment
- Defines HLSLRootSignature Attr in `Attr.td`
- Define and implement handleHLSLRootSignature in `SemaHLSL`
- Adds sample test case to show AST Node is generated in
`RootSignatures-AST.hlsl`
This commit will "hook-up" the separately defined RootSignature parser
and invoke it to create the RootElements, then store them on the
ASTContext and finally store the reference to the Elements in
RootSignatureAttr.
Resolves https://github.com/llvm/llvm-project/issues/119011
---------
Co-authored-by: Finn Plummer <finnplummer@microsoft.com>
This is a follow up PR from
https://github.com/llvm/llvm-project/pull/132089.
When a V2S copy and its useMI are lowered to VALU, this patch checks
whether the newly generated VALU instruction is a true16 instruction and,
if so, adds subreg access on all operands where necessary.
An example MIR looks like:
```
%1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0 ...
%2:sreg_32 = COPY %1:vgpr_32
%3:sreg_32 = S_FLOOR_F16 %2:sreg_32, ...
```
currently lowered to
```
%1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0 ...
%2:vgpr_16 = V_FLOOR_F16_t16_e64 0, %1:vgpr_32, 0, 0, 0 ...
```
after this patch
```
%1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0 ...
%2:vgpr_16 = V_FLOOR_F16_t16_e64 0, %1.lo16:vgpr_32, 0, 0, 0 ...
```
Protect the various SetBreakpoint functions with the API mutex. This
fixes a race condition between the breakpoint being created and the DAP
label getting added. This was causing `TestDAP_breakpointEvents.py` to
be flaky.
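The locking pattern can be sketched as below. This is a simplified stand-in, not lldb-dap code: `TargetSketch`, `BreakpointSketch` and `AllLabeled` are hypothetical, and the real change takes the existing API mutex rather than a member owned by the class.

```cpp
#include <mutex>
#include <string>
#include <vector>

struct BreakpointSketch {
  int id;
  std::string label;
};

// Hypothetical stand-in for the object that owns the breakpoints.
class TargetSketch {
  std::recursive_mutex api_mutex_;
  std::vector<BreakpointSketch> breakpoints_;

public:
  // Creation and labelling happen under one lock, so an observer that
  // takes the same mutex never sees a breakpoint without its label.
  int SetBreakpoint(const std::string &label) {
    std::lock_guard<std::recursive_mutex> guard(api_mutex_);
    const int id = static_cast<int>(breakpoints_.size()) + 1;
    breakpoints_.push_back({id, ""});   // step 1: create
    breakpoints_.back().label = label;  // step 2: attach the label
    return id;
  }

  // Observer side: also takes the mutex before inspecting state.
  bool AllLabeled() {
    std::lock_guard<std::recursive_mutex> guard(api_mutex_);
    for (const BreakpointSketch &bp : breakpoints_)
      if (bp.label.empty())
        return false;
    return true;
  }
};
```

Without the lock in `SetBreakpoint`, a thread calling `AllLabeled` between the two steps could observe the unlabelled breakpoint, which is the kind of window the flaky test was hitting.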
Fixes #131242.
Flang uses `fir.call <llvm intrinsic>` in a few places. This means
consumers of the IR need to strcmp every fir.call if they want to find a
particular LLVM intrinsic.
Emit LLVM memcpy intrinsics instead.
Previously only fixed vector splats were handled. This adds support for
scalable vectors too by allowing ConstantExpr splats.
We need to add the extra V->getType()->isVectorTy() check because a
ConstantExpr might be a scalar to vector bitcast.
By allowing ConstantExprs this also allows fixed vector ConstantExprs to
be folded, which causes the diffs in
llvm/test/Analysis/ValueTracking/known-bits-from-operator-constexpr.ll
and llvm/test/Transforms/InstSimplify/ConstProp/cast-vector.ll. I can
remove them from this PR if reviewers would prefer.
Fixes #132922
This PR improves the driver code that builds the `flang-rt` path by
reusing the logic and code of `compiler-rt`.
1. Moved `addFortranRuntimeLibraryPath` and `addFortranRuntimeLibs` to
`ToolChain.h` and made them virtual so that they can be overridden if
customization is needed. The current implementation of those two
procedures is moved to `ToolChain.cpp` as the base implementation to
default to.
2. Both AIX and PPCLinux now override `addFortranRuntimeLibs`.
The overriding function of `addFortranRuntimeLibs` for both AIX and
PPCLinux calls `getCompilerRTArgString` => `getCompilerRT` =>
`buildCompilerRTBasename` to get the path to `flang-rt`. This code
handles `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR` setting. As shown in
`PPCLinux.cpp`, `FT_static` is the default. If not found, it will search
and build for `FT_shared`. To differentiate `flang-rt` from `clang-rt`,
a boolean flag `IsFortran` is passed to the chain of functions in order
to reach `buildCompilerRTBasename`.
This PR updates AMDGPULowerBufferFatPointers to use the
InstSimplifyFolder when creating IR during buffer fat pointer lowering.
This shouldn't cause any large functional changes and might improve the
quality of the generated code.
Following commit b8fc288, which changed some dexter test substitutions to
be specific to C and C++, some tests that had been added since the original
patch was written were still using the old substitution; this patch updates
them to use the new.
This NFC patch simplifies the main loop in HandleProcessStateChanged
event by moving duplicated code into the StopInfo class, also allowing
StopInfo subclasses to override behavior.
More specifically, two functions are created:
* ShouldShow: should a Thread with such a StopInfo be printed when
the debugger stops? Currently, no StopInfo subclasses override this, but
a subsequent patch will fix a bug by making StopInfoBreakpoint check
whether the breakpoint is internal.
* ShouldSelect: should a Thread with such a StopInfo be selected? This
is currently overridden by StopInfoUnixSignal but will, in the future,
be overridden by StopInfoBreakpoint.
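The two hooks can be pictured with the following sketch. All class names here are hypothetical, the default return values are assumptions, and the `InternalBreakpointStop` override illustrates the follow-up described above rather than anything in this patch.

```cpp
#include <memory>
#include <vector>

// Hypothetical base class with the two virtual hooks.
class StopInfoSketch {
public:
  virtual ~StopInfoSketch() = default;
  virtual bool ShouldShow() const { return true; }   // assumed default
  virtual bool ShouldSelect() const { return true; } // assumed default
};

// Illustrates the planned follow-up: hide stops at internal breakpoints.
class InternalBreakpointStop : public StopInfoSketch {
  bool internal_;

public:
  explicit InternalBreakpointStop(bool internal) : internal_(internal) {}
  bool ShouldShow() const override { return !internal_; }
};

// Illustrates a signal stop that only asks for selection if it stops.
class SignalStop : public StopInfoSketch {
  bool stops_;

public:
  explicit SignalStop(bool stops) : stops_(stops) {}
  bool ShouldSelect() const override { return stops_; }
};

// The main loop no longer inspects concrete types; it just asks each
// stop info. Returns the index of the first selectable thread, or -1.
int PickThreadToSelect(
    const std::vector<std::unique_ptr<StopInfoSketch>> &stops) {
  for (int i = 0; i < static_cast<int>(stops.size()); ++i)
    if (stops[i]->ShouldSelect())
      return i;
  return -1;
}
```

The point of the design is that new stop kinds can change printing or selection behavior by overriding a method, without touching the event-handling loop.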
This patch updates the handling of target regions to set trip counts and
kernel execution modes properly, based on clang's behavior. This fixes a
race condition on `target teams distribute` constructs with no `parallel
do` loop inside.
This is how kernels are classified, after changes introduced in this
patch:
```f90
! Exec mode: SPMD.
! Trip count: Set.
!$omp target teams distribute parallel do
do i=...
end do
! Exec mode: Generic-SPMD.
! Trip count: Set (outer loop).
!$omp target teams distribute
do i=...
!$omp parallel do private(idx, y)
do j=...
end do
end do
! Exec mode: Generic-SPMD.
! Trip count: Set (outer loop).
!$omp target teams distribute
do i=...
!$omp parallel
...
!$omp end parallel
end do
! Exec mode: Generic.
! Trip count: Set.
!$omp target teams distribute
do i=...
end do
! Exec mode: SPMD.
! Trip count: Not set.
!$omp target parallel do
do i=...
end do
! Exec mode: Generic.
! Trip count: Not set.
!$omp target
...
!$omp end target
```
For the split `target teams distribute + parallel do` case, clang
produces a Generic kernel which gets promoted to Generic-SPMD by the
openmp-opt pass. We can't currently replicate that behavior in flang
because our codegen for these constructs results in the introduction of
calls to the `kmpc_distribute_static_loop` family of functions, instead
of `kmpc_distribute_static_init`, which currently prevent promotion of
the kernel to Generic-SPMD.
For the time being, instead of relying on the openmp-opt pass, we look
at the MLIR representation to find the Generic-SPMD pattern and directly
tag the kernel as such during codegen. This is what we were already
doing, but incorrectly matching other kinds of kernels as such in the
process.
This patch replaces invocations of clang with clang++ for a set of
c++ files in the dexter cross-project tests. As a small additional change,
this patch removes -lstdc++ from a test that did not appear to require it.
Just to get some more coverage.
Some of the behavior might be weird and change in the future, but let's
lock down what happens today to at least prevent regressions.
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
…ncorrect name
Clang needs variables to be represented with unique names. This means
that if a variable shadows another, it's given a different name
internally to ensure it has a unique name. If ASan tries to use this
name when printing an error, it will print the modified unique name,
rather than the variable's source code name.
Fixes #47326