Cleanup lowering of atomic instructions and intrninsics. The TableGen
changes are primarily a refactor, though sub variants are now lowered
via operation legalization, potentially allowing for more DAG
optimization.
These changes are mostly pushed by the gnsyncbot directly to main and
thus don't go through a PR, but we still test on main to see if main is
broken. Given these touch llvm/, they end up burning a decent amount of
testing time for no real benefit, so I think it makes sense to exclude
them from premerge testing explicitly.
The tests for the
[intel-pt](3483740289/lldb/docs/use/intel_pt.rst)
trace plugin were failing for multiple reasons.
On machines where tracing is supported many of the tests were crashing
because of a nullptr dereference. It looks like the `core_file`
parameter in `ProcessTrace::CreateInstance` was once ignored, but was
changed to always being dereferenced. This caused the tests to fail even
when tracing was supported.
On machines where tracing is not supported we would still run tests that
attempt to take a trace. These would obviously fail because the required
hardware is not present. Note that some of the tests simply read
serialized json as trace files which does not require any special
hardware.
This PR fixes these two issues by guarding the pointer dereference and
then skipping unsupported tests on machines. With these changes the
trace tests pass on both types of machines.
We also add a new unit test to validate that a process can be created
with a nullptr core_file through the generic process trace plugin path.
This patch fixes the monolithic linux build in Ubuntu 24.04. Newer
versions of debian/ubuntu pass a warning when installing packages at the
system level using pip as it interferes with system package manager
installed python packages. We do not use any system package manager
installed python packages, so we just ignore the warning (that is an
error without passing the flag) by passing the --break-system-packages
flag.
Canonicalizes a chain of `linalg.unpack -> tensor.extract_slice` into a
`linalg.unpack` with reduced dest sizes. This will only happen when the
unpack op's only user is a non rank-reducing slice with zero offset and
unit strides.
---------
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
Signed-off-by: Max Dawkins <maxdawkins19@gmail.com>
Co-authored-by: Max Dawkins <maxdawkins19@gmail.com>
If the scalar instructions is marked for the vectorization in the tree,
it cannot be vectorized as part of the another node in the same tree, in
general. It may prevent some potentially profitable vectorization
opportunities, since some nodes end up being buildvector/gather nodes,
which add to the total cost.
Patch allows revectorization of the previously vectorized scalars.
Reviewers: hiraditya, RKSimon
Reviewed By: RKSimon, hiraditya
Pull Request: https://github.com/llvm/llvm-project/pull/133091
In the split-LTO-unit mode in ThinLTO, a compilation module is split
into two and global variables that meet a specific criteria is moved to
the split module.
d21fc58aee/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp (L315-L366)
And if there is an originally local-linkage global value defined in the
original module and referenced in the split module or the vice versa,
that value is _promoted_ by attaching a module ID to their names in
order to prevent name clashes because now they can be referenced from
other modules.
d21fc58aee/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp (L46-L100)
And when that promoted global value is a function, a
`.lto_set_conditional` entry is written to the original module to avoid
breaking references from inline assembly:
d21fc58aee/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp (L84-L91)
The syntax of this is, if the original function name is `symbolA` and
the module ID is `123`,
```ll
module asm ".lto_set_conditional symbolA,symbolA.123"
```
These symbols are parsed here:
648981f913/llvm/lib/MC/MCParser/AsmParser.cpp (L6467)
The first function symbol in this `.lto_set_conditional` do not exist as
a function in the bitcode anymore because it was renamed to the second.
So they are not assigned as function symbols but they are not really
data either, so the object writer crashes here:
5b9e6c7993/llvm/lib/MC/WasmObjectWriter.cpp (L1820)
This PR makes the object writer just skip those symbols.
---
This problem was discovered when I was testing with
`-fwhole-program-vtables`. The reason we didn't have this problem before
with ThinLTO was because `-fsplit-lto-unit`, which splits LTO units when
possible, defaults to false, but it defaults to true when
`-fwhole-program-vtables` is used.
matchRegisterNameHelper returns MCRegister() for RVE so the first RVE
check was dead.
For the second check, I've moved the RVE check from the comma parsing to
the identifier parsing so the diagnostic points at the register. Note
we're using matchRegisterName instead of matchRegisterNameHelper to
avoid allowing ABI names so we don't get the RVE check that lives inside
matchRegisterNameHelper.
The errors for RVE in general should probably say something other than
"invalid register", but that's a problem throughout the assembler.
This patch fixes:
clang/tools/clang-sycl-linker/ClangSYCLLinker.cpp:127:13: error:
function 'getMainExecutable' is not needed and will not be emitted
[-Werror,-Wunneeded-internal-declaration]
This PR has the following changes:
Replace llvm-link with calls to linkInModule to link device files Add
-print-linked-module option to dump linked module for testing Added a
test to verify that linking is working as expected. We will eventually
move to using thin LTO for linking device inputs.
Thanks
---------
Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
Try to reduce individual subtarget features in the "target-features"
attribute. This attempts a textual removal of the fields in the string,
not a semantic removal. Typically there's a lot of redundant feature spam
in the feature list implied by the target-cpu (which I really wish clang
would stop emitting). If we could parse these out, we could easily drop
the fields without testing anything.
This commit extends the lowering of amdgpu.mfma to handle the new
double-rate MFMAs in gfx950 and adds tests for these operations.
It also adds support for MFMAs on small floats (f6 and f4), which are
implented using the "scaled" MFMA intrinsic with a scale value of 0 in
order to have an unscaled MFMA.
This commit does not add a `amdgpu.scaled_mfma` operation, as that is
future work.
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
The current API doesn't have a way to unset it. The query returns
an optional, but the set doesn't. Alternatively I could switch the
set to also use optional.
So the dSYM can be told what target it has been loaded into.
When lldb is loading modules, while creating a target, it will run
"command script import" on any Python modules in Resources/Python in the
dSYM. However, this happens WHILE the target is being created, so it is
not yet in the target list. That means that these scripts can't act on
the target that they a part of when they get loaded.
This patch adds a new python API that lldb will call:
__lldb_module_added_to_target
if it is defined in the module, passing in the Target the module was
being added to, so that code in these dSYM's don't have to guess.
Not sure this is the right place to put it. This is a property
of GlobalVariable, not GlobalValue. But the ReduceGlobalVars
reduction tries to delete the value entirely.
There are V2S copies between vpgr16 and spgr32 in true16 mode. This is
caused by vgpr16 and sgpr32 both selectable by 16bit src in ISel.
When a V2S copy and its useMI are lowered to VALU, this patch check
1. If the generated new VALU is used by a true16 inst. Add subreg access
if necessary.
2. Legalize the V2S copy by replacing it to subreg_to_reg
an example MIR looks like:
```
%2:sgpr_32 = COPY %1:vgpr_16
%3:sgpr_32 = S_OR_B32 %2:sgpr_32, ...
%4:vgpr_16 = V_ADD_F16_t16 %3:sgpr_32, ...
```
currently lowered to
```
%2:vgpr_32 = COPY %1:vgpr_16
%3:vgpr_32 = V_OR_B32 %2:vgpr_32, ...
%4:vgpr_16 = V_ADD_F16_t16 %3:vgpr_32, ...
```
after this patch
```
%2:vgpr_32 = SUBREG_TO_REG 0, %1:vgpr_16, lo16
%3:vgpr_32 = V_OR_B32 %2:vgpr_32, ...
%4:vgpr_16 = V_ADD_F16_t16 %3.lo16:vgpr_32, ...
```
Not sure why the "fold-all" option naming didn't match the
variable "FoldPreOutputs", but I've preserved the difference.
More annoyingly, the pass name "normalize" does not match the pass
name IRNormalizer and should probably be fixed one way or the other.
Also the existing test coverage for the flags is lacking. I've added
a test that shows they parse, but we should have tests that they
do something.
MacOS platforms using mlir-runner in lit tests consistently hit the
following error:
```
# .---command stderr------------
# | JIT session error: Symbols not found: [ __mlir_ciface_printMemrefI32 ]
# | Error: Failed to materialize symbols: { (main, { __mlir_printMemrefI32, ... }) }
# `-----------------------------
```
https://github.com/google/heir/issues/1521#issuecomment-2751303404
confirms the issue is fixed by using `alwayslink` on these two targets,
and I confirmed on a separate Apple M1 (OSX version Sequoia 15.3.2.).
I'm not an expert on the mlir runner internals, but given the
mlir-runner is purely for testing, and alwayslink at worst adds some
overhead by not removing symbols, it seems low risk.
- Use static instead of anonymous namespace for file local functions.
- Enclose file-local classes in anonymous namespace.
- Eliminate `llvm::` qualifier when file has `using namespace llvm`.
- Eliminate namespace surrounding entire code in
SPIRVConvergenceRegionAnalysis.cpp file.
- Eliminate call to `initializeSPIRVStructurizerPass` from the pass
constructor (https://github.com/llvm/llvm-project/issues/111767)
The combination of `-fcomplex-arithmetic=promoted` and `mno-x87` for
`double` complex division is leading to a crash.
See https://godbolt.org/z/189G957oY
This patch fixes that.
It happened in https://lab.llvm.org/buildbot/#/builders/152/builds/1131
when the buildbot was switched from CTK12.3 to CTK12.8.
The logs are gone by now, so the above link is useless.
The error was:
error: ‘auto’ not permitted in template argument
This workaround helps, but I also reported the issue to NVCC devs.
We only need to keep track of the last register seen. We never need the
first register once we've parsed. Currently if s0/x8 is used RegStart
will point to that and not ra/s1 so it already isn't the start.