519947 Commits

Author SHA1 Message Date
Alexandros Lamprineas
88c2af80fa
[NFC][clang][FMV][TargetInfo] Refactor API for FMV feature priority. (#116257)
Currently we have code with target hooks in CodeGenModule shared between
X86 and AArch64 for sorting MultiVersionResolverOptions. Those are used
when generating IFunc resolvers for FMV. The RISCV target has different
criteria for sorting, therefore it repeats sorting after calling
CodeGenFunction::EmitMultiVersionResolver.

I am moving the FMV priority logic into TargetInfo, so that it can be
implemented by the TargetParser which then makes it possible to query it
from llvm. Here is an example why this is handy:
https://github.com/llvm/llvm-project/pull/87939
2024-11-28 09:22:05 +00:00
Haojian Wu
2c242b98c6
[clang] Add a lifetime_capture_by testcase for temporary capturing object. (#117733)
Add a test case to indicate this is an expected behavior.
2024-11-28 10:17:41 +01:00
Florian Hahn
f8f238d38e
[AArch64] Add extra add/cast tests for select-optimize.
Extra tests for https://github.com/llvm/llvm-project/pull/115489
with different operand order. Also fixes the target triple.
2024-11-28 09:13:29 +00:00
Nikolas Klauser
0604d13790
[Clang] Add [[clang::no_specializations]] (#101469)
This can be used to inform users when a template should not be
specialized. For example, this is the case for the standard type traits
(except for `common_type` and `common_reference`, which have more
complicated rules).
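
A minimal sketch of how the attribute might be applied to a trait-like
template. `Widget` and `is_widget` are hypothetical names used only for
illustration; compilers that do not know the attribute are expected to
ignore it, possibly with a warning.

```cpp
#include <type_traits>

// Hypothetical type and trait, not taken from the patch.
struct Widget {};

// The attribute marks the primary template as "do not specialize".
template <class T>
struct [[clang::no_specializations]] is_widget : std::is_same<T, Widget> {};

// A user specialization such as the one below is the kind of code the
// attribute is meant to flag on compilers that support it:
//   template <> struct is_widget<int> : std::true_type {};
```

The trait is still used in the normal way (`is_widget<Widget>::value`); only
adding specializations is discouraged.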
2024-11-28 10:13:18 +01:00
NAKAMURA Takumi
71648a4ef9 Make MCDCRecord::getNumConditions() const&
Some users were trying to get a reference to the return value.
2024-11-28 18:09:27 +09:00
Jay Foad
89b08c8ee7
[TableGen] Simplify generated code for isSubclass (#117351)
Implement isSubclass with direct lookup into some tables instead of
nested switches.

Part of the motivation for this is improving compile time when clang-18
is used as a host compiler, since it seems to have trouble with very
large switch statements.
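
The idea of replacing nested switches with a table lookup can be sketched
roughly as follows. The enum, the table contents, and the function signature
are hypothetical and much smaller than anything TableGen generates; this is
only an illustration of the technique, not the generated code.

```cpp
// Hypothetical class kinds; not the generated TableGen enum.
enum Kind : unsigned { Any, Reg, GPR, FPR, NumKinds };

// IsSubclassTable[A][B] is true when kind A is a subclass of kind B.
constexpr bool IsSubclassTable[NumKinds][NumKinds] = {
    /* Any */ {true, false, false, false},
    /* Reg */ {true, true, false, false},
    /* GPR */ {true, true, true, false},
    /* FPR */ {true, true, false, true},
};

// One indexed load replaces a switch on A containing a switch on B.
constexpr bool isSubclass(Kind A, Kind B) { return IsSubclassTable[A][B]; }
```

The compiler only has to lay out a constant array here, rather than analyze
a very large switch statement.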
2024-11-28 08:52:02 +00:00
CHANDRA GHALE
76e6c8d3fc
Codegen changes for strict modifier with grainsize/num_tasks of taskloop construct (#117196)
Initial parsing/sema for the 'strict' modifier with the 'num_tasks' and
'grainsize' clauses is present in these commits
[grainsize_parsing](ab9eac762c)
and
[num_tasks_parsing](56c1660170 (diff-4184486638e85284c3a2c961a81e7752231022daf97e411007c13a6732b50db9R6545))
. However, this implementation appears incomplete as it lacks code
generation support. A runtime patch was introduced in this runtime
commit
[runtime_patch](540007b427 (diff-5e95f9319910d6965d09c301359dbe6b23f3eef5ce4d262ef2c2d2137875b5c4R374))
, which adds a new API, __kmpc_taskloop_5, to accommodate the strict
modifier.
In this patch I have added codegen support. When the strict modifier is
present alongside the grainsize or num_tasks clauses of the taskloop
construct, the code now emits a call to __kmpc_taskloop_5, which includes
an additional parameter of type i32 with the value 1 to indicate the
strict modifier. If the strict modifier is not present, it falls back to
the existing __kmpc_taskloop API call.
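
For context, a source-level use of the modifier might look like the sketch
below. The function name `doubled` and the grain size of 4 are made up for
illustration. The pragmas take effect only when the code is built with
OpenMP enabled on a compiler that accepts the OpenMP 5.1 `strict` modifier;
otherwise the loop simply runs serially.

```cpp
#include <vector>

// Hypothetical helper: returns a copy of `in` with every element doubled.
std::vector<int> doubled(const std::vector<int> &in) {
  std::vector<int> out(in.size());
  const int n = static_cast<int>(in.size());
  // Raw pointers are captured by value in each task and still refer to the
  // same underlying storage.
  const int *src = in.data();
  int *dst = out.data();

  #pragma omp parallel
  #pragma omp single
  #pragma omp taskloop grainsize(strict: 4)
  for (int i = 0; i < n; ++i)
    dst[i] = 2 * src[i];

  return out;
}
```

With `strict`, the grain size is taken as an exact request rather than a
hint, which is the extra information the codegen described above passes to
the runtime.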

---------

Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>
2024-11-28 14:18:59 +05:30
Markus Böck
3327195610
[mlir][LLVM][NFC] Implement print/parse for LLVMStructType (#117930)
The printing and parsing logic for struct types was still using ad-hoc
functions instead of the more conventional `print` and `parse` methods
whose declarations are automatically generated by TableGen.

This PR effectively renames these functions and uses them directly as
implementations for `print` and `parse` of `LLVMStructType`.

This additionally fixes linking errors when users or auto generated code
may call `print` and `parse` directly.

Fixes https://github.com/llvm/llvm-project/issues/117927
2024-11-28 09:19:31 +01:00
Durgadoss R
7173a7d7f9
[NVPTX][NFC] Use NAME macro for TMA intrinsic defs (#117907)
This patch updates the TMA intrinsic definitions to use the "NAME"
macro (inside the multiclass) instead of an empty string.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2024-11-28 13:45:55 +05:30
Zhaoxuan Jiang
60db321081
[AArch64] Do not mark homogeneous prolog/epilog functions optnone (#117959)
The verifier complains that synthesized IR functions have minsize and
optnone attributes which are incompatible. This patch removes optnone
attribute and updates affected tests as needed.
2024-11-28 00:11:05 -08:00
Carlos Alberto Enciso
3ffee0086c
[llvm-debuginfo-analyzer] Fix compile/link errors on specific builders. (#117971)
Link errors on builders:
- llvm-nvptx-nvidia-ubuntu
- llvm-nvptx64-nvidia-ubuntu

Add explicit references to DebugInfoDWARF and Object.

Compile errors on builders:
- ppc64le-lld-multistage-test
- clang-ppc64le-linux-multistage
- clang-ppc64le-rhel

error: comparison of integers of different signs:

Add to the constants used in the 'EXPECT_EQ' the 'u' postfix.
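
The sign-comparison issue can be reproduced outside of gtest with a small
stand-in. `sameValue` and `hasThreeElements` below are hypothetical helpers,
not part of the patch; `sameValue` only mimics the way a templated equality
check keeps the deduced type of each operand.

```cpp
#include <vector>

// Hypothetical stand-in for a gtest-style equality check: each operand keeps
// its own deduced type, so a mixed signed/unsigned comparison happens here.
template <typename A, typename B>
bool sameValue(const A &Expected, const B &Actual) {
  return Expected == Actual;
}

bool hasThreeElements(const std::vector<int> &V) {
  // Writing `3` would compare an int with an unsigned size and can trip
  // -Wsign-compare (an error under -Werror); `3u` keeps both sides unsigned.
  return sameValue(3u, V.size());
}
```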
2024-11-28 08:08:28 +00:00
Haohai Wen
69d66fafec
[clang] Fix description for fprofile-sample-use= on Windows (#117973)
We only support -fprofile-sample-use= for clang-cl.
2024-11-28 15:43:21 +08:00
Haohai Wen
f6694534ac
[Driver] Remove non MSVC CL flags /fprofile-sample-use (#117970)
Those flags are introduced in #117282. They are not supported by MSVC.
2024-11-28 15:36:06 +08:00
Pavel Labath
c1dff71525
[lldb] Remove child_process_inherit from the socket classes (#117699)
It's never set to true. Also, using inheritable FDs in a multithreaded
process pretty much guarantees descriptor leaks. It's better to
explicitly pass a specific FD to a specific subprocess, which we already
mostly can do using the ProcessLaunchInfo FileActions.
2024-11-28 08:27:36 +01:00
Pengcheng Wang
93f7398bdb
[RISCV] Add TuneDisableLatencySchedHeuristic
This tune feature disables the latency scheduling heuristic.

This can reduce the number of spills/reloads but will cause
regressions on some cores.

CPUs may add this tune feature if they find it profitable.

Reviewers: lukel97, michaelmaitland, asb, preames, mshockwave, topperc

Reviewed By: michaelmaitland, mshockwave, topperc

Pull Request: https://github.com/llvm/llvm-project/pull/115858
2024-11-28 15:16:23 +08:00
Sudharsan Veeravalli
c4645ffeda
[RISCV] Add Qualcomm uC Xqcicsr (CSR) extension (#117169)
The Qualcomm uC Xqcicsr extension adds 2 instructions that can read and
write CSRs.

The current spec can be found at:
https://github.com/quic/riscv-unified-db/releases/latest

This patch adds assembler only support.
2024-11-28 12:46:15 +05:30
Elvis Wang
9ea5be639d
Recommit "[LV][VPlan] Remove any-of reduction from precomputeCost. NFC (#117109)" (#117289)
Update the test cases that contain `any-of` printings from
precomputeCost().

Original message:

The any-of reduction contains phi and select instructions.

The select instruction might be optimized and removed in the vplan which
may cause VF difference between legacy and VPlan-based model. But if the
select instruction is removed, planContainsAdditionalSimplifications()
will catch it and disable the assertion.

Therefore, we can just remove the any-of reduction calculation in
precomputeCost().



Recommit "[LV][VPlan] Remove any-of reduction from precomputeCost. NFC
(#117109)"
2024-11-28 15:07:36 +08:00
Pengcheng Wang
d36a4c0715
[RISCV] Rename some Feature* to Tune* (#117966)
These features should be tune features.
2024-11-28 15:01:49 +08:00
s-watanabe314
f3cf24fcc4
[flang] Apply nocapture attribute to dummy arguments (#116182)
Apply llvm.nocapture attribute to dummy arguments that do not have the
target, asynchronous, volatile, or pointer attributes in a procedure
that is not a bind(c). This was discussed in
https://discourse.llvm.org/t/applying-the-nocapture-attribute-to-reference-passed-arguments-in-fortran-subroutines/81401
2024-11-28 15:39:26 +09:00
Durgadoss R
1c76958465
[NVPTX] Add unreachable for TMA Inst Printer (#117850)
This patch adds the llvm_unreachable() for TMA
reduction opcode printer method, outside the
switch.

We had this inside the default-case leading to
the warning below (and hence was removed):
error: default label in switch which covers all enumeration values
         [-Werror,-Wcovered-switch-default]
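
The pattern described above can be sketched as follows. `RedOp` and
`redOpName` are hypothetical names, and `std::abort()` stands in for
`llvm_unreachable` so that the sketch builds without LLVM headers.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical reduction kinds; not the real NVPTX enumeration.
enum class RedOp { Add, Min, Max };

std::string redOpName(RedOp Op) {
  switch (Op) { // no `default:` label, every enumerator is listed
  case RedOp::Add:
    return ".add";
  case RedOp::Min:
    return ".min";
  case RedOp::Max:
    return ".max";
  }
  // Reached only if Op holds a value outside the enumeration.
  // Stand-in for llvm_unreachable("...").
  std::abort();
}
```

Keeping the switch free of a `default:` label also lets the compiler warn
when a new enumerator is added but not handled.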

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2024-11-28 10:55:18 +05:30
Carlos Alberto Enciso
fb3765959f
[llvm-debuginfo-analyzer] Common handling of unsigned attribute values. (#116027)
- In the DWARF reader, for those attributes that can have an unsigned
value, allow for the following cases:
  * Is an implicit constant
  * Is an optional value
- The testing is done by creating a file with generated DWARF, using
`DwarfGenerator` (generate DWARF debug info for unit tests).
2024-11-28 05:21:47 +00:00
Lang Hames
f710b04233 [ORC] Fail early in ExecutionSession::registerJITDispatchHandlers.
Check that we're not reusing any handler tag addresses before installing any
handlers. This ensures that either all of the handlers are installed*, or none
of them are, simplifying error recovery.

* Ignoring handlers whose tags couldn't be resolved at all: these were never
installed.
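
A minimal sketch of the validate-then-commit idea, using a plain `std::map`
in place of the ExecutionSession state. `registerAll`, `TagAddr`, and
`Handler` are hypothetical names and do not reflect the ORC API.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

using TagAddr = std::uint64_t;         // hypothetical tag address type
using Handler = std::function<void()>; // hypothetical handler type

// Installs every (tag, handler) pair, or none of them if any tag is
// already in use.
bool registerAll(std::map<TagAddr, Handler> &Installed,
                 std::vector<std::pair<TagAddr, Handler>> New) {
  // Pass 1: validate without mutating anything.
  for (const auto &KV : New)
    if (Installed.count(KV.first))
      return false;
  // Pass 2: commit. Nothing in this pass can fail.
  for (auto &KV : New)
    Installed[KV.first] = std::move(KV.second);
  return true;
}
```

Because the failure path returns before any insertion, the caller never has
to undo a partially applied registration.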
2024-11-28 15:29:16 +11:00
Kareem Ergawy
2918a47f42
[mlir][OpenMP] Annotate private vars with map_idx when needed (#116770)
This PR extends the MLIR representation for `omp.target` ops by adding a
`map_idx` to `private` vars. This annotation stores the index of the map
info operand corresponding to the private var. If the variable does not
have a map operand, the `map_idx` attribute is either not present at all
or its value is `-1`.

This makes matching the private variable to its map info op easier (see
https://github.com/llvm/llvm-project/pull/116576 for usage).
2024-11-28 05:15:33 +01:00
Kareem Ergawy
81f544d465
[flang][OpenMP] Rewrite omp.loop to semantically equivalent ops (#115443)
Introduces a new conversion pass that rewrites `omp.loop` ops to their
semantically equivalent op nests based on the surrounding/binding
context of the `loop` op. Not all forms of `omp.loop` are supported yet.
See `isLoopConversionSupported` for more info on which forms are
supported.
2024-11-28 05:15:06 +01:00
Matthias Springer
3a115279f8
[mlir][Transforms][NFC] Dialect conversion: Improve docs for materializations (#117847)
The terms "legal type" and "illegal type" are ambiguous when talking
about materializations. E.g., for target materializations we do not
necessarily convert from illegal to legal types. We convert from the
most recently mapped value to the type that was produced by converting
the original type.

---------

Co-authored-by: Markus Böck <markus.boeck02@gmail.com>
2024-11-28 12:30:54 +09:00
Jie Fu
c8b15157d7 [mlir-opt] Fix -Wcovered-switch-default in MlirOptMain.cpp (NFC)
/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:262:7:
error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default]
      default:
      ^
1 error generated.
2024-11-28 11:22:28 +08:00
Mehdi Amini
db273c6c24
[MLIR][ODS] Add support for wrapping enums with std::optional in Type/Attr definitions (#117719) 2024-11-28 03:59:42 +01:00
sfzhu93
1f422dc399
[MLIR][mlir-opt] add support for disabling diagnostics (#117669)
This PR adds a command line argument `--mlir-disable-diagnostic` for
disabling diagnostic information for mlir-opt.
When debugging with mlir-opt, some developers would like to disable the
diagnostic information and focus specifically on the dumped IR. For
example, https://github.com/triton-lang/triton/pull/5250
2024-11-27 18:51:18 -08:00
Schrodinger ZHU Yifan
700d9ac9ef
[libc] disable process_mrelease for riscv (#117956)
`process_mrelease` upsets the RV32 build bot. Disable it for now.
2024-11-27 21:17:38 -05:00
Joseph Huber
054f914741
[Runtimes] Merge 'compile_commands.json' files from runtimes build (#116303)
Summary:
When building a project in a runtime mode, the compilation database is a
separate CMake invocation. So its `compile_commands.json` file will be
placed elsewhere in the `runtimes/runtime-bins` directory. This is
somewhat annoying for ongoing development when a runtimes build is
necessary. This patch adds some CMake magic to merge the two files.
2024-11-27 20:14:26 -06:00
Joseph Huber
a24aa7dfa5
[Offload] Use libc 'hand-in-hand' module to find RPC header (#117928)
Summary:
We should now use the official™ way to include the files from
`libc/shared`. This required some code to make sure that it's not
included twice if multiple people use it as well as a sanity check on
the directory.
2024-11-27 20:14:13 -06:00
LiqinWeng
4a3f46de50
[LV][EVL] Support call instruction with EVL-vectorization (#110412) 2024-11-28 10:05:08 +08:00
Schrodinger ZHU Yifan
819b155c2a
[libc] skip test and return ENOSYS when process_mrelease unavailable (#117951) 2024-11-27 20:52:16 -05:00
Haohai Wen
c8cd497c98
[Driver] Support fprofile-sample-use= for CL (#117282)
Sampling PGO has already been supported on Windows. This patch adds
/fprofile-sample-use= /fprofile-sample-use: /fno-profile-sample-use and
supports -fprofile-sample-use= for CL.
2024-11-28 09:33:24 +08:00
A. Jiang
63c5a422f0
[Clang] Fix constexpr-ness on implicitly deleted destructors (#116359)
In C++20, a defaulted but implicitly deleted destructor is constexpr if
and only if the class has no virtual base class. This hasn't been
changed in C++23 by P2448R2.

Constexpr-ness on a deleted destructor affects almost nothing. The
`__is_literal` intrinsic is related, while the corresponding
`std::is_literal_type(_v)` utility has been removed in C++20. A recently
added example in `test/AST/ByteCode/cxx23.cpp` will become valid, and
the example is already accepted by GCC.

Clang currently behaves correctly in C++23 mode, because the
constexpr-ness on defaulted destructor is relaxed by P2448R2. But we
should make similar relaxation for an implicitly deleted destructor.

Fixes #85550.
2024-11-28 09:19:02 +08:00
Omar Hossam
d2b482b0ef
[libc] (reland #117503) Implement process_mrelease (#117851)
This PR implements process_mrelease.
A previous PR, #117503, was merged but then failed due to an issue in
the tests. Namely, the failing tests were comparing against the return value
as opposed to errno. This is fixed in this PR.
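
The distinction can be illustrated with a small stand-in. `releaseLike` is a
hypothetical function that follows the usual POSIX convention of returning
-1 and reporting the reason through `errno`; it is not the libc
implementation or its test.

```cpp
#include <cerrno>

// Hypothetical wrapper following the common POSIX convention:
// on failure it returns -1 and stores the error code in errno.
int releaseLike(int Fd) {
  if (Fd < 0) {
    errno = EBADF;
    return -1;
  }
  return 0;
}

// What a test should inspect: the error code lives in errno, while the
// return value only signals that a failure happened.
bool failsWithBadFd() {
  errno = 0;
  const int Ret = releaseLike(-1);
  return Ret == -1 && errno == EBADF;
}
```

A test that compares the return value itself against `EBADF` would fail
here, since the return value is -1.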
2024-11-27 20:15:17 -05:00
Stella Laurenzo
65339e4d74
[mlir] Add option to disable MLIR Python dev package configuration. (#117934)
Adds a CMake option MLIR_DISABLE_CONFIGURE_PYTHON_DEV_PACKAGES which
gates doing package discovery and configuration for Python dev packages
by MLIR (this was made opt-out to preserve compatibility with
find_package(MLIR) based uses which do not set the standard options).

The default Python setup that MLIR does has been a problem for
super-projects that include LLVM for a long time because it forces a
very specific package discovery mechanism that is not uniform in all
uses.

When reviewing #117922, I noted that this would effectively be a
break-the-world event for downstreams, forcing them to adapt their nanobind
dep to the exact way that MLIR does it. Adding the option to just
wholesale skip the built-in configuration heuristics at least gives us a
mechanism to tell downstreams to migrate to, giving them complete
control and not requiring packaging workarounds. This seemed a better
option than (once again) creating a situation where downstreams could
not integrate the dep change without doing tricky infra upgrades, and it
removes the burden from the author of that patch from needing to think
about how this affects super-projects that include MLIR (i.e. they can
just be told to do it themselves as needed vs being in a wedged state
and unable to upgrade).
2024-11-27 17:11:32 -08:00
abhishek-kaushik22
9bdf683ba6
[X86] Enforce strict pre-legalization to combine in scalarizeExtEltFP (#117681)
Use a `DCI` object to actually check the DAG combine level instead of
using the type `i1` because this assumption fails on AVX512 where we
have types like `v8i1` after legalization.

Closes #117684
2024-11-28 08:19:10 +08:00
Yusuke MINATO
e573c6b67e
[flang] Add nsw to DO loop parameters (#113854)
nsw is added to DO loop parameters (initial parameters, terminal
parameters, and incrementation parameters).
This can help vectorization in some cases like #110609.

See also the discussion in
https://discourse.llvm.org/t/rfc-add-nsw-flags-to-arithmetic-integer-operations-using-the-option-fno-wrapv/77584/20.
2024-11-28 08:58:09 +09:00
Maurice Heumann
21af99ab84
[WinEH] Emit state stores for SEH scopes (#116546)
At the moment Windows 32 bit SEH state stores are only emitted for
throwing calls.

Windows 32 bit SEH state stores should also be emitted before SEH scope
begin and before SEH scope end.
An invalid inline memory access would otherwise not trigger unwinding,
in combination with /EHa.

This fixes #90946
2024-11-27 15:43:20 -08:00
Pranav Kant
8df63211a6
[BitstreamReader] Fix 32-bit overflow (#117363)
This got exposed when processing large LTO-generated files leading to
crashes.
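
The commit message does not say where the overflow occurred, so the sketch
below is only a generic illustration of this class of bug, not the
BitstreamReader code. The function names and the word-to-bit conversion are
assumptions chosen to show how an offset computed in 32 bits wraps once the
input is large enough.

```cpp
#include <cstdint>

// Bit offset of a 32-bit word, computed entirely in 32 bits.
// The product wraps around once it exceeds 2^32 - 1.
constexpr std::uint32_t bitOffset32(std::uint32_t WordIndex) {
  return WordIndex * 32u;
}

// Same computation, widened to 64 bits before multiplying.
constexpr std::uint64_t bitOffset64(std::uint32_t WordIndex) {
  return static_cast<std::uint64_t>(WordIndex) * 32u;
}
```

For word index `0x08000001`, the 32-bit version yields 32 while the widened
version yields 4294967328, so a reader using the former would jump to the
wrong position in a large file.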
2024-11-27 14:53:34 -08:00
Craig Topper
80afdbe6a5 [RISCV] Use RISCVSubtarget::is64Bit() instead of hasFeature(RISCV::Feature64Bit). NFC 2024-11-27 14:02:15 -08:00
Joseph Huber
4cb4516ae9 [OpenMP] Fix RPC client not being optimized out after changes
Summary:
I forgot that this check deliberately looked through the indirection I
removed. Fix it to just check if the symbol has no users.
2024-11-27 15:56:23 -06:00
Philip Reames
c6f2d35c4d Fix a build warning introduced by my febbf910 2024-11-27 13:41:29 -08:00
Felipe Magno de Almeida
e3fdc3aa81
[RISCV] Allow hoisting VXRM writes out of loops speculatively (#110044)
Change the intersect for the anticipated algorithm to ignore unknown
when anticipating. This effectively allows speculative VXRM writes,
because a VXRM write may now be emitted even when there are branches
where VXRM is unneeded.

This change matters because VXRM writes cause pipeline flushes in some
micro-architectures, so it makes sense to allow more aggressive hoisting
even if it causes some degradation for the slow path.

An example is this code:
```
typedef unsigned char uint8_t;
__attribute__ ((noipa))
void foo (uint8_t *dst,  int i_dst_stride,
           uint8_t *src1, int i_src1_stride,
           uint8_t *src2, int i_src2_stride,
           int i_width, int i_height )
{
   for( int y = 0; y < i_height; y++ )
     {
       for( int x = 0; x < i_width; x++ )
         dst[x] = ( src1[x] + src2[x] + 1 ) >> 1;
       dst  += i_dst_stride;
       src1 += i_src1_stride;
       src2 += i_src2_stride;
     }
}
```
With this patch, the VXRM write in the code above is hoisted out of
the outer loop.
2024-11-27 13:31:39 -08:00
Philip Reames
febbf9105f
[RISCV] Match vcompress during shuffle lowering (#117748)
This change matches a subset of vcompress patterns during shuffle
lowering. The subset implemented requires a contiguous prefix of
demanded elements followed by undefs. This subset was chosen for two
reasons: 1) which elements to spuriously demand is a non-obvious problem,
and 2) my first several attempts at implementing the general case were
buggy. I decided to go with the simple case to start with.

vcompress scales better with LMUL than a general vrgather and, at least
on the SpaceMit X60, has higher throughput even at m1. It also has the
advantage of requiring smaller vector constants at one bit per element
as opposed to vrgather which is a minimum of 8 bits per element. The
downside to using vcompress is that we can't fold a vselect into it, as
there is no masked vcompress variant.
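
As a reminder of what a compress does, here is a scalar model. The function
name `compress` and its signature are hypothetical, and the model ignores
vector length and tail policy details; it only shows the data movement.

```cpp
#include <cstddef>
#include <vector>

// Scalar model of a compress: elements of Src whose mask bit is set are
// packed, in order, into the front of Dst. Remaining elements of Dst are
// left as they were.
std::vector<int> compress(const std::vector<int> &Src,
                          const std::vector<bool> &Mask,
                          std::vector<int> Dst) {
  std::size_t Out = 0;
  for (std::size_t I = 0; I < Src.size() && I < Mask.size(); ++I)
    if (Mask[I] && Out < Dst.size())
      Dst[Out++] = Src[I];
  return Dst;
}
```

A shuffle mask such as `<0, 2, 3, undef>` corresponds to the bit mask
`1011` (element 0 first) here: the demanded elements form a contiguous
prefix of the result, and the trailing lanes are don't-care.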

For reference, here are the relevant throughputs from camel-cdr's data
table on BP3 (X60):
  vrgather.vv v8,v16,v24    4.0  16.0  64.0  256.0
  vcompress.vm v8,v16,v24   3.0  10.0  36.0  136.0
  vmerge.vvm v8,v16,v24,v0  2.0  4.0   8.0   16.0

The largest concern with the extra vmerge is that we locally increase
register pressure. If we do have masking, we also have a passthru;
without the ability to fold that into the vcompress, we need to keep it
alive a bit longer. This can hurt at e.g. m8 where we have very few
architectural registers. As compared with the vrgather.vv sequence, this
is only one additional m1 VREG, since we no longer need the index
vector. It compares slightly worse against vrgatherei16.vv, which can use
index vectors smaller than other operands. Note that we could
potentially fold the vmerge if only tail elements are being preserved; I
haven't investigated this.

It is unfortunately hard given our current lowering structure to know if
we're emitting a shuffle where masking will follow. Thankfully, it
doesn't seem to show up much in practice, so I think we can probably
ignore it.

This patch only handles single source compress idioms at the moment.
This is an effort to avoid interacting with other patches on review for
changing how we canonicalize length changing shuffles.
2024-11-27 13:23:18 -08:00
lialan
1669ac434c
[MLIR] Refactor mask compression logic when emulating vector.maskedload ops (#116520)
This patch simplifies and extends the logic used when compressing masks
emitted by `vector.constant_mask` to support extracting 1-D vectors from
multi-dimensional vector loads. It streamlines mask computation, making
it applicable for multi-dimensional mask generation, improving the
overall handling of masked load operations.
2024-11-27 13:22:13 -08:00
Joseph Huber
1d810ece2b
[libc] Move libc server handlers to a shared header (#117908)
Summary:
We can simply include this header from the shared directory now and do
not need to have this level of indirection. Simply stash it with the
other libc opcode handlers.

If we were able to move the printf handlers to the shared directory then
this could just be a header as well, which would HEAVILY simplify the
mess associated with building the RPC server first in the projects
build, then copying it to the runtimes build.
2024-11-27 14:57:52 -06:00
Joseph Huber
89d8e70031
[libc] Export a pointer to the RPC client directly (#117913)
Summary:
We currently have an unnecessary level of indirection when initializing
the RPC client. This is a holdover from when the RPC client was not
trivially copyable and simply makes it more complicated. Here we use the
`asm` syntax to give the C++ variable a valid name so that we can just
copy to it directly.

Another advantage is that if users want to piggy-back on the
same RPC interface they need only declare theirs as extern with the same
symbol name, or make it weak to optionally use it if LIBC isn't
available.
2024-11-27 14:57:38 -06:00
Craig Topper
175051b05e [RISCV][GISel] Support libcalls for f32/f64 acos/asin/atan/atan2/cosh/sinh/tanh. 2024-11-27 12:23:12 -08:00