llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-23 22:16:05 +00:00

Author	SHA1	Message	Date
Alexandros Lamprineas	88c2af80fa	[NFC][clang][FMV][TargetInfo] Refactor API for FMV feature priority. (#116257 ) Currently we have code with target hooks in CodeGenModule shared between X86 and AArch64 for sorting MultiVersionResolverOptions. Those are used when generating IFunc resolvers for FMV. The RISCV target has different criteria for sorting, therefore it repeats sorting after calling CodeGenFunction::EmitMultiVersionResolver. I am moving the FMV priority logic in TargetInfo, so that it can be implemented by the TargetParser which then makes it possible to query it from llvm. Here is an example why this is handy: https://github.com/llvm/llvm-project/pull/87939	2024-11-28 09:22:05 +00:00
Haojian Wu	2c242b98c6	[clang] Add a lifetime_capture_by testcase for temporary capturing object. (#117733 ) Add a test case to indicate this is an expected behavior.	2024-11-28 10:17:41 +01:00
Florian Hahn	f8f238d38e	[AArch64] Add extra add/cast tests for select-optimize. Extra tests for https://github.com/llvm/llvm-project/pull/115489 with different operand order. Also fixes the target triple.	2024-11-28 09:13:29 +00:00
Nikolas Klauser	0604d13790	[Clang] Add [[clang::no_specializations]] (#101469 ) This can be used to inform users when a template should not be specialized. For example, this is the case for the standard type traits (except for `common_type` and `common_reference`, which have more complicated rules).	2024-11-28 10:13:18 +01:00
NAKAMURA Takumi	71648a4ef9	Make MCDCRecord::getNumConditions() `const&` Some users were trying to get a reference to the return value.	2024-11-28 18:09:27 +09:00
Jay Foad	89b08c8ee7	[TableGen] Simplify generated code for isSubclass (#117351 ) Implement isSubclass with direct lookup into some tables instead of nested switches. Part of the motivation for this is improving compile time when clang-18 is used as a host compiler, since it seems to have trouble with very large switch statements.	2024-11-28 08:52:02 +00:00
CHANDRA GHALE	76e6c8d3fc	Codegen changes for strict modifier with grainsize/num_tasks of taskloop construct (#117196 ) Initial parsing/sema for 'strict' modifier with 'num_tasks' and ‘grainsize’ clause is present in these commits [grainsize_parsing](`ab9eac762c`) and [num_tasks_parsing](`56c1660170 (diff-4184486638e85284c3a2c961a81e7752231022daf97e411007c13a6732b50db9R6545)`) . However, this implementation appears incomplete as it lacks code generation support. A runtime patch was introduced in this runtime commit [runtime_patch](`540007b427 (diff-5e95f9319910d6965d09c301359dbe6b23f3eef5ce4d262ef2c2d2137875b5c4R374)`) , which adds a new API, _kmpc_taskloop_5, to accommodate the strict modifier. In this patch I have added codegen support. When the strict modifier is present alongside the grainsize or num_tasks clauses of taskloop construct, the code now emits a call to _kmpc_taskloop_5, which includes an additional parameter of type i32 with the value 1 to indicate the strict modifier. If the strict modifier is not present, it falls back to the existing _kmpc_taskloop API call. --------- Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>	2024-11-28 14:18:59 +05:30
Markus Böck	3327195610	[mlir][LLVM][NFC] Implement `print/parse` for `LLVMStructType` (#117930 ) The printing and parsing logic for struct types was still using ad-hoc functions instead of the more conventional `print` and `parse` methods whose declarations are automatically generated by TableGen. This PR effectively renames these functions and uses them directly as implementations for `print` and `parse` of `LLVMStructType`. This additionally fixes linking errors when users or auto generated code may call `print` and `parse` directly. Fixes https://github.com/llvm/llvm-project/issues/117927	2024-11-28 09:19:31 +01:00
Durgadoss R	7173a7d7f9	[NVPTX][NFC] Use NAME macro for TMA intrinsic defs (#117907 ) This patch updates the TMA intrinsic definitions to use the "NAME" macro (inside the multiclass) instead of an empty string. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2024-11-28 13:45:55 +05:30
Zhaoxuan Jiang	60db321081	[AArch64] Do not mark homogeneous prolog/epilog functions optnone (#117959 ) The verifier complains that synthesized IR functions have minsize and optnone attributes which are incompatible. This patch removes optnone attribute and updates affected tests as needed.	2024-11-28 00:11:05 -08:00
Carlos Alberto Enciso	3ffee0086c	[llvm-debuginfo-analyzer] Fix compile/link errors on specific builders. (#117971 ) Link errors on builders: - llvm-nvptx-nvidia-ubuntu - llvm-nvptx64-nvidia-ubuntu Add explicit references to DebugInfoDWARF and Object. Compile errors on builders: - ppc64le-lld-multistage-test - clang-ppc64le-linux-multistage - clang-ppc64le-rhel error: comparison of integers of different signs: Add the 'u' suffix to the constants used in 'EXPECT_EQ'.	2024-11-28 08:08:28 +00:00
Haohai Wen	69d66fafec	[clang] Fix description for fprofile-sample-use= on Windows (#117973 ) We only support -fprofile-sample-use= for clang-cl.	2024-11-28 15:43:21 +08:00
Haohai Wen	f6694534ac	[Driver] Remove non MSVC CL flags /fprofile-sample-use (#117970 ) Those flags are introduced in #117282. They are not supported by MSVC.	2024-11-28 15:36:06 +08:00
Pavel Labath	c1dff71525	[lldb] Remove child_process_inherit from the socket classes (#117699 ) It's never set to true. Also, using inheritable FDs in a multithreaded process pretty much guarantees descriptor leaks. It's better to explicitly pass a specific FD to a specific subprocess, which we already mostly can do using the ProcessLaunchInfo FileActions.	2024-11-28 08:27:36 +01:00
Pengcheng Wang	93f7398bdb	[RISCV] Add TuneDisableLatencySchedHeuristic This tune feature will disable latency scheduling heuristic. This can reduce the number of spills/reloads but will cause some regressions on some cores. CPU may add this tune feature if they find it's profitable. Reviewers: lukel97, michaelmaitland, asb, preames, mshockwave, topperc Reviewed By: michaelmaitland, mshockwave, topperc Pull Request: https://github.com/llvm/llvm-project/pull/115858	2024-11-28 15:16:23 +08:00
Sudharsan Veeravalli	c4645ffeda	[RISCV] Add Qualcomm uC Xqcicsr (CSR) extension (#117169 ) The Qualcomm uC Xqcicsr extension adds 2 instructions that can read and write CSRs. The current spec can be found at: https://github.com/quic/riscv-unified-db/releases/latest This patch adds assembler only support.	2024-11-28 12:46:15 +05:30
Elvis Wang	9ea5be639d	Recommit "[LV][VPlan] Remove any-of reduction from precomputeCost. NFC (#117109 )" (#117289 ) Update the test cases that contain `any-of` printings from precomputeCost(). Original message: The any-of reduction contains phi and select instructions. The select instruction might be optimized and removed in the VPlan, which may cause a VF difference between the legacy and VPlan-based models. But if the select instruction is removed, planContainsAdditionalSimplifications() will catch it and disable the assertion. Therefore, we can just remove the any-of reduction calculation in precomputeCost().	2024-11-28 15:07:36 +08:00
Pengcheng Wang	d36a4c0715	[RISCV] Rename some Feature* to Tune* (#117966 ) These features should be tune features.	2024-11-28 15:01:49 +08:00
s-watanabe314	f3cf24fcc4	[flang] Apply nocapture attribute to dummy arguments (#116182 ) Apply llvm.nocapture attribute to dummy arguments that do not have the target, asynchronous, volatile, or pointer attributes in a procedure that is not a bind(c). This was discussed in https://discourse.llvm.org/t/applying-the-nocapture-attribute-to-reference-passed-arguments-in-fortran-subroutines/81401	2024-11-28 15:39:26 +09:00
Durgadoss R	1c76958465	[NVPTX] Add unreachable for TMA Inst Printer (#117850 ) This patch adds the llvm_unreachable() for TMA reduction opcode printer method, outside the switch. We had this inside the default-case leading to the warning below (and hence was removed): error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default] Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2024-11-28 10:55:18 +05:30
Carlos Alberto Enciso	fb3765959f	[llvm-debuginfo-analyzer] Common handling of unsigned attribute values. (#116027 ) - In the DWARF reader, for those attributes that can have an unsigned value, allow for the following cases: * Is an implicit constant * Is an optional value - The testing is done by creating a file with generated DWARF, using `DwarfGenerator` (generate DWARF debug info for unit tests).	2024-11-28 05:21:47 +00:00
Lang Hames	f710b04233	[ORC] Fail early in ExecutionSession::registerJITDispatchHandlers. Check that we're not reusing any handler tag addresses before installing any handlers. This ensures that either all of the handlers are installed, or none of them are, simplifying error recovery. Ignoring handlers whose tags couldn't be resolved at all: these were never installed.	2024-11-28 15:29:16 +11:00
Kareem Ergawy	2918a47f42	[mlir][OpenMP] Annotate `private` vars with `map_idx` when needed (#116770 ) This PR extends the MLIR representation for `omp.target` ops by adding a `map_idx` to `private` vars. This annotation stores the index of the map info operand corresponding to the private var. If the variable does not have a map operand, the `map_idx` attribute is either not present at all or its value is `-1`. This makes matching the private variable to its map info op easier (see https://github.com/llvm/llvm-project/pull/116576 for usage).	2024-11-28 05:15:33 +01:00
Kareem Ergawy	81f544d465	[flang][OpenMP] Rewrite `omp.loop` to semantically equivalent ops (#115443 ) Introduces a new conversion pass that rewrites `omp.loop` ops to their semantically equivalent op nests based on the surrounding/binding context of the `loop` op. Not all forms of `omp.loop` are supported yet. See `isLoopConversionSupported` for more info on which forms are supported.	2024-11-28 05:15:06 +01:00
Matthias Springer	3a115279f8	[mlir][Transforms][NFC] Dialect conversion: Improve docs for materializations (#117847 ) The terms "legal type" and "illegal type" are ambiguous when talking about materializations. E.g., for target materializations we do not necessarily convert from illegal to legal types. We convert from the most recently mapped value to the type that was produced by converting the original type. --------- Co-authored-by: Markus Böck <markus.boeck02@gmail.com>	2024-11-28 12:30:54 +09:00
Jie Fu	c8b15157d7	[mlir-opt] Fix -Wcovered-switch-default in MlirOptMain.cpp (NFC) /llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:262:7: error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default] default: ^ 1 error generated.	2024-11-28 11:22:28 +08:00
Mehdi Amini	db273c6c24	[MLIR][ODS] Add support for wrapping enums with std::optional in Type/Attr definitions (#117719 )	2024-11-28 03:59:42 +01:00
sfzhu93	1f422dc399	[MLIR][mlir-opt] add support for disabling diagnostics (#117669 ) This PR adds a command line argument `--mlir-disable-diagnostic` for disabling diagnostic information for mlir-opt. When debugging with mlir-opt, some developers would like to disable the diagnostic information and focus specifically on the dumped IR. For example, https://github.com/triton-lang/triton/pull/5250	2024-11-27 18:51:18 -08:00
Schrodinger ZHU Yifan	700d9ac9ef	[libc] disable process_mrelease for riscv (#117956 ) `process_mrelease` upsets the RV32 build bot. Disable it for now.	2024-11-27 21:17:38 -05:00
Joseph Huber	054f914741	[Runtimes] Merge 'compile_commands.json' files from runtimes build (#116303 ) Summary: When building a project in a runtime mode, the compilation database is a separate CMake invocation. So its `compile_commands.json` file will be placed elsewhere in the `runtimes/runtime-bins` directory. This is somewhat annoying for ongoing development when a runtimes build is necessary. This patch adds some CMake magic to merge the two files.	2024-11-27 20:14:26 -06:00
Joseph Huber	a24aa7dfa5	[Offload] Use libc 'hand-in-hand' module to find RPC header (#117928 ) Summary: We should now use the official™ way to include the files from `libc/shared`. This required some code to make sure that it's not included twice if multiple people use it as well as a sanity check on the directory.	2024-11-27 20:14:13 -06:00
LiqinWeng	4a3f46de50	[LV][EVL] Support call instruction with EVL-vectorization (#110412 )	2024-11-28 10:05:08 +08:00
Schrodinger ZHU Yifan	819b155c2a	[libc] skip test and return ENOSYS when process_mrelease unavailable (#117951 )	2024-11-27 20:52:16 -05:00
Haohai Wen	c8cd497c98	[Driver] Support fprofile-sample-use= for CL (#117282 ) Sampling PGO has already been supported on Windows. This patch adds /fprofile-sample-use= /fprofile-sample-use: /fno-profile-sample-use and supports -fprofile-sample-use= for CL.	2024-11-28 09:33:24 +08:00
A. Jiang	63c5a422f0	[Clang] Fix constexpr-ness on implicitly deleted destructors (#116359 ) In C++20, a defaulted but implicitly deleted destructor is constexpr if and only if the class has no virtual base class. This hasn't been changed in C++23 by P2448R2. Constexpr-ness on a deleted destructor affects almost nothing. The `__is_literal` intrinsic is related, while the corresponding `std::is_literal_type(_v)` utility has been removed in C++20. A recently added example in `test/AST/ByteCode/cxx23.cpp` will become valid, and the example is already accepted by GCC. Clang currently behaves correctly in C++23 mode, because the constexpr-ness on defaulted destructor is relaxed by P2448R2. But we should make similar relaxation for an implicitly deleted destructor. Fixes #85550.	2024-11-28 09:19:02 +08:00
Omar Hossam	d2b482b0ef	[libc] (reland #117503 ) Implement process_mrelease (#117851 ) This PR implements process_mrelease. A previous PR was merged #117503, but failed on merge due to an issue in the tests. Namely the failing tests were comparing against return type as opposed to errno. This is fixed in this PR.	2024-11-27 20:15:17 -05:00
Stella Laurenzo	65339e4d74	[mlir] Add option to disable MLIR Python dev package configuration. (#117934 ) Adds a CMake option MLIR_DISABLE_CONFIGURE_PYTHON_DEV_PACKAGES which gates doing package discovery and configuration for Python dev packages by MLIR (this was made opt-out to preserve compatibility with find_package(MLIR) based uses which do not set the standard options). The default Python setup that MLIR does has been a problem for super-projects that include LLVM for a long time because it forces a very specific package discovery mechanism that is not uniform in all uses. When reviewing #117922, I noted that this would effectively be a break the world event for downstreams, forcing them to adapt their nanobind dep to the exact way that MLIR does it. Adding the option to just wholesale skip the built-in configuration heuristics at least gives us a mechanism to tell downstreams to migrate to, giving them complete control and not requiring packaging workarounds. This seemed a better option than (once again) creating a situation where downstreams could not integrate the dep change without doing tricky infra upgrades, and it removes the burden from the author of that patch from needing to think about how this affects super-projects that include MLIR (i.e. they can just be told to do it themselves as needed vs being in a wedged state and unable to upgrade).	2024-11-27 17:11:32 -08:00
abhishek-kaushik22	9bdf683ba6	[X86] Enforce strict pre-legalization to combine in scalarizeExtEltFP (#117681 ) Use a `DCI` object to actually check the DAG combine level instead of using the type `i1` because this assumption fails on AVX512 where we have types like `v8i1` after legalization. Closes #117684	2024-11-28 08:19:10 +08:00
Yusuke MINATO	e573c6b67e	[flang] Add nsw to DO loop parameters (#113854 ) nsw is added to DO loop parameters (initial parameters, terminal parameters, and incrementation parameters). This can help vectorization in some cases like #110609. See also the discussion in https://discourse.llvm.org/t/rfc-add-nsw-flags-to-arithmetic-integer-operations-using-the-option-fno-wrapv/77584/20.	2024-11-28 08:58:09 +09:00
Maurice Heumann	21af99ab84	[WinEH] Emit state stores for SEH scopes (#116546 ) At the moment Windows 32 bit SEH state stores are only emitted for throwing calls. Windows 32 bit SEH state stores should also be emitted before SEH scope begin and before SEH scope end. An invalid inline memory access would otherwise not trigger unwinding, in combination with /EHa. This fixes #90946	2024-11-27 15:43:20 -08:00
Pranav Kant	8df63211a6	[BitstreamReader] Fix 32-bit overflow (#117363 ) This got exposed when processing large LTO-generated files leading to crashes.	2024-11-27 14:53:34 -08:00
Craig Topper	80afdbe6a5	[RISCV] Use RISCVSubtarget::is64Bit() instead of hasFeature(RISCV::Feature64Bit). NFC	2024-11-27 14:02:15 -08:00
Joseph Huber	4cb4516ae9	[OpenMP] Fix RPC client not being optimized out after changes Summary: I forgot that this check deliberately looked through the indirection I removed. Fix it to just check if the symbol has no users.	2024-11-27 15:56:23 -06:00
Philip Reames	c6f2d35c4d	Fix a build warning introduced by my febbf910	2024-11-27 13:41:29 -08:00
Felipe Magno de Almeida	e3fdc3aa81	[RISCV] Allow hoisting VXRM writes out of loops speculatively (#110044 ) Change the intersect for the anticipated algorithm to ignore unknown when anticipating. This effectively allows speculative VXRM writes, because a VXRM write may be done even when there are branches where VXRM is unneeded. This change matters because VXRM writes cause pipeline flushes in some micro-architectures, so it makes sense to allow more aggressive hoisting even if it causes some degradation for the slow path. An example is this code: ``` typedef unsigned char uint8_t; __attribute__ ((noipa)) void foo (uint8_t *dst, int i_dst_stride, uint8_t *src1, int i_src1_stride, uint8_t *src2, int i_src2_stride, int i_width, int i_height ) { for( int y = 0; y < i_height; y++ ) { for( int x = 0; x < i_width; x++ ) dst[x] = ( src1[x] + src2[x] + 1 ) >> 1; dst += i_dst_stride; src1 += i_src1_stride; src2 += i_src2_stride; } } ``` With this patch, the VXRM write for the code above is hoisted out of the outer loop.	2024-11-27 13:31:39 -08:00
Philip Reames	febbf9105f	[RISCV] Match vcompress during shuffle lowering (#117748 ) This change matches a subset of vcompress patterns during shuffle lowering. The subset implemented requires a contiguous prefix of demanded elements followed by undefs. This subset was chosen for two reasons: 1) which elements to spuriously demand is a non-obvious problem, and 2) my first several attempts at implementing the general case were buggy. I decided to go with the simple case to start with. vcompress scales better with LMUL than a general vrgather, and at least on the SpaceMit X60, has higher throughput even at m1. It also has the advantage of requiring smaller vector constants at one bit per element as opposed to vrgather which is a minimum of 8 bits per element. The downside to using vcompress is that we can't fold a vselect into it, as there is no masked vcompress variant. For reference, here are the relevant throughputs from camel-cdr's data table on BP3 (X60): vrgather.vv v8,v16,v24 4.0 16.0 64.0 256.0 vcompress.vm v8,v16,v24 3.0 10.0 36.0 136. vmerge.vvm v8,v16,v24,v0 2.0 4.0 8.0 16.0 The largest concern with the extra vmerge is that we locally increase register pressure. If we do have masking, we also have a passthru; without the ability to fold that into the vcompress, we need to keep it alive a bit longer. This can hurt at e.g. m8 where we have very few architectural registers. As compared with the vrgather.vv sequence, this is only one additional m1 VREG - since we no longer need the index vector. It compares slightly worse against vrgatherei16.vv which can use index vectors smaller than other operands. Note that we could potentially fold the vmerge if only tail elements are being preserved; I haven't investigated this. It is unfortunately hard given our current lowering structure to know if we're emitting a shuffle where masking will follow. Thankfully, it doesn't seem to show up much in practice, so I think we can probably ignore it. 
This patch only handles single source compress idioms at the moment. This is an effort to avoid interacting with other patches on review for changing how we canonicalize length changing shuffles.	2024-11-27 13:23:18 -08:00
lialan	1669ac434c	[MLIR] Refactor mask compression logic when emulating `vector.maskedload` ops (#116520 ) This patch simplifies and extends the logic used when compressing masks emitted by `vector.constant_mask` to support extracting 1-D vectors from multi-dimensional vector loads. It streamlines mask computation, making it applicable for multi-dimensional mask generation, improving the overall handling of masked load operations.	2024-11-27 13:22:13 -08:00
Joseph Huber	1d810ece2b	[libc] Move libc server handlers to a shared header (#117908 ) Summary: We can simply include this header from the shared directory now and do not need to have this level of indirection. Simply stash it with the other libc opcode handlers. If we were able to move the printf handlers to the shared directory then this could just be a header as well, which would HEAVILY simplify the mess associated with building the RPC server first in the projects build, then copying it to the runtimes build.	2024-11-27 14:57:52 -06:00
Joseph Huber	89d8e70031	[libc] Export a pointer to the RPC client directly (#117913 ) Summary: We currently have an unnecessary level of indirection when initializing the RPC client. This is a holdover from when the RPC client was not trivially copyable and simply makes it more complicated. Here we use the `asm` syntax to give the C++ variable a valid name so that we can just copy to it directly. Another advantage is that if users want to piggy-back on the same RPC interface they need only declare theirs as extern with the same symbol name, or make it weak to optionally use it if LIBC isn't available.	2024-11-27 14:57:38 -06:00
Craig Topper	175051b05e	[RISCV][GISel] Support libcalls for f32/f64 acos/asin/atan/atan2/cosh/sinh/tanh.	2024-11-27 12:23:12 -08:00


519947 Commits