llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-26 12:56:08 +00:00

Author	SHA1	Message	Date
Maksim Panchenko	ec2fb59e6c	[BOLT][docs] Add Linux kernel optimization guide (#96669 ) Describe steps for optimizing the Linux kernel with BOLT.	2024-06-25 12:09:04 -07:00
Nivetha Kuruparan	a255ece56f	XFAIL llvm/test/DebugInfo/attr-btf_type_tag.ll on AIX (#96677 ) This PR XFAILS `llvm/test/DebugInfo/attr-btf_type_tag.ll` on AIX since we don’t have a `.debug_addr` section. Co-authored-by: Nivetha Kuruparan <nivetha@comp810.rtp.raleigh.ibm.com>	2024-06-25 15:03:42 -04:00
Felix Schneider	b003c60904	[mlir][arith] Match folding of `arith.remf` to `llvm.frem` semantics (#96537 ) There are multiple ways to define a remainder operation. Depending on the definition, the result could be either always positive or have the sign of the dividend. The pattern lowering `arith.remf` to LLVM assumes that the semantics match `llvm.frem`, which seems to be reasonable. The folder, however, is implemented via `APFloat::remainder()` which has different semantics. This patch matches the folding behaviour to lowering behavior by using `APFloat::mod()`, which matches the behavior of `llvm.frem` and libm's `fmod()`. It also updates the documentation of `arith.remf` to explain this behavior: The sign of the result of the remainder operation always matches the sign of the dividend (LHS operand). frem documentation: https://llvm.org/docs/LangRef.html#frem-instruction Fix https://github.com/llvm/llvm-project/issues/94431 --------- Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>	2024-06-25 21:01:18 +02:00
agozillon	aec735cf47	[Flang][OpenMP][MLIR] Fix common block mapping for regular and declare target link (#91829 ) This PR attempts to fix common block mapping for regular mapping of these types as well as when they have been marked as "declare target link". This PR should allow correct mapping of both the members of a common block and the full common block via its block symbol. The main changes were some adjustments to the Fortran OpenMP lowering to HLFIR/FIR, the lowering of the LLVM+OpenMP dialect to LLVM-IR and adjustments to the way we handle target kernel map argument rebinding inside of the OMPIRBuilder. For the Fortran OpenMP lowering there were two changes: one prevents the implicit capture of common block members when the common block symbol itself has been marked, and the other creates intermediate member accesses inside of the target region to be used in place of those external to the target region, which prevents external usages from breaking the IsolatedFromAbove pact. In the latter case, there was an adjustment to the size calculation for types to better handle cases where we pass an array as the type of a map (as opposed to the bounds and the type of the element), which occurs in the case of common blocks. There is also some adjustment to how handleDeclareTargetMapVar handles renaming of declare target symbols in the module to the reference pointer: it now applies only to those within the kernel that is currently being generated, and we also replace constants with instructions as necessary, as we cannot replace these with our reference pointer (non-constants and constants do not mix nicely). In the case of the OpenMPIRBuilder some changes were made to defer global symbol rebinding to kernel arguments until all other arguments have been rebound. This makes sure we do not replace uses that may refer to the global (e.g. a GEP) but are themselves actually a separate argument that needs to be bound. 
Currently "declare target to" still needs some work, but this may be the case for all types in conjunction with "declare target to" at the moment.	2024-06-25 20:54:04 +02:00
Kazu Hirata	fef144cebb	Revert "[llvm] Use llvm::sort (NFC) (#96434 )" This reverts commit 05d167fc201b4f2e96108be0d682f6800a70c23d. Reverting the patch fixes the following under EXPENSIVE_CHECKS: LLVM :: CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir LLVM :: CodeGen/AMDGPU/sched-group-barrier-pre-RA.mir LLVM :: CodeGen/PowerPC/aix-xcoff-used-with-stringpool.ll LLVM :: CodeGen/PowerPC/merge-string-used-by-metadata.mir LLVM :: CodeGen/PowerPC/mergeable-string-pool-large.ll LLVM :: CodeGen/PowerPC/mergeable-string-pool-pass-only.mir LLVM :: CodeGen/PowerPC/mergeable-string-pool.ll	2024-06-25 11:18:40 -07:00
Joseph Huber	b9353f7f3e	[LinkerWrapper][NFC] Simplify StringErrors (#96650 ) Summary: The StringError class has a specialized method that creates the inconvertible error code for you. It's much easier to read this way.	2024-06-25 13:16:28 -05:00
Michael Buch	21ab32e1c1	[lldb][LibCxx] Move incorrect nullptr check (#96635 ) Found while skimming this code. Don't have a reproducible test case for this but the nullptr check should clearly occur before we try to dereference `location_sp`.	2024-06-25 19:05:16 +01:00
Aaron Ballman	a0869331ec	[C11] Remove WG14 N1537 from the status page This paper was a rewording of WG14 N1485, correcting terminology and bringing the C11 feature slightly closer in line with the C++11 feature. There is nothing additional to be done or test to conform to what was specified by WG14 N1537, so we'll remove the entry and lean on N1485 to track status for atomics.	2024-06-25 14:02:05 -04:00
Alexis Perry-Holby	a790279bf2	[flang] Add basic -mtune support (#95043 ) This PR adds -mtune as a valid flang flag and passes the information through to LLVM IR as an attribute on all functions. No specific architecture optimizations are added at this time.	2024-06-25 18:39:35 +01:00
Brendan Dahl	928b780840	[WebAssembly] Implement trunc_sat and convert instructions for f16x8. (#95180 ) These instructions can be generated using regular LL intrinsics. Specified at: `29a9b9462c/proposals/half-precision/Overview.md`	2024-06-25 10:39:05 -07:00
Eli Friedman	39a0aa5876	[SelectionDAG] Lower llvm.ldexp.f32 to ldexp() on Windows. (#95301 ) This reduces codesize. As discussed in #92707.	2024-06-25 10:25:48 -07:00
Vitaly Buka	c0dc134de5	[tsan] Lock/Unlock allocator and stacks on fork (#96600 ) We do that for other Sanitizers, and we should do the same for TSAN. There are known deadlock reports here.	2024-06-25 10:05:25 -07:00
Fabian Mora	70fb1e379b	Reland [mlir][Target] Improve ROCDL gpu serialization API (#96198 ) Reland: https://github.com/llvm/llvm-project/pull/95456 This patch improves the ROCDL gpu serialization API by: - Introducing the enum `AMDGCNLibraries` for specifying the AMD GCN device code libraries to use during linking. - Removing `getCommonBitcodeLibs` in favor of `AMDGCNLibraries`. Previously `getCommonBitcodeLibs` would try to load all AMD GCN bitcode libraries; now it will only load the requested libraries. - Exposing the `compileToBinary` method and making it virtual, allowing downstream users to re-use this method. - Exposing `moduleToObjectImpl`, this method provides a prototype flow for compiling to binary, allowing downstream users to re-use this method. - It also avoids constructing the control variables if no device libraries are being used. - Changes the style of the error messages to be composable, i.e. no full stops. - Adds an error message for when the ROCm toolkit can't be found but it was required.	2024-06-25 12:05:11 -05:00
Alex MacLean	5c9513ac75	[NVPTX] cap param alignment at 128 (max supported by ptx) (#96117 ) Cap the alignment to 128 bytes as that is the maximum alignment supported by PTX. The restriction is mentioned in the parameter passing section (Note D) of the [PTX Writer's Guide to Interoperability] (https://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/index.html#parameter-passing) > D. The alignment must be 1, 2, 4, 8, 16, 32, 64, or 128 bytes.	2024-06-25 10:04:45 -07:00
Vitaly Buka	0258a60cd9	[nfc][tsan] Clang format includes (#96599 )	2024-06-25 10:03:12 -07:00
Vitaly Buka	cd2bac81a9	[nfc][tsan] Better name for locking functions (#96598 ) These functions are used only for `fork`. The unused parameter `child` will be used in follow-up patches.	2024-06-25 10:02:01 -07:00
Nick Desaulniers (paternity leave)	4c87212d63	[libc][thumb] support syscalls from thumb mode (#96558 ) r7 is reserved in thumb2 (typically for the frame pointer, as opposed to r11 in ARM mode), so assigning to a variable with explicit register storage in r7 will produce an error. But r7 is where the Linux kernel expects the syscall number to be placed. We can use a temporary to get the register allocator to pick a temporary, which we save+restore the previous value of r7 in. Fixes: #93738	2024-06-25 09:58:50 -07:00
Vitaly Buka	0b049ce646	[tsan] Test `__tsan_test_only_on_fork` only on Mac (#96597 ) According to https://reviews.llvm.org/D114250 this was to handle a Mac-specific issue; however, the test is Linux only. The test effectively prevents locking the main allocator on fork, but we have done that on Linux for other sanitizers for years, and need to do the same for TSAN to avoid deadlocks.	2024-06-25 09:58:32 -07:00
Aaron Ballman	5e2beed9a1	[C23] Move WG14 N2931 to the TS18661 section This paper only matters for TS18661-3 integration.	2024-06-25 12:46:41 -04:00
Jay Foad	aaf50bf34f	[AMDGPU] Disallow negative s_load offsets in isLegalAddressingMode (#91327 )	2024-06-25 17:43:00 +01:00
Han-Kuan Chen	de7c1396f2	[SLP] NFC. Refactor and add getAltInstrMask help function. (#94709 ) Co-authored-by: Alexey Bataev <a.bataev@gmx.com>	2024-06-26 00:42:38 +08:00
Vitaly Buka	f0f774ebf0	[sanitizer] Rename DEFINE_REAL_PTHREAD_FUNCTIONS (#96527 ) We use REAL() calls in interceptors, but DEFINE_REAL_PTHREAD_FUNCTIONS has nothing to do with them and is only used for internal maintenance threads. This is done to avoid confusion like in #96456.	2024-06-25 09:42:01 -07:00
PeterChou1	dbd5c7805b	[clang-doc] Remove stdexcept from clang-doc test (#96552 ) Removes stdexcept from the clang-doc test introduced in https://github.com/llvm/llvm-project/pull/93928 since it violates the rule that tests must be freestanding	2024-06-25 09:40:58 -07:00
PeterChou1	d7dd778cde	[clang-doc] update install path to share/clang-doc instead of share/clang (#96555 ) Updates the install path for clang-doc to share/clang-doc instead of share/clang to avoid confusion	2024-06-25 09:39:33 -07:00
Aaron Ballman	05ca207441	[C23] Update status page regarding FLT_MAX_EXP N2843 was subsumed by N2882; we could probably consider removing subsumed entries, but I've been leaving them to help folks looking at the editor's report from various working drafts and wondering about the changes.	2024-06-25 12:34:34 -04:00
Max191	c9529f7601	[mlir] Drop outermost dims in slice rank reduction inference (#95020 ) The `getDroppedDims` utility function does not follow the convention of dropping outermost unit dimensions first when inferring a rank reduction mask for a slice. This PR updates the implementation to match this convention.	2024-06-25 12:33:02 -04:00
Timm Bäder	580343d96f	[clang][Interp][NFC] Destroy InitMap when moving contents to DeadBlock	2024-06-25 18:32:12 +02:00
Stanley Winata	ac1e22f305	[mlir][vector] Generalize folding of ext-contractionOp to other types. (#96593 ) Many state-of-the-art models and quantization operations now work directly on vector.contract on integers. This commit generalizes ext-contraction folding so that we can emit more performant vector.contracts in codegen pipelines. Signed-off-by: Stanley Winata <stanley.winata@amd.com>	2024-06-25 09:29:43 -07:00
yonghong-song	fb07afedbe	[BPF] Avoid potential long compilation time without -g (#96575 ) Alastair Robertson reported a huge compilation time increase without -g for the bpf target when compared to x86 ([1]). In my setup, with '-O0', for x86, a large basic block compilation takes 0.19s while the bpf target takes 2.46s. The top function which contributes to the compile time is eliminateFrameIndex(). Such long compilation time without -g is caused by commit 05de2e481811 ("[bpf] error when BPF stack size exceeds 512 bytes"). The compiler tries to get a debug loc by iterating over all insns in the basic block; the loc is used when the compiler warns about a larger-than-512 stack size. This iteration happens even without -g, which causes an unnecessary compile time increase. To fix the issue, move the related code to the point where the compiler is about to warn of a stack limit violation. This fixes the compile time regression, and on my system, the compile time is reduced from 2.46s to 0.35s. [1] https://github.com/bpftrace/bpftrace/issues/3257 Co-authored-by: Yonghong Song <yonghong.song@linux.dev>	2024-06-25 09:27:18 -07:00
Nick Desaulniers (paternity leave)	dca49d739d	[libc][arm32] define argc type and stack alignment (#96367 ) https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst#6212stack-constraints-at-a-public-interface mentions that the stack on ARM32 is double word aligned. Remove confused comments around ArgcType. argc is always an int, passed on the stack, so we need to store a pointer to it (regardless of ILP32 or LP64).	2024-06-25 09:04:19 -07:00
Vy Nguyen	e951bd0f51	Reapply PR/87550 (again) (#95571 ) New fixes: - properly init the `std::optional<std::vector>` to an empty vector as opposed to `{}` (which was effectively `std::nullopt`). --------- Co-authored-by: Vy Nguyen <oontvoo@users.noreply.github.com>	2024-06-25 12:01:17 -04:00
Timm Bäder	b7768c5485	[clang][Interp][NFC] Use delegate() to delegate to only initlist item	2024-06-25 17:50:53 +02:00
Matt Arsenault	889f3c5741	AMDGPU: Handle legal v2bf16 atomicrmw fadd for gfx12 (#95930 ) Annoyingly gfx90a/940 support this for global/flat but not buffer.	2024-06-25 17:45:34 +02:00
Jakub Mazurkiewicz	bb075eeb89	[libc++] LWG3382: NTTP for `pair` and `array` (#85811 ) Mark LWG3382 as "Nothing To Do" and add tests.	2024-06-25 10:43:15 -05:00
Akira Hatanaka	2604830aac	Add support for __builtin_verbose_trap (#79230 ) The builtin causes the program to stop its execution abnormally and shows a human-readable description of the reason for the termination when a debugger is attached or in a symbolicated crash log. The motivation for the builtin is explained in the following RFC: https://discourse.llvm.org/t/rfc-adding-builtin-verbose-trap-string-literal/75845 clang's CodeGen lowers the builtin to `llvm.trap` and emits debugging information that represents an artificial inline frame whose name encodes the category and reason strings passed to the builtin.	2024-06-25 08:33:05 -07:00
Nikolas Klauser	731db06a87	[libc++] Get the GCC build mostly clean of warnings (#96604 ) The GCC build has gotten to the point where it's often hard to find the actual error in the build log. We should look into enabling these warnings again in the future, but it looks like a lot of them are bogus.	2024-06-25 17:31:41 +02:00
shawbyoung	902952ae04	Revert "[𝘀𝗽𝗿] initial version" This reverts commit bb5ab1ffe719f5e801ef08ac08be975546aa3266.	2024-06-25 08:30:29 -07:00
Xiaoyang Liu	8c11d3788c	[libc++] P3029R1: Better `mdspan`'s CTAD - `std::extents` (#89015 ) This patch implements an improvement introduced in P3029R1 that was missed in #87873. It adds a deduction of static extents if integral_constant-like constants are passed to `std::extents`.	2024-06-25 10:20:14 -05:00
Lukacma	8a46bbbc22	[Clang] Remove preprocessor guards and global feature checks for NEON (#95224 ) To enable function multi-versioning (FMV), current checks which rely on cmd line options or global macros to see if target feature is present need to be removed. This patch removes those for NEON and also implements changes to NEON header file as proposed in [ACLE](https://github.com/ARM-software/acle/pull/321).	2024-06-25 17:19:42 +02:00
Craig Topper	dddef9d1c9	[RISCV] Add FPR16 regbank and start legalizing f16 operations for Zfh. (#96582 )	2024-06-25 08:18:37 -07:00
Vitaly Buka	7f10ed637e	[tsan] Fix dead lock when starting StackDepot thread (#96456 ) Sometimes tsan runtime calls, like `__tsan_mutex_create ()`, need to store a stack in the StackDepot, and the Depot may need to start a maintenance thread. Example: ``` __sanitizer::FutexWait () __sanitizer::Semaphore::Wait () __sanitizer::Mutex::Lock () __tsan::SlotLock () __tsan::SlotLocker::SlotLocker () __tsan::Acquire () __tsan::CallUserSignalHandler () __tsan::ProcessPendingSignalsImpl () __tsan::ProcessPendingSignals () __tsan::ScopedInterceptor::~ScopedInterceptor () ___interceptor_mmap () pthread_create () __sanitizer::internal_start_thread () __sanitizer::(anonymous namespace)::CompressThread::NewWorkNotify () __sanitizer::StackDepotNode::store () __sanitizer::StackDepotBase<__sanitizer::StackDepotNode, 1, 20>::Put () __tsan::CurrentStackId () __tsan::MutexCreate () __tsan_mutex_create () ``` The pthread_create() implementation may hit other interceptors recursively, which may invoke ProcessPendingSignals, which deadlocks. An alternative solution could be to block interceptors closer to the TSAN runtime API function, like `__tsan_mutex_create`, or just before `StackDepotPut`, but it's not needed for most calls, only when a new thread is created using `real_pthread_create`. I don't see a reasonable way to create a regression test.	2024-06-25 08:17:10 -07:00
Hui	79e8a59523	[libc++] Move allocator assertion into allocator_traits (#94750 ) There is code duplication in all containers that static_assert the allocator matches the allocator requirements in the spec. This check can be moved into a more centralised place.	2024-06-25 10:13:48 -05:00
Nikita Popov	0e11a7e717	[EarlyCSE] Add test with noundef load of undef (NFC)	2024-06-25 17:10:43 +02:00
shawbyoung	c097e643ef	Revert "Added opts::Lite to RewriteInstance" This reverts commit 020f69cd10a2ff1233cc28088989319e5a58b116.	2024-06-25 08:07:45 -07:00
shawbyoung	020f69cd10	Added opts::Lite to RewriteInstance	2024-06-25 08:05:29 -07:00
shawbyoung	bb5ab1ffe7	[𝘀𝗽𝗿] initial version Created using spr 1.3.4	2024-06-25 08:05:29 -07:00
RichardLuo	ed1273d4dd	[libc++] change the visibility of libc++ header to public in libcxx module (#91240 ) This PR addresses a problem that headers may not be able to be found if `#include` is used with std modules. Consider the following file: #include <boost/json.hpp> import std; int main(int, const char **) { } Boost will include something from libc++, but we are using -nostdinc++ at [1] so the compiler can not find any default std header. Therefore the locally built header needs to be public. [1]: `15fdd47c4b/libcxx/modules/CMakeLists.txt.in (L52)`	2024-06-25 09:57:53 -05:00
Nikolas Klauser	2274c66e6f	[libc++] Use _If for conditional_t (#96193 ) This avoids different instantiations when the if and else types are different, resulting in reduced memory use by the compiler.	2024-06-25 16:53:17 +02:00
David Sherwood	ec9ce89a08	[LoopVectorize] Fix build issue caused by #95920 (#96647 )	2024-06-25 15:51:32 +01:00
bwlodarcz	a4045299d3	[SPIRV] Add definitions for NonSemantic debug info (#95530 ) This commit adds basic types and definitions for NonSemantic.Shader.DebugInfo.100 standard for SPIRV. Full implementation of the standard will allow SPIRV backend to emit files with debug info included. Link to standard: https://github.com/KhronosGroup/SPIRV-Registry/blob/main/nonsemantic/NonSemantic.Shader.DebugInfo.100.html	2024-06-25 07:49:55 -07:00
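
The `arith.remf` entry above (b003c60904) turns on the difference between two remainder definitions. As a rough illustration only, not the MLIR or APFloat code: Python's `math.fmod` follows C `fmod()` (truncated quotient, result takes the sign of the dividend), which the commit says matches `llvm.frem`, while `math.remainder` computes the IEEE 754 round-to-nearest remainder. Treating `APFloat::remainder()` as the latter is an assumption here. The helper names `frem_like` and `ieee_remainder` are hypothetical.

```python
import math


def frem_like(lhs: float, rhs: float) -> float:
    """fmod-style remainder: quotient truncated toward zero, so the
    result carries the sign of the dividend (lhs)."""
    return math.fmod(lhs, rhs)


def ieee_remainder(lhs: float, rhs: float) -> float:
    """IEEE 754-style remainder: quotient rounded to nearest, so the
    result's sign need not match the dividend."""
    return math.remainder(lhs, rhs)


if __name__ == "__main__":
    # The two definitions disagree on each of these inputs.
    for lhs, rhs in [(5.5, 2.0), (-5.5, 2.0), (5.5, -2.0)]:
        print(lhs, rhs, frem_like(lhs, rhs), ieee_remainder(lhs, rhs))
```

A folder built on one definition and a lowering built on the other would give different constants for the same operands, which is the mismatch the commit describes.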

502919 Commits