Also stop building the bolt project on other platforms since bolt only
supports ELF.
(cherry picked from commit 148111fdcf0e807fe74274b18fcf65c4cff45d63)
Lowercase the name before calling MatchRegisterName(), to restore
support for using `%R3` instead of `%r3` and similar, matching the GNU
assembler.
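
A minimal sketch of the idea in Python, not the actual parser code:
`REGISTERS`, `match_register_name`, and `parse_register` are hypothetical
stand-ins for a case-sensitive lookup such as MatchRegisterName().

```python
# Hypothetical register table: the matcher is assumed to know only
# lowercase spellings.
REGISTERS = {"r3": 3, "r4": 4, "f1": 33}


def match_register_name(name):
    """Case-sensitive lookup; returns None when the name is unknown."""
    return REGISTERS.get(name)


def parse_register(token):
    """Parse an operand such as '%r3' or '%R3' into a register number."""
    if not token.startswith("%"):
        return None
    # Lowercase before matching so that '%R3' and '%r3' are equivalent.
    return match_register_name(token[1:].lower())
```

With this, `parse_register("%R3")` and `parse_register("%r3")` resolve to
the same entry, while the raw lookup still rejects `"R3"`.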
Fixes https://github.com/llvm/llvm-project/issues/126786.
(cherry picked from commit f1252f539ca203a979d61b616186e9be9d612f96)
This PR adds LLVM 20.1.0 release notes that are related to the PowerPC
target.
---------
Co-authored-by: Hubert Tong <hubert.reinterpretcast@gmail.com>
The llvm.fake.use intrinsic is used to prevent certain values from being
optimized out for the benefit of debug info; it is not, however, a debug
or pseudo instruction itself and necessarily must not be treated as one,
since its purpose is to act like a normal instruction. In the original
commit that added them, the IR intrinsic however was treated as one in
`getPrevNonDebugInstruction` (but _not_ in `getNextNonDebugInstruction`,
or in the MIR equivalents). This patch correctly treats it as a
non-debug instruction.
(cherry picked from commit af68927a831c45b92248b1f6fc24d445be42dd91)
Code in the HexagonBitTracker checks for a specific register class when
processing sub-registers. A crash occurred due to a register class that
was not handled. The register class is
DoubleRegs_with_isub_hi_in_IntRegsLow8RegClassID, which is a class
formed by creating a register pair when one of the sub registers is a
Low8 integer register.
Fixes #128078
Patch by: Brendon Cahoon
(cherry picked from commit 4f7d8948d9d9a0d366ac737247abab2246834e05)
`JSONScopedPrinter` has a `std::unique_ptr<DelimitedScope>` member and
defaulted constructor argument, so it needs a complete type. This
resolves one of the many build errors with C++23 using Clang.
(cherry picked from commit e65d3882af6fcc15342451ad4f9494b1ba6b9b9d)
addInstantiatedCapturesToScope() might be called when transforming a
lambda body. In this situation, it would look into all the lambda's
parents and figure out all the instantiated captures. However, the
instantiated captures are not visible from lambda's class decl until the
lambda is rebuilt (i.e. after the lambda body transform). So this patch
corrects that by also examining the LambdaScopeInfo, serving as a
workaround for not having deferred lambda body instantiation in Clang
20, to avoid regressing some real-world use cases.
Fixes #128175
(cherry picked from commit ecc7e6ce4cd57a614985e95daf7027918cb8723e)
Pointers are already handled as taking up a register in the ABI
handling, but the handling for structs was not taking this into account.
This patch changes the struct handling to acknowledge that pointer
arguments take up an integer register.
Fixes #123075
(cherry picked from commit 449f84fea652e31de418c3087d7e3628809241b4)
`N` may get merged with existing nodes inside the loop. Early exit when
it is deleted to avoid the crash.
Alternative solution: use `DAGNodeDeletedListener` to refresh the value
of N.
Closes https://github.com/llvm/llvm-project/issues/128143.
(cherry picked from commit 646e4f2eede9a39e46012dde9430cd289682e83c)
Close https://github.com/llvm/llvm-project/issues/127963
The root cause of the problem seems to be that we didn't realize it
simply.
(cherry picked from commit 24c06a19be7bcf28b37e5eabbe65df95a2c0265a)
Summary:
This is supposed to be `__llvm_rpc_client` but I screwed it up and
didn't notice at the time. Will need to be backported.
(cherry picked from commit b35749559ddd9b2d4e044ef71d13d888b8a3d8cb)
MLIR_MAIN_SRC_DIR and MLIR_INCLUDE_DIR point to the source directory,
which is not installed. As such, the installed MLIRConfig.cmake also
should not reference it.
The comment indicates that these are needed for mlir_tablegen(), but I
don't see any related uses.
The motivation for this is the use in flang, where we end up inheriting
a meaningless MLIR_MAIN_SRC_DIR from a previous MLIR build, whose source
directory doesn't exist anymore, and that cannot be overridden with the
correct path, because it's not a cached variable.
Instead do what all the other projects do for LLVM_MAIN_SRC_DIR and
initialize MLIR_MAIN_SRC_DIR to CMAKE_CURRENT_SOURCE_DIR/../mlir.
For MLIR_INCLUDE_DIR there already is an exported MLIR_INCLUDE_DIRS,
which can be used instead.
(cherry picked from commit 82bd148a3f25439d7f52a32422dc1bcd2da03803)
When checking dependencies for gather nodes whose users share the same
last instruction, we cannot rely on the index order if there is a cycle
(even a potential one) in the graph: the order may then be wrong and
cause a compiler crash.
Fixes #127128
(cherry picked from commit ac217ee389d63124432e5e6890851a678f7a676b)
Unfortunately we only have the vector versions of v2f16 minimum3
and maximum3. Widen to v2f16 so we can lower as minimum3(x, y, y).
(cherry picked from commit e729dc759d052de122c8a918fe51b05ac796bb50)
Summary:
The scan operation implemented here only works if there are contiguous
ones in the execution mask that can be used to propagate the result.
There are two solutions to this: enter 'whole-wave-mode' and forcibly
turn the inactive lanes back on, or do the scan serially. This
implementation does the latter because it's more portable, but checks to
see if the parallel fast-path is applicable.
Needs to be backported for correct behavior and because it fixes a
failing libc test.
(cherry picked from commit 6cc7ca084a5bbb7ccf606cab12065604453dde59)
Many parts of the locale base API are only required when building the
shared/static library, but not from the headers. Document those
functions and carve out a few of those that don't work when
_XOPEN_SOURCE is defined to something old.
Fixes #117630
(cherry picked from commit f00b32e2d0ee666d32f1ddd0c687e269fab95b44)
Add support for sm101 and sm120 target architectures. It requires CUDA
12.8.
---------
Co-authored-by: Sebastian Jodlowski <sjodlowski@nuro.ai>
(cherry picked from commit 0127f169dc8e0b5b6c2a24f74cd42d9d277916f6)
#117700 made a change from analyzing all the candidates to analyzing
just the first candidate before deciding to either delete or keep all of
them.
Even though the candidates all have the same instructions, the basic
blocks in which they are present are different and we will need to check
each of them before deciding whether to keep or erase them.
In particular, `isAvailableAcrossAndOutOfSeq` checks whether the
register (x5 in this case) is available from the end of the MBB to the
beginning of the candidate. Not checking this for each candidate led
to incorrect candidates being outlined, resulting in correctness issues
in a few downstream benchmarks.
Similarly, deleting all the candidates if the first one is not viable
will result in missed outlining opportunities.
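
The two failure modes can be illustrated with a small Python model. This
is only a sketch: `keep_all_or_none`, `keep_viable`, `register_is_free`,
and the candidate tuples are hypothetical and do not mirror the outliner's
real data structures.

```python
def keep_all_or_none(candidates, is_viable):
    """Decide for the whole group by looking at the first candidate only."""
    if candidates and is_viable(candidates[0]):
        return list(candidates)
    return []


def keep_viable(candidates, is_viable):
    """Check every candidate and keep exactly the viable ones."""
    return [c for c in candidates if is_viable(c)]


def register_is_free(candidate):
    """Hypothetical per-block check: is the register free in this block?"""
    return candidate[1]


# Hypothetical candidates: (basic block name, whether x5 is free there).
first_ok = [("bb1", True), ("bb2", False), ("bb3", True)]
first_bad = [("bb1", False), ("bb2", True), ("bb3", True)]
```

On `first_ok` the first-candidate-only policy keeps `bb2` even though the
register is not free there; on `first_bad` it drops two candidates that
could have been outlined. The per-candidate policy avoids both.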
(cherry picked from commit 6757cf4e6f1c7767d605e579930a24758c0778dc)
This fixes instantiation of definition for friend function templates,
when the declaration found and the one containing the definition have
different template contexts.
In these cases, the function declaration corresponding to the
definition is not available; it may not even be instantiated at all.
So this patch adds a bit which tracks which function template
declaration was instantiated from the member template. It's used to find
which primary template serves as a context for the purpose of
obtaining the template arguments needed to instantiate the definition.
Fixes #55509
As exposed by #125094, we are missing cost computation for some
binary VPInstructions we created based on original IR instructions.
Their cost should be considered.
PR: #125434
Author: Florian Hahn <flo@fhahn.com>
Change-Id: Icf985b3f1cd40898a17faaf47b241e2651f9e8dd
When bytes with negative signed char values appear in the data, make
sure to use raw bytes from the data string when preprocessing, not char
values.
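
As a rough illustration of how the two readings diverge, here is a Python
model of a byte viewed through a signed 8-bit char versus as a raw byte.
`as_signed_char` and `render` are hypothetical helpers, and the
comma-separated output format is an assumption, not the preprocessor's
actual code.

```python
def as_signed_char(byte):
    """Value of a byte (0..255) when read through a signed 8-bit char."""
    return byte - 256 if byte >= 128 else byte


def render(values):
    """Hypothetical renderer: print each value as a decimal integer."""
    return ", ".join(str(v) for v in values)


data = b"\x7f\x80\xff"

raw_output = render(data)  # iterating bytes yields 0..255
signed_output = render(as_signed_char(b) for b in data)
```

Only bytes with the high bit set differ: here the raw view renders
`127, 128, 255`, while the signed view renders `127, -128, -1`.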
Fixes https://github.com/llvm/llvm-project/issues/102798
This change will cause clang and the other tools to statically link
against the runtimes built in stage1. This will make the built binaries
more portable by eliminating dependencies on system libraries like
libgcc and libstdc++.
(cherry picked from commit f5b311e47de044160aeb25221095898c35c4847f)
These cannot be static compile errors, and should be treated as
poison. Invalid casts may be introduced which are dynamically dead.
For example:
```
void foo(volatile generic int* x) {
  __builtin_assume(is_shared(x));
  *x = 4;
}

void bar() {
  private int y;
  foo(&y); // violation, wrong address space
}
```
Depending on the optimization level, this may or may not produce a
compile-time backend error. Similarly, the new test demonstrates a failure
on a lowered atomicrmw which required inserting runtime address
space checks. The invalid cases are dynamically dead, so we should not
error, and the AtomicExpand pass shouldn't have to consider the details
of the incoming pointer to produce valid IR.
This should go to the release branch. This fixes broken -O0 compiles
with 64-bit atomics which would have started failing in
1d03708.
(cherry picked from commit 18ea6c9)
Some configurations define __AMDGPU__ or __NVPTX__ on platforms that
don't provide <features.h>, such as CUDA on Mac.
(cherry picked from commit 2c8b1248513624e89b510397224f0f405116f3d3)
Class templates might be only instantiated when they are required to be
complete, but checking the template args against the primary template is
immediate.
This result is cached so that later when the class is instantiated,
checking against the primary template is not repeated.
The 'MatchedPackOnParmToNonPackOnArg' flag is also produced upon
checking against the primary template, so it needs to be cached in the
specialization as well.
This fixes a bug which has not been in any release, so there are no
release notes.
Fixes #125290
…ete decl chains until the end of `finishPendingActions`. (#121245)"
This reverts commit a9e249f64e800fbb20a3b26c0cfb68c1a1aee5e1.
Reverting this change because of issue #126973.
(cherry picked from commit 912b154f3a3f8c3cebf5cc5731fd8b0749762da5)
The SDAG version uses fminnum/fmaxnum, in converting it to fcmp+select
it appears the order of the operands was chosen badly. This switches the
conditions used to keep the constant on the RHS.
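
One reason operand order can matter in a compare-and-select lowering is
NaN handling. The Python model below is a sketch under that assumption,
not the actual lowering code; `fminnum`, `select_min_const_rhs`, and
`select_min_const_lhs` are hypothetical names.

```python
import math


def fminnum(x, y):
    """Model of minnum semantics: a single NaN operand is ignored."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return min(x, y)


def select_min_const_rhs(x, c):
    """select(x < c, x, c): an unordered compare falls through to c."""
    return x if x < c else c


def select_min_const_lhs(x, c):
    """select(c < x, c, x): an unordered compare falls through to x."""
    return c if c < x else x
```

For ordinary inputs both selects agree with `fminnum`. When `x` is NaN,
only the form with the constant on the right-hand side returns the
constant; the other form propagates the NaN.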
(cherry picked from commit 70ed381b1693697dec3efcaed161d3626d16cff1)
The corresponding feature was implemented in LLVM 18 (by #67799), but
this FTM wasn't added before.
(cherry picked from commit 2207e3e32549306bf563c6987f790cabe8d4ea78)
The flang build was taking 2-3 hours and causing the entire job to
timeout, so we need to disable it.
(cherry picked from commit 3e5ae5777d92b6f8c647c3f6969fbca0f0f769ff)
All non-existing local times in a contiguous range should map to the
same time point. This fixes a bug where the times inside the range were
mapped to the wrong time.
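
The intended mapping can be sketched in Python with plain integer seconds.
This is a model, not the library implementation: `local_to_sys` and its
parameters are hypothetical, and it assumes the shared time point is the
instant of the offset transition.

```python
def local_to_sys(local, transition, offset_before, offset_after):
    """Map a local time to a system (UTC) time, all in seconds.

    `transition` is the UTC instant at which the UTC offset changes from
    `offset_before` to `offset_after`. The sketch assumes
    offset_after > offset_before, so the local clock jumps forward and
    leaves a gap of non-existing local times.
    """
    gap_begin = transition + offset_before  # first non-existing local time
    gap_end = transition + offset_after     # first local time after the gap
    if local < gap_begin:
        return local - offset_before
    if local >= gap_end:
        return local - offset_after
    # Every local time inside the gap maps to the same time point.
    return transition
```

With a transition at 1000 and offsets of 3600 and 7200, the local times
4600 through 8199 do not exist, and each of them maps to 1000.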
Fixes: #113654
(cherry picked from commit 941f7cbf5a3e7aa9f36b002dc22cfdb4ff50fea8)
Removes MnemonicAliases added for instructions available with
the LSUI feature (e.g. CAS -> CAST) which are not equivalent.
The aliases stt[add|clr|set]a & stt[add|clr|set]al are also removed.
(cherry picked from commit d44d806faa879dfb7a7ceb58beeb57cf8d5af430)
This partially reverts commit 5f2389d4. That commit started checking
whether <features.h> was a valid include unconditionally, however codebases
are free to have such a header on their search path, which breaks compilation.
LLVM libc now provides a more standard way of getting configuration macros
like __LLVM_LIBC__.
After this patch, we only include <features.h> when we're on Linux or
when we're compiling for GPUs.
(cherry picked from commit cffc1ac3491c891ef4f80bcbfa685710e477eeac)
It turns out that the new implementation takes significantly more stack
memory for some reason.
This reverts commit 2696e4fb9567d23ce065a067e7f4909b310daf50.
(cherry picked from commit 0227396417d4625bc93affdd8957ff8d90c76299)
There are additional wait states for XDL write VALU WAW hazard in gfx950
compared to gfx940.
(cherry picked from commit 1188b1ff7b956cb65d8ddda5f1e56c432f1a57c7)
This fixes a false positive caused by #114044.
For `GSLPointer*` types, it's less clear whether the lifetime issue is
about the GSLPointer object itself or the owner it points to. To avoid
false positives, we take a conservative approach in our heuristic.
Fixes #127195
(This will be backported to release 20).
(cherry picked from commit 9c49b188b8e1434eb774ee8422124ad3e8870dce)
These targets don't include all OpenCL builtins, so there will always be
external calls in the final bytecode module.
Fixes #127316.
(cherry picked from commit 9fec0a0942f5a11f4dcfec20aa485a8513661720)
After #117558 landed, this code would assert "Value is not an N-bit
unsigned value" in getConstant(), from a test case in zig.
Co-authored-by: Craig Topper <craig.topper@sifive.com>
Fixes #127296
(cherry picked from commit 788cb725d8b92a82e41e64540dccca97c9086a58)