https://reviews.llvm.org/D158247 caused regressions for HIP on Windows
and was reverted.
A reduced test case is:
```
typedef void (__stdcall* funcTy)();
void invoke(funcTy f);
static void __stdcall callee() noexcept {
}
void foo() {
  invoke(callee);
}
```
This is due to clang not handling host/device attributes for calling
conventions in a few places.
This patch fixes that.
This patch enables some fp16 vector type builtins that don't use fp arithmetic instructions for zvfhmin without zvfh.
This includes the following builtins:
vector load/store,
vector reinterpret,
vmerge_vvm,
vmv_v.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D151869
This is an intermediate step in moving the internal symbolizer build
from an sh script to CMake rules.
The flag is supposed to be used with:
-DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt" -DLLVM_ENABLE_RUNTIMES="libunwind;libcxx;libcxxabi" -Sllvm-project/llvm
After converting the sh script to CMake, we may add support for other build modes.
For https://github.com/llvm/llvm-project/issues/30098
Reviewed By: kstoimenov, MaskRay
Differential Revision: https://reviews.llvm.org/D157947
These two functions can be called by AsmPrinter::doInitialization in
AsmPrinter.cpp. doInitialization initializes MMI at the beginning with
`MMI = MMIWP ? &MMIWP->getMMI() : nullptr;`, so MMI can be nullptr,
which would make the later dereference crash. I think in most cases MMI
is not nullptr, but the function implementation allows it, so I'd like
to add an assert for it. If this turns out to be a problem, we can then
avoid the crash.
As mentioned in TODOs from D159332. This PR doesn't actually
common up that copy of the code because doing so is not NFC - due to
DLEN. Fixing that will be a future PR.
This patch adds the clang portion of an AIX-specific option to inform
the compiler that it can use a faster access sequence for the local-exec
TLS model (formally named aix-small-local-exec-tls).
This patch only adds the frontend portion of the option, building upon:
Backend portion of the option (D156203)
Backend patch that utilizes this option to actually produce the faster access sequence (D155600)
Differential Revision: https://reviews.llvm.org/D155544
This patch utilizes the -maix-small-local-exec-tls option added in
D155544 to produce a faster access sequence for the local-exec TLS
model, where loading from the TOC can be avoided.
The patch either produces an addi/la with a displacement off of r13
(the thread pointer) when the address is calculated, or it produces an
addi/la followed by a load/store when the address is calculated and
used for further accesses.
This patch also optimizes this sequence a bit more by removing the
addi/la when the load/store offset is 0. A follow-up patch will be
posted to handle non-zero load/store offsets; currently, in these
situations, we keep the addi/la that precedes the load/store.
Furthermore, this access sequence is only performed for TLS variables
that are less than ~32KB in size.
Differential Revision: https://reviews.llvm.org/D155600
This patch adds a target attribute for an AIX-specific option that
informs the compiler that it can use a faster access sequence for the
local-exec TLS model (formally named aix-small-local-exec-tls).
The Clang portion of this option is in D155544.
The initial implementation to generate the faster access sequence is in
D155600.
Differential Revision: https://reviews.llvm.org/D156203
This patch makes the generation of command lines for modular
dependencies lazy/on-demand. That operation is somewhat expensive and
prior to this patch used to be performed multiple times for the
identical `ModuleDeps` (i.e. when they were imported from multiple
different TUs).
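The lazy scheme can be sketched as simple memoization. This is only an illustration under assumptions: `ModuleDepsSketch` and its `Builder` callback are hypothetical names, not the real `ModuleDeps` interface, and the sketch needs C++17 for `std::optional`.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a module dependency record; not the real
// ModuleDeps class.
class ModuleDepsSketch {
public:
  using Builder = std::function<std::vector<std::string>()>;

  explicit ModuleDepsSketch(Builder B) : Build(std::move(B)) {}

  // Builds the command line on first use and caches it, so importing the
  // same module from several TUs pays the cost only once.
  const std::vector<std::string> &getCommandLine() {
    if (!Cached)
      Cached = Build();
    return *Cached;
  }

private:
  Builder Build;
  std::optional<std::vector<std::string>> Cached;
};
```

If no caller ever asks for the command line, the builder never runs at all.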
Instead of a linear scan, use a bitset to track rarity of features. This
improves fuzzer throughput rather dramatically (close to 2x) in early
exploratory phases; in steady state this seems to improve fuzzing
throughput by ~15% according to perf.
The benchmarks are done on an executable with ~100k features, so the
results may change based on the executable that's being fuzzed.
kFeatureSetSize is 2M so the bitset is adding 256 KB to
sizeof(InputCorpus), but this should be fine since there's already three
arrays indexed by feature index for a total of 200 MB.
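As a rough sketch of the data structure, assuming "2M" means 2^21 feature indices: one bit per feature gives 2^21 / 8 bytes, which is the 256 KB quoted above. The names `kFeatureSetSizeSketch` and `RareFeatureSet` are hypothetical and not taken from the fuzzer sources.

```cpp
#include <bitset>
#include <cstddef>

// Assumption: "2M" is 2^21; the real constant may be defined differently.
constexpr std::size_t kFeatureSetSizeSketch = std::size_t{1} << 21;

static_assert(kFeatureSetSizeSketch / 8 == 256 * 1024,
              "2^21 bits occupy 256 KiB");

// Hypothetical rarity tracker: one bit per feature index, so a membership
// query is a single bit test instead of a scan over a list.
struct RareFeatureSet {
  std::bitset<kFeatureSetSizeSketch> Bits;

  void markRare(std::size_t Feature) {
    Bits.set(Feature % kFeatureSetSizeSketch);
  }
  void markCommon(std::size_t Feature) {
    Bits.reset(Feature % kFeatureSetSizeSketch);
  }
  bool isRare(std::size_t Feature) const {
    return Bits.test(Feature % kFeatureSetSizeSketch);
  }
};
```

An object of this size is better kept as a member of a heap-allocated or static structure than as a local variable.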
Switch from C++11 to C++14 as the fuzzer requires std::chrono and
libstdc++ doesn't provide chrono literals when using -std=c++11.
Also remove 'u' from ar command to fix this warning: ar: `u' modifier
ignored since `D' is the default (see `U')
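For context, the chrono literal suffixes are a C++14 library feature, which is why -std=c++11 is not enough. A minimal illustration (the function name `sketchTimeout` is made up for this example):

```cpp
#include <chrono>

// The `s` and `ms` suffixes live in std::chrono_literals and require C++14.
inline std::chrono::milliseconds sketchTimeout() {
  using namespace std::chrono_literals;
  return 1s + 500ms;
}
```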
/Users/jiefu/llvm-project/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp:2753:34: error: cannot initialize a parameter of type 'int64_t' (aka 'long long') with an rvalue of type 'thread::id' (aka '_opaque_pthread_t *')
"_" + llvm::itostr(llvm::this_thread::get_id()) + "_" +
^~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/jiefu/llvm-project/llvm/include/llvm/ADT/StringExtras.h:315:35: note: passing argument to parameter 'X' here
inline std::string itostr(int64_t X) {
^
1 error generated.
If we have a build_vector such as [i64 0, i64 3, i64 1, i64 2], we
instead lower this as vsext([i8 0, i8 3, i8 1, i8 2]). For vectors with
4 or fewer elements, the resulting narrow vector can be generated via
scalar materialization.
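The legality condition can be modelled on plain integers: the narrowing is only valid when every element survives a round trip through i8. The sketch below is a scalar model with hypothetical helper names, not the SelectionDAG code, and it needs C++17 for `std::optional`.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Returns the elements as i8 if every one fits, otherwise std::nullopt.
inline std::optional<std::vector<int8_t>>
narrowToI8(const std::vector<int64_t> &Elts) {
  std::vector<int8_t> Narrow;
  Narrow.reserve(Elts.size());
  for (int64_t E : Elts) {
    if (E < std::numeric_limits<int8_t>::min() ||
        E > std::numeric_limits<int8_t>::max())
      return std::nullopt;
    Narrow.push_back(static_cast<int8_t>(E));
  }
  return Narrow;
}

// Models vsext: sign-extends each i8 lane back to i64.
inline std::vector<int64_t> signExtend(const std::vector<int8_t> &Narrow) {
  return std::vector<int64_t>(Narrow.begin(), Narrow.end());
}
```

For the vector [0, 3, 1, 2], narrowing followed by sign extension reproduces the original elements; a vector containing 300 is rejected.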
For shuffles which get lowered to vrgathers, constant build_vectors of
small constants are idiomatic. As such, this change covers all shuffles
with an output type of 4 or fewer elements.
I deliberately started narrow here. I think it makes sense to expand
this to longer vectors, but we need a more robust profit model on the
recursive expansion. It's questionable if we want to do the vsext if
we're going to generate a constant pool load for the narrower type
anyways.
One possibility for future exploration is to allow the narrower VT to be
less than 8 bits. We can't use vsext for that, but we could use
something analogous to our widening interleave lowering with some extra
shifts and ands.
We create one `CompilerInvocation` for each modular dependency we
discover. This means we create a lot of copies, even though most of the
invocation is the same between modules. This patch makes use of the
copy-on-write flavor of `CompilerInvocation` to share the common parts,
reducing memory usage and speeding up the scan.
Update the `ForceFunctionAttrs` pass to optionally take its input from a CSV file, for example, function-level optimization attributes. A subsequent patch will enable the pass pipeline to be aware of these attributes, and this pass will be used to test that this is the case. Eventually, the annotations would be driven by an agent, e.g. a machine learning-based policy.
This patch is a part of GSoC 2023; more details can be found [[ https://summerofcode.withgoogle.com/programs/2023/projects/T8rB84Sr | here ]]
Reviewed By: mtrofin, aeubanks
Differential Revision: https://reviews.llvm.org/D155617
The LLVM Dialect in MLIR, which the `mlir-llvm` team is supposed to
provide notifications for, is 98% not nested in a directory called LLVM
but rather LLVMIR. The former only contains some tests.
This should make PRs such as
https://github.com/llvm/llvm-project/pull/65508 add the team as
codeowner.
The %p format wasn't correctly passing along flags and modifiers to the
integer conversion behind the scenes. This patch fixes that behavior, as
well as changing the nullptr behavior to be a string conversion behind
the scenes.
Reviewed By: lntue, jhuber6
Differential Revision: https://reviews.llvm.org/D159458
Fix tests that are failing in cross-compilation after D151920
(https://lab.llvm.org/buildbot/#/builders/221/builds/17715):
- instrumentation-ind-call, basic-instrumentation: add -mno-outline-atomics flag to runtime lib
- bolt-address-translation-internal-call, internal-call-instrument: add %cflags
- meta-merge-fdata: restrict to x86_64
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D159094
This PR introduces a new copy-on-write `CompilerInvocation` class
(`CowCompilerInvocation`), which will be used by the dependency scanner
to reduce the number of copies performed when generating command lines
for discovered modules.
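The copy-on-write idea can be sketched with a single shared option group. All names below (`CowInvocationSketch`, `HeaderSearchOptsSketch`, the accessors) are hypothetical illustrations and are not claimed to match the actual `CowCompilerInvocation` interface.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical option group; a real invocation holds many such groups.
struct HeaderSearchOptsSketch {
  std::vector<std::string> IncludePaths;
};

// Copy-on-write wrapper: copies share state until one of them mutates it.
class CowInvocationSketch {
public:
  CowInvocationSketch()
      : HSOpts(std::make_shared<HeaderSearchOptsSketch>()) {}

  // Read access never copies.
  const HeaderSearchOptsSketch &getHeaderSearchOpts() const { return *HSOpts; }

  // Write access clones the state only if another copy still shares it.
  HeaderSearchOptsSketch &getMutHeaderSearchOpts() {
    if (HSOpts.use_count() > 1)
      HSOpts = std::make_shared<HeaderSearchOptsSketch>(*HSOpts);
    return *HSOpts;
  }

  bool sharesStateWith(const CowInvocationSketch &Other) const {
    return HSOpts == Other.HSOpts;
  }

private:
  std::shared_ptr<HeaderSearchOptsSketch> HSOpts;
};
```

Copying such an object costs one reference-count increment per option group, and only the groups that a particular module actually changes get duplicated.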
Added support for the following semantic check for the MAP clause.
- A list item cannot appear in both a map clause and a data-sharing attribute clause on the same target construct.
Reviewed By: NimishMishra
Differential Revision: https://reviews.llvm.org/D158807
- For a long time I assumed that `inbounds` means "in-bounds of a *live*
allocation". @nikic told me that is not correct. I think this definitely
needs clarification in the docs.
- The point about successively adding the offsets to the current address
confused me because it talked about the successive addition of "an"
offset -- which one? My interpretation was, the total accumulated offset
computed in the previous step. But @nikic told me that's not correct,
adding each offset individually has to stay in-bounds for each step. I
hope by saying "each offset" this becomes more clear; I then also changed
the previous bullet to use the same terminology.
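The difference between the two readings can be shown with a toy model on integer addresses. This is a sketch under simplifying assumptions: the allocation is modelled as the address range [0, Size], with one-past-the-end counted as in bounds, and liveness and pointer provenance are ignored. The function names are invented for this illustration.

```cpp
#include <cstdint>
#include <vector>

// Toy model: addresses 0..Size (inclusive of one-past-the-end) are in bounds.
inline bool inBounds(int64_t Addr, int64_t Size) {
  return Addr >= 0 && Addr <= Size;
}

// The clarified reading: every intermediate address must stay in bounds.
inline bool stepwiseInBounds(int64_t Start,
                             const std::vector<int64_t> &Offsets,
                             int64_t Size) {
  int64_t Addr = Start;
  if (!inBounds(Addr, Size))
    return false;
  for (int64_t Off : Offsets) {
    Addr += Off;
    if (!inBounds(Addr, Size))
      return false;
  }
  return true;
}

// The mistaken reading: only the accumulated total is checked.
inline bool totalInBounds(int64_t Start, const std::vector<int64_t> &Offsets,
                          int64_t Size) {
  int64_t Total = 0;
  for (int64_t Off : Offsets)
    Total += Off;
  return inBounds(Start, Size) && inBounds(Start + Total, Size);
}
```

With a 16-byte allocation, a start address of 0, and offsets +32 then -24, the total lands at 8 and passes the second check, but the first step already leaves the allocation, so the stepwise check fails.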
Previously, the clang AST printer printed the following declarations:
int fun_var_unused() {
int x __attribute__((unused)) = 0;
return x;
}
and
int __declspec(thread) x = 0;
as:
int fun_var_unused() {
int x = 0 __attribute__((unused));
return x;
}
and
int x = __declspec(thread) 0;
which are rejected by the C/C++ parser. This patch modifies the logic to
print old C attributes for variables as:
int __attribute__((unused)) x = 0;
and the __declspec case as:
int __declspec(thread) x = 0;
Fixes: https://github.com/llvm/llvm-project/issues/59973
Previous version: D141714.
Differential Revision: https://reviews.llvm.org/D141714
InferAddressSpaces failed to call its initialization function. It was
still called through initializeScalarOpts in llc and opt, but it was
skipped entirely in clang. When the initialization function is not
called, this results in confusing behavior where the pass appears to
run, but not entirely as it should, e.g. the pass is excluded from
-print-before-all and -print-after-all.
The trick we use (since cbc2a0623a39461b56bd9eeb308ca535f03793f8)
for exporting the __chkstk function (with various per-arch names)
that is defined in a different object file, relies on the function
already being linked in (by some function referencing it).
This function does end up referenced if there's a function that
allocates more than 4 KB on the stack. In most cases, it's referenced
somewhere, but in the case of builds with LLVM_LINK_LLVM_DYLIB
enabled (so most of the code resides in a separate libLLVM-<ver>.dll)
the only code in lli.exe is the lli tool specific code and the
mingw-w64 crt startup code. In the case of GCC based MinGW i386
builds with LLVM_LINK_LLVM_DYLIB, nothing else references it though.
Manually add a reference to the function to make sure it is linked
in (from libgcc or compiler-rt builtins) so that it can be exported.
This fixes one build issue encountered in
https://github.com/msys2/MINGW-packages/pull/18002.
Differential Revision: https://reviews.llvm.org/D159085
Both `TileOp` and `TileToScfForOp` use the tiling interface and the
`tileUsingSCFForOp` method. This duplication was introduced in
44cfea0279
as a way to retire `linalg::tileLinalgOp`. There is no longer any need
for this duplication, and it seems that `TileOp` has more recent
changes, so retire `TileToScfForOp`.
TableGen's lexer was unable to handle a nested `#ifndef` when the outer
`#ifdef` / `#ifndef` scope is subject to skip. This was caused by returning
the canonicalized token when it should have returned the original one.
Fix #65100.
Differential Revision: https://reviews.llvm.org/D159236
TBD files now record minimum deployment versions, and TAPI interfaces
with the Apple system linker via a packed version encoding. Support
mapping between that and `VersionTuple`s.
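The message does not spell out the packed layout. The sketch below assumes the usual Mach-O 32-bit form in which X.Y.Z is stored as xxxx.yy.zz (16 bits of major, 8 bits of minor, 8 bits of subminor). `VersionSketch`, `encodePacked` and `decodePacked` are hypothetical names, not the functions added by the patch.

```cpp
#include <cstdint>

// Hypothetical stand-in for a three-component version; not VersionTuple.
struct VersionSketch {
  uint32_t Major;
  uint32_t Minor;
  uint32_t Subminor;
};

// Assumed layout: bits 31..16 major, bits 15..8 minor, bits 7..0 subminor.
// Components that do not fit are truncated by the masks.
constexpr uint32_t encodePacked(VersionSketch V) {
  return ((V.Major & 0xFFFFu) << 16) | ((V.Minor & 0xFFu) << 8) |
         (V.Subminor & 0xFFu);
}

constexpr VersionSketch decodePacked(uint32_t Packed) {
  return {Packed >> 16, (Packed >> 8) & 0xFFu, Packed & 0xFFu};
}
```

Under this assumed layout, version 13.2.1 packs to 0x000D0201.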
When encountering braces, such as those of a designated initializer,
clang-format scans ahead to see what is contained within the braces. If
it found a statement, like an if-statement or for-loop, it would deem
the braces as not an initializer, but as a block instead.
However, this heuristic incorrectly included a preprocessor `#if` line
as an if-statement. This manifested in strange results and discrepancies
between `#ifdef` and `#if defined`.
With this patch, `if` is now ignored if it is preceded by `#`.
Fixes most of https://github.com/llvm/llvm-project/issues/56685
This change makes widening act the same as equivalence checking. When the
analysis does not provide an answer regarding the equivalence of two distinct
values, the framework treats them as equivalent. This is an unsound choice that
enables convergence.
Differential Revision: https://reviews.llvm.org/D159355
libc uses SYS_prlimit64 (which takes a struct rlimit64) to implement
setrlimit and getrlimit (which take a struct rlimit). On 64-bit
systems this is not an issue since the members of struct rlimit64 and
struct rlimit are 64 bits long. On 32-bit systems, however, the members
of struct rlimit are only 32 bits long, causing wrong values to be
passed to SYS_prlimit64.
This patch changes rlim_t to be __UINT64_TYPE__ (which also changes
rlimit as a side-effect), fixing the problem of mismatching types in
the syscall.
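The mismatch can be illustrated with two stand-in structs. These are hypothetical models, not the libc definitions, and `kernelReadsCur` merely imitates a consumer that expects a 64-bit first member.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for the 32-bit and 64-bit layouts.
struct RLimit32Sketch {
  uint32_t rlim_cur;
  uint32_t rlim_max;
};
struct RLimit64Sketch {
  uint64_t rlim_cur;
  uint64_t rlim_max;
};

static_assert(sizeof(RLimit32Sketch) == 8, "two 32-bit members");
static_assert(sizeof(RLimit64Sketch) == 16, "two 64-bit members");

// Imitates a consumer that reads the first member as a 64-bit value,
// whatever the caller actually passed.
inline uint64_t kernelReadsCur(const void *P) {
  uint64_t Cur;
  std::memcpy(&Cur, P, sizeof(Cur));
  return Cur;
}
```

With the 32-bit layout, the 64-bit read fuses `rlim_cur` and `rlim_max` into one value, so a limit of 1 is no longer seen as 1; with the 64-bit layout the value arrives intact.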
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D159104