The xor key used to derive the unmangled sp is introduced to follow the way
that glibc added support for pointer mangling on LoongArch in commit
1c9bc1b6e50293a1b7037a7bfbf835868a55baed.
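A minimal sketch of the idea, assuming the mangling is a plain XOR with a per-process key, as the "xor key" wording suggests. The helper names are hypothetical; the real glibc sequence is written in assembly and its guard value is private:

```cpp
#include <cstdint>

// Hypothetical helper: mangle a pointer-sized value with a secret key.
inline std::uintptr_t manglePointer(std::uintptr_t ptr, std::uintptr_t key) {
  return ptr ^ key;
}

// XOR is its own inverse, so applying the same key again recovers the
// original value (for example the sp saved in a jmp_buf).
inline std::uintptr_t demanglePointer(std::uintptr_t mangled,
                                      std::uintptr_t key) {
  return mangled ^ key;
}
```

Anything that wants the real sp back from a mangled slot therefore needs the same key that was used when the value was saved.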
Reviewed By: SixWeining, wangleiat, xen0n
Differential Revision: https://reviews.llvm.org/D146716
Consider the following structures when targeting:
struct foo {
  int space[4];
  char a : 8;
  char b : 8;
  char x : 8;
  char y : 8;
};
struct bar {
  int space[4];
  char a : 8;
  char b : 8;
  char : 0;
  char x : 8;
  char y : 8;
};
Even though both structs have the same layout in memory, they are handled
differently by the AMDGPU ABI.
With the following code:
// clang --target=amdgcn-amd-amdhsa -g -O1 example.c -S
char use_foo(struct foo f) { return f.y; }
char use_bar(struct bar b) { return b.y; }
For use_foo, the 'y' field is passed in v4
; v_ashrrev_i32_e32 v0, 24, v4
; s_setpc_b64 s[30:31]
For use_bar, the 'y' field is passed in v5
; v_bfe_i32 v0, v5, 8, 8
; s_setpc_b64 s[30:31]
To make this distinction, we record a single 0-size bitfield for every member that is preceded
by it.
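The claim that both structs share one memory layout can be checked on a host compiler. This is a minimal sketch, assuming a typical host ABI (e.g. x86-64 with GCC or Clang) rather than the AMDGPU target itself; offsetOfY is a hypothetical helper, not part of the patch:

```cpp
#include <cstring>

struct foo {
  int space[4];
  char a : 8;
  char b : 8;
  char x : 8;
  char y : 8;
};

struct bar {
  int space[4];
  char a : 8;
  char b : 8;
  char : 0; // zero-size bitfield: invisible in memory here, visible to the ABI
  char x : 8;
  char y : 8;
};

// Hypothetical helper: find the byte where 'y' is stored by setting only 'y'
// and scanning the raw bytes. Returns -1 if the marker byte is not found.
template <typename T> int offsetOfY() {
  T v{};
  v.y = 0x5a;
  unsigned char bytes[sizeof(T)];
  std::memcpy(bytes, &v, sizeof(T));
  for (unsigned i = 0; i < sizeof(T); ++i)
    if (bytes[i] == 0x5a)
      return static_cast<int>(i);
  return -1;
}
```

Under these assumptions the size and the offset of 'y' are the same for both structs, which is why the layout alone cannot tell them apart and the zero-size bitfield has to be recorded explicitly.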
Reviewed By: probinson
Differential Revision: https://reviews.llvm.org/D144870
Instead, we turn StmtToEnvMap into a concrete class with the implementation that used to live in StmtToEnvMapImpl.
The layering issue that originally required the indirection through the
`StmtToEnvMap` interface no longer exists.
Reviewed By: ymandel, xazax.hun, gribozavr2
Differential Revision: https://reviews.llvm.org/D146507
Some background context: GNU windres invokes the preprocessor in
a subprocess. Some windres options are passed through to the
preprocessor, e.g. -D options for predefining defines.
When GNU windres passes these options onwards, it takes the options
in exactly the form they are received (in argv or similar) and
assembles them into a single preprocessor command string which gets
interpreted by a shell (IIRC via the popen() function, or similar).
When LLVM invokes subprocesses, it does so via APIs that take
properly split argument vectors, to avoid needing to worry about
shell quoting/escaping/unescaping. But in the case of LLVM windres,
we have to emulate the effect of the shell parsing done by popen().
Most of the relevant cases are already taken care of here, but this
patch fixes an uncommon case encountered in
https://github.com/llvm/llvm-project/issues/57334.
(This case is uncommon since it doesn't do what one would want;
the quotes need to be escaped more to work as intended through
the popen() shell).
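To illustrate what emulating the shell parsing means, here is a simplified sketch of POSIX-style quote removal for a single argument. It is not the code used by LLVM windres; shellUnquote is a hypothetical name, and the rules are reduced to single quotes, double quotes and backslashes:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: remove one level of shell quoting from an argument.
std::string shellUnquote(const std::string &in) {
  std::string out;
  char quote = 0; // 0 = unquoted, otherwise '\'' or '"'
  for (std::size_t i = 0; i < in.size(); ++i) {
    char c = in[i];
    if (quote == '\'') {
      // Inside single quotes everything is literal until the closing quote.
      if (c == '\'')
        quote = 0;
      else
        out += c;
    } else if (c == '\\' && i + 1 < in.size()) {
      char next = in[i + 1];
      bool special =
          next == '"' || next == '\\' || next == '$' || next == '`';
      if (quote == '"' && !special) {
        out += c; // inside double quotes the backslash stays literal
      } else {
        out += next; // the backslash escapes the next character
        ++i;
      }
    } else if (c == '"') {
      quote = (quote == '"') ? 0 : '"';
    } else if (c == '\'' && quote == 0) {
      quote = '\'';
    } else {
      out += c;
    }
  }
  return out;
}
```

With this sketch, -DFOO="bar" loses its quotes, while -DFOO=\"bar\" keeps them, which matches the remark that the quotes need extra escaping to survive the shell.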
Differential Revision: https://reviews.llvm.org/D146848
When preprocessing was integrated into llvm-rc in 2021, this was a
new requirement (previously one could execute llvm-rc without a
suitable preprocessing tool being available).
As a transitional helper, llvm-rc fell back on skipping preprocessing
if no suitable tool was found (with a warning printed), but users
could pass an llvm-rc specific option to silence the warning, if they
explicitly want to run the tool without preprocessing.
Now 2 years later, remove the transitional helper - error out if
preprocessing failed. The option for disabling preprocessing remains.
Differential Revision: https://reviews.llvm.org/D146797
This was the original option name from the first iteration of the patch
that added the feature, but during review, a different name was suggested
and preferred - but the reference in the helpful message was missed.
Differential Revision: https://reviews.llvm.org/D146796
In some cases, there's no adjacent executable named "clang" or
"clang-cl", but one named "clang-<major>". This logic doesn't
cover every possible deployment setup of course, but should
cover more fairly common/reasonable cases.
See
caaae171ac (commitcomment-105808524)
for discussion about a case where this would have been helpful.
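The lookup described above can be sketched as follows. This is an illustration only, not the llvm-rc implementation: pickAdjacentCompiler, the candidate order and the injected existence check are assumptions made so that the example stays self-contained:

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: return the first adjacent candidate that exists, or an
// empty string. 'exists' is injected so the logic can run without touching
// the real file system.
std::string
pickAdjacentCompiler(const std::string &dir, unsigned major,
                     const std::function<bool(const std::string &)> &exists) {
  const std::vector<std::string> names = {"clang", "clang-cl",
                                          "clang-" + std::to_string(major)};
  for (const std::string &name : names) {
    std::string path = dir + "/" + name;
    if (exists(path))
      return path;
  }
  return "";
}
```

In a deployment that only ships a versioned binary, the first two candidates fail and the "clang-<major>" name is picked up.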
Differential Revision: https://reviews.llvm.org/D146794
The arguments passed in this option were passed onto the child
process, but we still blindly passed the clang binary that we had
found to sys::ExecuteAndWait as the intended executable to run.
If the user hasn't specified any custom --preprocessor command,
Args[0] is equal to the variable Clang.
This doesn't affect any tests, since the tests only print the
arguments the tool would try to execute (but not the first parameter to
sys::ExecuteAndWait), and there are no tests for actually executing it
(and validating that it did execute the right thing).
Differential Revision: https://reviews.llvm.org/D146793
`transform.pack_greedily` supports skipping dimensions in which case we
may well end up with e.g. a matvec innermost.
We should not spuriously crash in such cases.
When doing a trivial unswitch of a switch statement the code needs
to "invalidate SCEVs for the outermost loop reached by any of the
exits", as indicated by code comments.
Depending on whether we find such an outermost loop, we can limit
the invalidation to some sub-loops or the full loop-nest. As shown
in the added test case there seem to have been some bugs in the code
that was finding the "outermost loop", so we could end up invalidating
too few loops.
Seems like commit 1bf8ae17f5e2714c8c87978 introduced the bug by
moving the code that invalidates the loops above some of the code
that computed 'OuterL'. This patch fixes that by also moving that
computation of 'OuterL' so that we compute 'OuterL' properly before
we use it for the SCEV invalidation.
Differential Revision: https://reviews.llvm.org/D146963
DAGISel uses CopyToReg/CopyFromReg to lower PHI nodes. With large PHIs, this can result in poor codegen.
This is because it introduces a need to have a build_vector before copying the PHI value, and that build_vector may have many undef elements. This can cause very high register pressure and abnormal stack usage in some cases.
This scalarization/phi "break-up" can be easily tuned/disabled through CL options in case it's not beneficial for some users.
It's also only enabled for DAGISel, as GlobalISel handles PHIs much better (as it works on the whole function).
This can both scalarize (break a vector into its elements) and simplify (break a vector into smaller, more manageable subvectors) PHIs.
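The difference between scalarizing and simplifying can be pictured as choosing a slice width for the vector. The sketch below is an assumption-laden illustration, not the pass itself: slicePhiVector is a hypothetical name and the heuristic the patch uses to pick the width is not described here:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical helper: split a vector of numElts elements into
// (start index, length) slices of at most subVecElts elements each.
// subVecElts == 1 corresponds to scalarizing; larger values correspond to
// breaking the vector into smaller subvectors.
std::vector<std::pair<unsigned, unsigned>> slicePhiVector(unsigned numElts,
                                                          unsigned subVecElts) {
  std::vector<std::pair<unsigned, unsigned>> slices;
  if (subVecElts == 0)
    return slices; // nothing sensible to do with a zero-width slice
  for (unsigned start = 0; start < numElts; start += subVecElts)
    slices.emplace_back(start, std::min(subVecElts, numElts - start));
  return slices;
}
```

Each slice would then get its own, smaller PHI, so no single wide build_vector has to be live across the copy.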
Fixes SWDEV-321581
Reviewed By: kzhuravl
Differential Revision: https://reviews.llvm.org/D143731
Generalize `tryFoldLCSSAPhi` into `tryFoldPhiAGPR` which works
on any kind of PHI node (not just LCSSA ones) and attempts to
create AGPR Phis more aggressively.
Also adds a GFX908-only "cleanup" function `tryOptimizeAGPRPhis`
which tries to minimize AGPR to AGPR copies on GFX908, which doesn't
have an ACCVGPR MOV instruction (so AGPR-AGPR copies become 2 or 3 instructions
as they need a VGPR temp). This is needed because D143731
+ the new `tryFoldPhiAGPR` may create a lot more PHIs (one 32xfloat PHI becomes
32 float phis), and if each PHI hits the same AGPR (like in `test_mfma_loop_agpr_init`)
they will be lowered to 32 copies from the same AGPR, which will each become 2-3 instructions.
Creating a VGPR cache in this case prevents all those copies from being generated
(we have AGPR-VGPR copies instead which are trivial).
This is a preparation patch intended to prevent regressions in D143731 when
AGPRs are involved.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D144099
Guard Widening ignores block frequency. As a result, it may
end up widening conditions from cold/effectively dead code into some
much hotter place, harming average performance.
This reverts commit 1387a13e1d0bac94457626ef3e7427c84caf6e65.
This introduced performance regressions on AArch64, when the cost of a
vector GEP + extracts is offset by the benefits of vectorizing the rest
of the tree.
The test in llvm/test/Transforms/SLPVectorizer/AArch64/vector-getelementptr.ll
illustrates the issue. It was extracted from code that regressed a SPEC
benchmark by 15%.
Ignore unevaluated expressions in rvalue-reference-param-not-moved
check since they are not actual uses of a move().
Reviewed By: PiotrZSL
Differential Revision: https://reviews.llvm.org/D146929
This increases the flexibility of the transformation to allow mixed packing / padding specifications.
Differential Revision: https://reviews.llvm.org/D146969
This makes sure the docs are always available and can be manually uploaded
if a later step fails.
Reviewed By: thieta
Differential Revision: https://reviews.llvm.org/D145996
These will be replaced by CMake's check_linker_flag once we update
the minimum CMake version to 3.20.
Differential Revision: https://reviews.llvm.org/D145716
Fix a crash when a warning is emitted while loading the symbols from the
main binary. The warning helper assumes that the resulting debug map is
initialized, but this happened after loading the main binary. Since
there's no dependency between the two the initialization can be moved
up.
rdar://107298776
We missed certain updates, mostly to call site information, and
dependent AAs did not get recomputed. We also did not properly
distinguish and propagate incoming and outgoing information of call
sites.
The runtime tests pass now; I'll add a proper test for
AAExecutionDomain soon that covers all the cases and ensures we haven't
forgotten more updates. To help unblock some apps, I'll put the fix
first.
This callback caused us to potentially miss out on call edges if we were
expecting a custom state machine since the custom state machine was not
created but the workers also did not enter the generic one. I have not
observed an issue and don't know how to create a test for sure, but it
is safer to err on the conservative side for now.
We can use dominance and avoid the special handling of kernels and
prevent inserting code before allocas accidentally (as happened in the
runtime test).
This code may have served a purpose at some point but it has been dead
for a long while. `FromMapperBase` was always `nullptr` which is `false`
which makes the rest of the code dead. Since this has not
affected tests, I delete it for now.
Like D145900, this patch supports fixed vector strict_fma nodes in RISC-V by
custom lowering them to riscv_strict_vfmadd_vl nodes. riscv_strict_vfmadd_vl
is created to prevent some riscv_vfmadd_vl optimizations from being applied to
the original strict_fma nodes. The patch also adds combine patterns for
riscv_strict_vfmadd_vl nodes with negated operands.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D146939
ExecutorAddr was introduced in b8e5f918166 as an eventual replacement for
JITTargetAddress. ExecutorSymbolDef is introduced in this patch as a
replacement for JITEvaluatedSymbol: ExecutorSymbolDef is an (ExecutorAddr,
JITSymbolFlags) pair, where JITEvaluatedSymbol was a (JITTargetAddress,
JITSymbolFlags) pair.
A number of APIs had already migrated from JITTargetAddress to ExecutorAddr,
but many of ORC's internals were still using the older type. This patch aims
to address that.
Some public APIs are affected as well. If you need to migrate your APIs you can
use the following operations:
* ExecutorAddr::toPtr replaces jitTargetAddressToPointer and
jitTargetAddressToFunction.
* ExecutorAddr::fromPtr replaces pointerToJITTargetAddress.
* ExecutorAddr(JITTargetAddress) creates an ExecutorAddr value from a
JITTargetAddress.
* ExecutorAddr::getValue() creates a JITTargetAddress value from an
ExecutorAddr.
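The conversions listed above amount to a round trip between host pointers, the new address type and the old integer type. The sketch below uses simplified stand-ins (ExecutorAddrSketch and JITTargetAddressSketch are hypothetical and only mirror the operations named in the list) so that it compiles without the LLVM headers; the real classes live in ORC and have more functionality:

```cpp
#include <cstdint>

// Stand-in for JITTargetAddress, assumed here to be a 64-bit integer.
using JITTargetAddressSketch = std::uint64_t;

// Stand-in for ExecutorAddr, limited to the operations named in the list.
class ExecutorAddrSketch {
public:
  ExecutorAddrSketch() = default;
  explicit ExecutorAddrSketch(JITTargetAddressSketch a) : addr(a) {}

  // Counterpart of fromPtr: wrap a host pointer.
  template <typename T> static ExecutorAddrSketch fromPtr(T *p) {
    return ExecutorAddrSketch(static_cast<JITTargetAddressSketch>(
        reinterpret_cast<std::uintptr_t>(p)));
  }

  // Counterpart of toPtr: recover a host pointer of the requested type.
  template <typename T> T toPtr() const {
    return reinterpret_cast<T>(static_cast<std::uintptr_t>(addr));
  }

  // Counterpart of getValue: expose the raw integer address.
  JITTargetAddressSketch getValue() const { return addr; }

private:
  JITTargetAddressSketch addr = 0;
};
```

Code that still holds a raw integer address can construct the wrapper from it, and code that needs the raw integer can ask for it back, so the two representations can coexist during a migration.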
JITTargetAddress and JITEvaluatedSymbol will remain in JITSymbol.h for now, but
the aim will be to eventually deprecate and remove these types (probably when
MCJIT and RuntimeDyld are deprecated).
C721 says that a type parameter value of '*' is permitted in the type-spec
for a named constant; C795 says that such type parameters are allowed
in type-specs only for a few kinds of things, not including named
constants. The interpretation seems to depend on context, with C721
applying to intrinsic types (i.e., character) and C795 applying only
to derived types.
Differential Revision: https://reviews.llvm.org/D146586
We don't have very many compressible FP instructions, just load and store.
These instructions require the FP register to be f8-f15.
This patch changes the FP allocation order to prioritize f10-f15 first.
These are also the FP argument registers. So I allocated them in reverse
order starting at f15 to avoid taking the first argument registers.
This appears to match gcc allocation order.
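As a small illustration of the ordering described above, the sketch below builds only the prioritized prefix (f15 down to f10) and checks it against the compressible range. The helper names are hypothetical, and the order of the remaining registers is deliberately left out because it is not stated here:

```cpp
#include <vector>

// True for f8-f15, the range required by the compressed FP load/store
// instructions according to the text.
inline bool isCompressibleFPR(unsigned regNum) {
  return regNum >= 8 && regNum <= 15;
}

// Hypothetical helper: the prioritized start of the allocation order,
// f15 first so that the first argument registers (f10, f11, ...) are
// taken last.
inline std::vector<unsigned> preferredFPRPrefix() {
  std::vector<unsigned> order;
  for (unsigned regNum = 15; regNum >= 10; --regNum)
    order.push_back(regNum);
  return order;
}
```

Every register in this prefix is usable by the compressed instructions, and a function with few FP arguments keeps its incoming argument registers untouched for longer.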
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D146488
The semantic checking of DO CONCURRENT bodies looks only at the
parse tree, not the typed expressions produced from it, so it
misses calls to defined assignment subroutines that arise from
assignment statements that resolve via generic interfaces into
subroutine calls. Extend the checking to peek into the typed
assignment operations left on the parse tree by semantics.
Differential Revision: https://reviews.llvm.org/D146585
When building OpenMP as part of LLVM, CMake was generating incorrect
location references for the artifacts of the OpenMP build's first step,
which are used to regenerate its Windows import library in the second step.
The fix is to feed a dummy non-buildable, rather than buildable, source to
CMake to satisfy its source requirements, removing the need to reference the
first step's artifacts in the second step altogether.
Differential Revision: https://reviews.llvm.org/D146894
The utility routine in semantics that determines whether an
executable construct constitutes an image control statement
was not examining the single action statement controlled by
a non-construct IF statement, e.g. 'IF(P) STOP'.
Differential Revision: https://reviews.llvm.org/D146584
Added instructions and links to join the LLVM Discord and Discourse groups
in the CONTRIBUTING.md files.
Reviewed By: keith
Differential Revision: https://reviews.llvm.org/D146877
Presently, semantics doesn't check for discrepancies in corresponding
known constant LEN type parameters between the declared type
of an allocatable/pointer and either the type-spec or the SOURCE=/MOLD=
on an ALLOCATE statement.
This allows discrepancies between character lengths to go unchecked.
Some compilers accept mismatched character lengths on SOURCE=/MOLD=
and the allocate object, and that's a useful and unambiguous feature
that already works in f18 via truncation or padding. A portability
warning should issue, however.
But for mismatched character lengths between an allocate object and
an explicit type-spec, and for any mismatch between derived type
LEN type parameters, an error is appropriate.
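The three cases above can be summarized as a small decision function. This is only a sketch of the rules as stated in this message, not the f18 implementation; the enum and function names are hypothetical, and the caller is assumed to pass only known constant values:

```cpp
enum class Diagnosis { None, PortabilityWarning, Error };
enum class LengthOrigin { TypeSpec, SourceOrMold };

// Hypothetical helper: classify a LEN type parameter comparison between the
// declared type of an allocate object and what the ALLOCATE statement gives.
inline Diagnosis diagnoseLenMismatch(bool isCharacter, LengthOrigin origin,
                                     long declaredLen, long givenLen) {
  if (declaredLen == givenLen)
    return Diagnosis::None;
  // Character lengths that differ from SOURCE=/MOLD= still work via
  // truncation or padding, so only a portability warning is warranted.
  if (isCharacter && origin == LengthOrigin::SourceOrMold)
    return Diagnosis::PortabilityWarning;
  // Character length vs. explicit type-spec, or any derived type LEN
  // type parameter mismatch.
  return Diagnosis::Error;
}
```

For example, a CHARACTER(5) allocatable allocated with a length-10 SOURCE= falls into the warning case, while the same mismatch against an explicit type-spec is an error.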
Differential Revision: https://reviews.llvm.org/D146583
Consolidate aspects of pointer assignment & structure constructor pointer component
checking from Semantics/assignment.cpp and /expression.cpp into /pointer-assignment.cpp,
and add a warning about data targets that are not definable objects
but are not hard errors. Specifically, a structure constructor pointer component data
target is not allowed to be a USE-associated object in a pure context by a numbered
constraint, but the right-hand side data target of a pointer assignment statement
has no such constraint, and that's the new warning.
Differential Revision: https://reviews.llvm.org/D146581
The data statement variable checker is missing some cases, like expressions
that are not variables. Run the checker first to enjoy its very specific
error messages, but when it finds no problems, still apply a general
check that an expression is a "variable" and also not a constant expression
at the top level as a backstop.
Differential Revision: https://reviews.llvm.org/D146580