Tracks the registers that explicit and hidden kernel arguments are
preloaded into, using new code object metadata.
IR arguments may be split across multiple parts by isel, and SGPR tuple
alignment means that an argument may be spread across multiple
registers.
To support this, some of the utilities for hidden kernel arguments are
moved to `AMDGPUArgumentUsageInfo.h`. Additional bookkeeping is also
needed for tracking purposes.
This is crucial when recovering from fatal loader errors. Without it,
the `Lexer` keeps yielding more tokens and the compiler may access
invalid `ASTReader` state.
rdar://133388373
Currently, these breakpoints are accumulated every time a new
process is created (e.g. through a `run`). Depending on the
circumstances, the old breakpoints may even be left enabled, interfering
with subsequent processes. This is addressed by removing the breakpoints
in `ProcessGDBRemote::Clear`.
Note that these breakpoints are more of a PlatformDarwin thing, so in
the future we should look into moving them there.
No longer require -fopenmp or -fopenacc with -E, unless specific version
number options are also required for predefined macros. This means that
most source can be preprocessed with -E and then later compiled with
-fopenmp, -fopenacc, or neither.
Consequently, OpenMP conditional compilation lines (`!$`) are also
passed through to -E output. The tricky part of this patch was handling
the fact that those conditional lines can also contain regular Fortran
line continuation, which now has to be deferred when `!$` lines are
interspersed.
This patch introduces a new option `-preserve-merged-debug-info` to
preserve an arbitrary but deterministic version of debug information
when DILocations are merged. This is intended to be used in production
environments from which sample-based profiles are derived, such as
AutoFDO and MemProf.
With this patch we have seen a 0.2% improvement on an internal workload
at Google when generating AutoFDO profiles. It also significantly helps
MemProf by preserving debug info for merged call instructions used in
the contextual profile.
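A minimal sketch of the idea, assuming a hypothetical
`PreserveMergedDebugInfo` flag for the plumbing (`DILocation::getMergedLocation`
is the existing LLVM API; the selection rule below is illustrative, not the
patch's actual implementation):
```cpp
#include "llvm/IR/DebugInfoMetadata.h"
#include <utility>
using namespace llvm;

static DILocation *mergeLocations(DILocation *A, DILocation *B,
                                  bool PreserveMergedDebugInfo) {
  if (A == B)
    return A;
  if (PreserveMergedDebugInfo) {
    // Keep one of the two locations, chosen by a deterministic rule, so
    // repeated builds agree and sample profiles still see a real source line.
    if (std::make_pair(A->getLine(), A->getColumn()) <=
        std::make_pair(B->getLine(), B->getColumn()))
      return A;
    return B;
  }
  // Default behavior: the merged location typically loses the line number
  // (line 0) when the inputs differ.
  return DILocation::getMergedLocation(A, B);
}
```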
---------
Co-authored-by: Krzysztof Pszeniczny <kpszeniczny@google.com>
Compute the result types and bail out before modifying any IR. That is
more efficient when type conversion fails, because no modifications have
to be rolled back.
Note: This is in preparation for the One-Shot Dialect Conversion
refactoring.
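As a rough illustration of the ordering (a minimal sketch; `MyOp`,
`MyLoweredOp`, and the pattern are made up, not the pattern being refactored):
```cpp
#include "mlir/Transforms/DialectConversion.h"
using namespace mlir;

// MyOp and MyLoweredOp are hypothetical ops used only for illustration.
struct MyOpLowering : public OpConversionPattern<MyOp> {
  using OpConversionPattern::OpConversionPattern;

  LogicalResult
  matchAndRewrite(MyOp op, OpAdaptor adaptor,
                  ConversionPatternRewriter &rewriter) const override {
    // 1. Compute all result types up front; on failure nothing has been
    //    modified yet, so nothing needs to be rolled back.
    SmallVector<Type> resultTypes;
    if (failed(getTypeConverter()->convertTypes(op->getResultTypes(),
                                                resultTypes)))
      return failure();

    // 2. Only now start modifying IR.
    rewriter.replaceOpWithNewOp<MyLoweredOp>(op, resultTypes,
                                             adaptor.getOperands());
    return success();
  }
};
```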
When a pending relocation is created, it is also marked as optional or
not. A relocation can be optional when it is added as part of an
optimization (e.g., in `scanExternalRefs`).
When BOLT tries to `flushPendingRelocations`, it safely skips any
optional relocations that cannot be encoded because they are out of
range. A prerequisite for that is the use of the `-force-patch` flag;
otherwise, BOLT bails out with a relevant message.
Background:
BOLT, as part of `scanExternalRefs`, identifies external references from
calls and creates pending relocations for them. When flushed, those
relocations update references to point to the optimized functions.
This optimization can be disabled using `--no-scan`.
BOLT can assert if any of these pending relocations cannot be encoded.
This patch does not disable the optimization; instead, it applies it
selectively when a pending relocation is optional and `-force-patch` is
enabled.
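A heavily simplified sketch of the flushing logic described above (the types
and helpers here are illustrative stand-ins, not BOLT's actual interfaces):
```cpp
#include <cstdint>
#include <functional>
#include <vector>

struct PendingReloc {
  uint64_t Offset;
  int64_t Addend;
  bool Optional; // added speculatively, e.g. by the scanExternalRefs pass
};

// Returns false when a required relocation cannot be encoded; callers would
// then bail out with a relevant error message.
bool flushPendingRelocations(
    const std::vector<PendingReloc> &Relocs, bool ForcePatch,
    const std::function<bool(const PendingReloc &)> &FitsEncoding) {
  for (const PendingReloc &R : Relocs) {
    if (!FitsEncoding(R)) {
      if (R.Optional && ForcePatch)
        continue; // safe to drop: the relocation was only an optimization
      return false;
    }
    // ... encode and write the relocation ...
  }
  return true;
}
```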
After https://github.com/llvm/llvm-project/pull/133220 we had some empty
complex literals (`tensor<0xcomplex<f32>>`) failing to parse.
This was largely due to the ambiguity of `shape.empty()`, which can mean
either a splat (`dense<1>`) or an empty literal (`dense<>`). The type's
element count (numel) is now used to disambiguate during verification.
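A small sketch of the disambiguation rule (a hypothetical helper, not the
actual DenseElementsAttr verification code):
```cpp
#include "llvm/ADT/ArrayRef.h"
#include "mlir/IR/BuiltinTypes.h"

// With no nesting in the parsed literal, decide between a splat (dense<1>)
// and an empty literal (dense<>) using the type's element count.
static bool literalIsSplat(llvm::ArrayRef<int64_t> parsedShape,
                           mlir::ShapedType type) {
  if (!parsedShape.empty())
    return false;                    // an explicit, nested element list
  return type.getNumElements() != 0; // numel == 0 means an empty literal
}
```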
The try-compile mechanism requires that `CMAKE_REQUIRED_FLAGS` is a
space-separated string instead of a list of flags. The original code
expanded `BUILTIN_FLAGS` into `CMAKE_REQUIRED_FLAGS` as a
space-separated string and then would overwrite `CMAKE_REQUIRED_FLAGS`
with `TARGET_${arch}_CFLAGS` prepended to the unexpanded
`BUILTIN_CFLAGS_${arch}`. As a result, only the first two arguments were
passed to the try-compile invocation, and the remaining arguments listed
in `BUILTIN_CFLAGS_${arch}` were dropped.
This patch appends `TARGET_${arch}_CFLAGS` and `BUILTIN_CFLAGS_${arch}` to
`CMAKE_REQUIRED_FLAGS` before expanding `CMAKE_REQUIRED_FLAGS` into a
space-separated string. This passes any pre-set required flags, in addition
to all of the builtin and target flags, to the Float16 detection.
This way we emit the error message that explains the full syntax
for a register list.
parseZcmpStackAdj had to be modified to not assume the previous
operand had been successfully parsed as a register list.
There were some remaining headers that were not guarded with
_LIBCPP_HAS_LOCALIZATION, leading to errors when trying to use modules
on platforms that don't support localization (since all the headers get
pulled in when building the 'std' module). This patch brings these
headers in line with what we do for every other header that depends on
localization.
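The guard pattern in question looks roughly like this (an illustrative
sketch, not a verbatim copy of any libc++ header):
```cpp
// <__locale_dir/some_header> -- sketch only; the header name is made up.
#include <__config>

#if _LIBCPP_HAS_LOCALIZATION

// ... declarations that require localization support ...

#endif // _LIBCPP_HAS_LOCALIZATION
```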
This patch also requires including <picolibc.h> from
<__configuration/platform.h> in order to define _NEWLIB_VERSION. In the
long term, we should use a better approach for doing that, such as
defining a macro in the __config_site header.
The preprocessor can perform macro replacement within identifiers when
they are split up by Fortran line continuation, but it fails to perform
macro replacement on a continued identifier when none of its individual
parts is replaced on its own.
The optional second argument to IEEE_SUPPORT_FLAG (and related functions
from the intrinsic IEEE_ARITHMETIC module) is needed only for its type,
not its value. Restrictions on local objects as arguments to function
references in specification expressions shouldn't apply to it.
Define a new attribute for dummy data object characteristics to
distinguish such arguments, set it for the appropriate intrinsic
function references, and test it during specification expression
validation.
The RUNTIME_CHECK in question doesn't allow for the possibility that an
allocatable or pointer component could be processed by defined I/O.
Remove it in favor of a dynamic allocation check.
Fortran::runtime::Descriptor::BytesFor() only works for Fortran
intrinsic types for which a C++ type counterpart exists, so it crashes
on some types that are legitimate Fortran types like REAL(2). Move some
logic from Evaluate into a new header in flang/Common, then use it to
avoid this needless dependence on C++.
A function or subroutine can allow an object of the same name to appear
in its scope, so long as the name is not used. This is similar to the
case of a name being imported from multiple distinct modules, and it is
implemented with the same representation.
It's not clear whether this is conforming behavior or a common
extension.
Add function and subroutine forms of FSEEK and FTELL as intrinsic
procedures. Accept common aliases from legacy compilers as well.
A separate patch to llvm-test-suite will enable tests for these
procedures once this patch has merged.
Depends on https://github.com/llvm/llvm-project/pull/132423; CI builds
will likely fail until that patch is merged and this PR is rebased.
This is a mostly straightforward replacement of the previous
`std::pair<int, std::set<std::pair<...>>>` data structure used in
`SLPVectorizerPass::vectorizeStores()` with slightly more readable
alternatives.
I had made that change in my local tree to help me better understand the
code. It's not very invasive, so I thought I'd create a PR for it.
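For illustration, the kind of change involved (the field names and element
types here are hypothetical, not the actual ones used in `vectorizeStores()`):
```cpp
#include <set>
#include <tuple>
#include <utility>

// Before: callers must remember what .first and .second mean at each level.
using StoreGroups = std::pair<int, std::set<std::pair<unsigned, int>>>;

// After (sketch): the same data, but every component has a name.
struct StoreDistance {
  unsigned StoreIndex; // hypothetical field
  int Distance;        // hypothetical field
  bool operator<(const StoreDistance &O) const {
    return std::tie(StoreIndex, Distance) < std::tie(O.StoreIndex, O.Distance);
  }
};

struct StoreGroup {
  int BaseIndex;                   // hypothetical field
  std::set<StoreDistance> Members;
};
```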
Add an initial CFG simplification transform, which removes the dead
edges of blocks terminated with BranchOnCond true.
At the moment, this removes the edge between the middle block and the
scalar preheader when folding the tail.
PR: https://github.com/llvm/llvm-project/pull/106748
As far as I understand, the first operand of dbg_declare should be a
pointer (inside a metadata wrapper). However, using a non-pointer is
currently not rejected, and we have some tests that use non-pointer
types. As far as I can tell, these tests either meant to use dbg_value
or are just incorrect hand-crafted tests.
Ran into this while trying to `fix` #134008.
Introduce members() iterator-helper to shorten the members_{begin,end}
idiom. A previous attempt at this patch was #130319, which had to be
reverted due to unit-test failures when attempting to call members() on
the end iterator. In this patch, members() accepts either an ECValue or
an ElemTy, which is more intuitive and doesn't suffer from the same
issue.
Follow up to #134170.
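A usage sketch of the before/after (treat the exact signatures as
approximate; the point is the shorter range-based idiom):
```cpp
#include "llvm/ADT/EquivalenceClasses.h"

void visitClassOf(llvm::EquivalenceClasses<int> &EC, int V) {
  // Before: spell out the iterator pair by hand.
  for (auto MI = EC.findLeader(V), ME = EC.member_end(); MI != ME; ++MI)
    (void)*MI;

  // After: members() yields the same range directly from the element.
  for (int Member : EC.members(V))
    (void)Member;
}
```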
We should be using LLVM intrinsics instead of plain `fir.call`s when we
can. The existing code creates a declaration for the LLVM intrinsic and
a regular `fir.call`, which makes it hard for consumers of the IR to find
all the intrinsic calls.
For vector deleting dtors support, we now also search for and save
`operator delete[]`. Avoid diagnosing a deleted `operator delete[]` when
doing that, because vector deleting dtors are only called when `delete[]`
is present, and whenever `delete[]` is present in the TU it will be
diagnosed correctly.
Fixes https://github.com/llvm/llvm-project/issues/134265
This change exposes a low-level helper that is used to implement
`forEachArgumentWithParamType` but can also be used without matchers,
e.g. if performance is a concern.
Commit f5ee10538b68835112323c241ca7db67ca78bf62 introduced a copy of the
implementation of the `forEachArgumentWithParamType` matcher that was
needed for optimizing performance of `-Wunsafe-buffer-usage`.
This change shares the code between the two so that we do not repeat
ourselves and any bugfixes or changes will be picked up by both
implementations in the future.
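For reference, a hedged sketch of how the matcher itself is typically used
(the newly exposed low-level helper provides the same argument/parameter-type
pairing without going through the matcher framework; its exact signature is
not shown here):
```cpp
#include "clang/ASTMatchers/ASTMatchers.h"
using namespace clang::ast_matchers;

// Bind each call argument together with the type of the parameter it
// initializes.
static const auto CallWithArgParamTypes = callExpr(forEachArgumentWithParamType(
    expr().bind("arg"), qualType().bind("paramType")));
```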
Fixes #134356.
We accidentally skipped checking derived-to-base conversions because
deduction did not strip sugar in the relevant code. This caused
deduction failures when a parameter type had an attribute.
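A hypothetical illustration of the kind of code affected (the specific
attribute spelling below is an assumption for illustration, not taken from
the issue):
```cpp
// Deducing T requires a derived-to-base step, and the parameter's pointer
// type carries attribute sugar (a nullability attribute, chosen only for
// illustration).
template <class T> struct Base {};
struct Derived : Base<int> {};

template <class T> void consume(Base<T> *_Nonnull) {}

void test(Derived *d) {
  consume(d); // should deduce T = int via the derived-to-base conversion
}
```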
Improved "options" sections of various checks:
1. Added Options keyword to be a delimiter between "body" and "options"
parts of docs
2. Added default values where were absent.
3. Changed double-tick to single-tick in default values.
---------
Co-authored-by: EugeneZelenko <eugene.zelenko@gmail.com>
Details: detailed_command_telemetry (bool) and command_id (int) could
already have been freed when the dispatcher's dtor runs, so we should
just copy them into the lambda since they are cheap.
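The general pattern, sketched with hypothetical names (this is not the
actual LLDB telemetry code):
```cpp
#include <functional>

struct Dispatcher {
  bool detailed_command_telemetry = true;
  int command_id = 0;

  std::function<void()> makeExitCallback() {
    // Risky: capturing `this` (or references to members) dangles if the
    // callback runs after the dispatcher has been destroyed.
    //   return [this] { report(detailed_command_telemetry, command_id); };

    // Safe and cheap: copy the two small values into the lambda.
    return [detailed = detailed_command_telemetry, id = command_id] {
      report(detailed, id);
    };
  }

  static void report(bool /*detailed*/, int /*id*/) {}
};
```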
- Moved existing llvm/test/CodeGen/X86/powi.ll file to
llvm/test/CodeGen/X86/powi-const.ll.
- Added new testcases for powi into llvm/test/CodeGen/X86/powi.ll.
The original code is essentially performing isel during legalisation,
with the AArch64-specific nodes offering no additional value compared to
ISD::SETCC.
This patch updates flang to follow clang's behavior when processing the
`-mcode-object-version` option.
The option is now used to populate an LLVM module flag called
`amdhsa_code_object_version`, which is expected by the backend. The patch
also updates the driver to add the `--amdhsa-code-object-version` option
to the frontend invocation for device compilation of AMDGPU targets.
Recently, we have added a set of complex intrinsics for the TMA,
tcgen05, and Cvt families of instructions.
This patch captures the key learnings from our experience
so far and documents them as guidelines for future design.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
This PR fixes a bug in a canonicalization pattern for the operation
`shape.shape_of` (shape of reshape).
```
// Before
func.func @f(%arg0: tensor<?x1xf32>, %arg1: tensor<3xi32>) -> tensor<3xindex> {
%reshape = tensor.reshape %arg0(%arg1) : (tensor<?x1xf32>, tensor<3xi32>) -> tensor<?x1x1xf32>
%0 = shape.shape_of %reshape : tensor<?x1x1xf32> -> tensor<3xindex>
return %0 : tensor<3xindex>
}
// This will error out as follows:
error: 'tensor.cast' op operand type 'tensor<3xi32>' and result type 'tensor<3xindex>' are cast incompatible
%0 = shape.shape_of %reshape : tensor<?x1x1xf32> -> tensor<3xindex>
^
note: see current operation: %0 = "tensor.cast"(%arg1) : (tensor<3xi32>) -> tensor<3xindex>
```
```
// After
func.func @f(%arg0: tensor<?x1xf32>, %arg1: tensor<3xi32>) -> tensor<3xindex> {
%0 = arith.index_cast %arg1 : tensor<3xi32> to tensor<3xindex>
return %0 : tensor<3xindex>
}
```
See file canonicalize.mlir in the change list for an example.
For context, this bug was found while running a test on Keras 3: the
canonicalizer errors out due to an invalid tensor.cast operation when
the batch size is dynamic. The invalid op casts its tensor<3xi32>
operand to tensor<3xindex>.
This change is related to a previous PR:
https://github.com/llvm/llvm-project/pull/98531
---------
Co-authored-by: Alaa Ali <alaaali@ah-alaaali-l.dhcp.mathworks.com>
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>