Fixes #28667
There are a bunch of ways to end up building split DWARF where the
DWO file is not next to the program file. On top of that, you may
distribute the program in various ways, move files about, switch
machines, flatten the directories, etc.
This change adds a few more strategies to find DWO files:
* Appending the DW_AT_COMP_DIR and DWO name to all the debug
search paths.
* Appending the same to the binary's dir.
* Appending the DWO name (e.g. a/b/foo.dwo) to all the debug
search paths.
* Appending the DWO name to the binary's location.
* Appending the DWO filename (e.g. foo.dwo) to the debug
search paths.
* Appending the DWO filename to the binary's location.
They are applied in that order, and some will be skipped depending on
whether the DW_AT_COMP_DIR is relative or absolute; the same goes for
the DWO name (though that seems to always be relative).
This uses the setting target.debug-file-search-paths, which
is already used for DWP files.
The added tests likely do not cover every part of the
strategies listed; they are a best effort.
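As a rough illustration of the lookup order described above, here is a minimal sketch (not the actual LLDB implementation; all paths and names are hypothetical):
```cpp
#include <string>
#include <vector>

// Illustrative only: how a comp dir, DWO name, binary directory, and the
// target.debug-file-search-paths entries combine into candidate locations,
// tried in order.
std::vector<std::string>
CandidateDWOPaths(const std::string &comp_dir,   // e.g. "proj/build"
                  const std::string &dwo_name,   // e.g. "a/b/foo.dwo"
                  const std::string &binary_dir, // directory containing the binary
                  const std::vector<std::string> &search_paths) {
  // The bare filename, e.g. "foo.dwo".
  std::string dwo_filename = dwo_name.substr(dwo_name.find_last_of('/') + 1);
  std::vector<std::string> candidates;
  for (const auto &dir : search_paths) // comp dir + DWO name
    candidates.push_back(dir + "/" + comp_dir + "/" + dwo_name);
  candidates.push_back(binary_dir + "/" + comp_dir + "/" + dwo_name);
  for (const auto &dir : search_paths) // DWO name only
    candidates.push_back(dir + "/" + dwo_name);
  candidates.push_back(binary_dir + "/" + dwo_name);
  for (const auto &dir : search_paths) // DWO filename only
    candidates.push_back(dir + "/" + dwo_filename);
  candidates.push_back(binary_dir + "/" + dwo_filename);
  return candidates;
}
```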
Reviewed By: clayborg
Differential Revision: https://reviews.llvm.org/D157609
This change makes it possible to simplify SparcAsmParser a bit by delegating some
work (parsing singleton registers) to the code generated by llvm-tblgen.
Other than that, there is no functionality change, because registers are
matched using custom code in SparcAsmParser.cpp and always printed in
lowercase by SparcInstPrinter.
The new Buffer Deallocation pass introduced in D158421 will not need the
AllocationOpInterface anymore; thus, it is better to move those default
implementations to a place where they will still be used.
The substitution supported by `extraClassOf` is currently limited to
only the base instance, i.e. `Operation*`, `Type` or `Attribute`, which
limits the kind of checks you can perform in the `classof`
implementation.
Since the interface concept is fetched prior to the user code, we can
use it to construct an instance of the interface, allowing use of its
methods in the `classof` check.
Since an instance of the interface allows access to the base class
methods through the `->` operator, I've gone ahead and replaced the
substitution of `$_op/$_type/$_attr` with an interface instance. This is
also consistent with `extraSharedClassDeclaration` and other methods
created in the interface class which do the same.
Running `dump_format_help.py` in `clang/docs/tools`:
```
warning: line too long:
relative to the current working directory when reading stdin.
warning: line too long:
--files=<filename> - A file containing a list of files to process, one per line.
warning: line too long:
--help-list - Display list of available options (--help-list-hidden for more)
Traceback (most recent call last):
File "/Users/Owen/remove-braces/clang/docs/tools/./dump_format_help.py", line 63, in <module>
contents = substitute(contents, "FORMAT_HELP", help_text)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/Owen/remove-braces/clang/docs/tools/./dump_format_help.py", line 17, in substitute
return re.sub(pattern, "%s", text, flags=re.S) % replacement
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~
TypeError: not enough arguments for format string
```
Usually RecordValues for record objects (e.g. structs) are initialized with
`Environment::createValue()`, which internally calls `getObjectFields()` to
collect all fields from the current and base classes, and then filters them
against the `ModeledValues` via `DACtx::getModeledFields()` so that only the
fields that are actually referenced are modeled.
The same consistent set of fields should be initialized when a record is
initialized with an initializer list (InitListExpr); however, the existing
code's behavior was different.
Before this patch:
* When a struct is initialized with an InitListExpr, its fields are
initialized based on what is returned by `getFieldsForInitListExpr()`, which
only collects the direct fields of the current class, not those of the base
classes. Moreover, if the base classes have their own InitListExprs, values
initialized by those InitListExprs weren't merged into the child objects.
After this patch:
* When a struct is initialized with an InitListExpr, the fields in the base
classes that were initialized by their InitListExprs are collected and merged.
The code also asserts that a consistent set of fields is initialized
with the ModeledFields.
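A minimal C++ illustration of the affected pattern (the types here are hypothetical):
```cpp
// `Base` is initialized by the nested InitListExpr `{1}`. Before this patch,
// Base::b was not merged into the fields modeled for `d`; after this patch,
// fields initialized by the base class's InitListExpr are collected and merged.
struct Base {
  int b;
};
struct Derived : Base {
  int d;
};
Derived d = {{1}, 2}; // aggregate initialization with a base class (C++17)
```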
Reviewed By: mboehme
Differential Revision: https://reviews.llvm.org/D159284
* replaces `add_rvalue_reference_t` with `is_rvalue_reference_t`
* includes `"stddef.h"` for `size_t`
---------
Co-authored-by: Guillaume Chatelet <gchatelet@google.com>
On sm_90, some instructions now support i16x2, which allows the hardware to
execute add, min, and max instructions more efficiently.
In order to support that, we need to make i16x2 a native type in the
backend. This makes the necessary changes to turn i16x2 into a native type and
adds support for the instructions natively supporting i16x2.
This caused a negative test in the NVPTX SLP tests to start passing; the
test was changed to a positive one as the IR is correctly vectorized.
Fixes a corner case of the analysis: previously GlobalValues could be
affected by assumptions, but were not tracked within AffectedValues.
This patch allows assumptions which affect a given GlobalValue to be
looked up via `assumptionsFor()`.
A small update to llvm/test/Analysis/ScalarEvolution/ranges.ll was
necessary due to knowledge about a global value now being propagated
from AssumptionCache -> ValueTracking -> ScalarEvolution.
This happened because we had a section
```
- Index: 3
  Kind: DATA
  Name: weak_import_data
  Flags: [ BINDING_WEAK, UNDEFINED ]
```
which does not have a size. We managed to reproduce it by building LLVM
under MSan with libc++ as the standard library in debug mode with
-D_LIBCPP_DEBUG_STRICT_WEAK_ORDERING_CHECK. That check called comp(a, a), and
the full tie detected uninitialized memory.
This started to happen after https://reviews.llvm.org/D158799.
We've been displaying types and addresses for containers, but that's not
very useful information. A better approach is to compose the summary of
a container from the summaries of a few of its children.
Not only that, we can dereference simple pointers and references to get
the summary of the pointer variable, which is also better than just
showing an address.
And in the rare case where the user wants to inspect the raw address,
they can always use the debug console for that.
For the record, this is very similar to what the CodeLLDB extension
does, and it seems to give a better experience.
An example of the new output:

<img width="494" alt="Screenshot 2023-09-06 at 2 24 27 PM" src="https://github.com/llvm/llvm-project/assets/1613874/588659b8-421a-4865-8d67-ce4b6182c4f9">

And this is the old output:

<img width="476" alt="Screenshot 2023-09-06 at 2 46 30 PM" src="https://github.com/llvm/llvm-project/assets/1613874/5768a52e-a773-449d-9aab-1b2fb2a98035">
On x86 targets, -fsplit-machine-functions enables splitting of machine
functions using profile information. This patch rolls out the flag for
AArch64 targets.
Depends on D158647
Differential Revision: https://reviews.llvm.org/D157157
The OpenACC standard specifies an `atomic` construct in section 2.12 (of
3.3 spec), used to ensure that a specific location is accessed or
updated atomically. Four different clauses are allowed: `read`, `write`,
`update`, or `capture`. If no clause appears, it is as if `update` is
used.
The OpenMP specification defines the same clauses for `omp atomic`. The
types of expression and the clauses in the OpenACC spec match the OpenMP
spec exactly. The main difference is that the OpenMP specification is a
superset - it includes clauses for `hint` and `memory order`. It also
allows conditional expression statements. But otherwise, the expression
definition matches.
Thus, for OpenACC, we refactor and reuse the OpenMP implementation as
follows:
* The atomic operations are duplicated in OpenACC dialect. This is
preferable so that each language's semantics are precisely represented
even if specs have divergence.
* However, since semantics overlap, a common interface between the
atomic operations is being added. The semantics for the interfaces are
not generic enough to be used outside of OpenACC and OpenMP, and thus
new folders were added to hold common pieces of the two dialects.
* The atomic interfaces define common accessors (such as getting `x` or
`v`) which match the OpenMP and OpenACC specs. It also adds common
verifiers intended to be called by each dialect's operation verifier.
* The OpenMP write operation was updated to use `x` and `expr` to be
consistent with its other operations (that use naming based on spec).
The frontend lowering necessary to generate the dialect can also be
reused. This will be done in a follow up change.
NFC changes to explicitly specify the type we are matching when creating
an Int32 reg. This will allow us to have multiple types mapping to those
registers without causing ambiguous matching.
On AArch64, it is safe to let the linker handle relaxation of
unconditional branches; in most cases, the destination is within range,
and the linker doesn't need to do anything. If the linker does insert
fixup code, it clobbers the x16 intra-procedure-call register, so x16 must
be available across the branch before linking. If x16 isn't available,
but some other register is, we can relax the branch either by spilling
x16 or by using the free register for a manually-inserted indirect branch.
This patch builds on D145211. While that patch is for correctness, this
one is for performance of the common case. As noted in
https://reviews.llvm.org/D145211#4537173, we can trust the linker to
relax cross-section unconditional branches across which x16 is
available.
Programs that use machine function splitting care most about the
performance of hot code at the expense of the performance of cold code,
so we prioritize minimizing hot code size.
Here's a breakdown of the cases:
* Hot -> Cold [x16 is free across the branch]:
  Do nothing; let the linker relax the branch.
* Cold -> Hot [x16 is free across the branch]:
  Do nothing; let the linker relax the branch.
* Hot -> Cold [x16 used across the branch, but there is a free register]:
  Spill x16; let the linker relax the branch.
  Spilling requires fewer instructions than manually inserting an
  indirect branch.
* Cold -> Hot [x16 used across the branch, but there is a free register]:
  Manually insert an indirect branch.
  Spilling would require adding a restore block in the hot section.
* Hot -> Cold [No free regs]:
  Spill x16; let the linker relax the branch.
* Cold -> Hot [No free regs]:
  Spill x16 and put the restore block at the end of the hot function; let the linker relax the branch.
Ex:
[Hot section]
func.hot:
    ... hot code ...
func.restore:
    ... restore x16 ...
    B func.hot
[Cold section]
func.cold:
    ... spill x16 ...
    B func.restore
Putting the restore block at the end of the function instead of
just before the destination increases the cost of executing the
restore, but it avoids putting cold code in the middle of hot code.
Since the restore is very rarely taken, this is a worthwhile
tradeoff.
Differential Revision: https://reviews.llvm.org/D156767
Instead of hard-coding the name `lldbDataFormatters`, use `__name__` to
get the module's name.
This allows the formatters to be loaded from any path, with any
filename.
Based on https://en.cppreference.com/w/c/memory/aligned_alloc, the `size`
is supposed to be a multiple of `alignment`, and it is implementation-defined
behavior if it is not.
We have a non-conformant use in `kmp_barrier.h` when allocating the
distribute barrier. The size of the barrier is 576 and the alignment is
`4*CACHE_LINE`, which is 256 on most systems. Apparently it works perfectly
fine on Linux and Intel-based Macs, but not on Apple Silicon based Macs.
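A minimal sketch of the conforming pattern (a hypothetical helper, not the exact change made in kmp_barrier.h; it assumes the alignment is a power of two):
```cpp
#include <cstdlib>

// aligned_alloc requires size to be a multiple of alignment; otherwise the
// behavior is implementation defined. Rounding the size up first keeps the
// call conforming everywhere, including on Apple Silicon.
void *alloc_aligned(std::size_t alignment, std::size_t size) {
  std::size_t rounded = (size + alignment - 1) & ~(alignment - 1);
  return std::aligned_alloc(alignment, rounded);
}
```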
Fix #63194.
Like concepts checking, a trailing return type of a lambda
in a dependent context may refer to captures, in which case
they may need to be rebuilt, so the map of local declarations
should include captures.
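For instance, in this minimal (hypothetical) case the trailing return type refers to the init-capture `x` inside a dependent context:
```cpp
// When the call operator's type is rebuilt during instantiation, the
// trailing return type names the capture `x`, so the capture has to be part
// of the map of local declarations.
template <typename T>
auto make_adder(T t) {
  return [x = t](auto y) -> decltype(x + y) { return x + y; };
}
```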
This patch reveals a pre-existing issue:
`this` is always recomputed by TreeTransform.
`*this` (like all captures) only becomes `const`
after the parameter list.
However, if we try to recompute the value of `this` (in a parameter)
during template instantiation while determining the type of the call operator,
we will determine it to be const (unless the lambda is mutable).
There is no good way to know at that point whether we are in a parameter
or not, so the easiest/best solution is to transform the type of `this`.
Note that doing so breaks a handful of HLSL tests,
so this is a prototype at this point.
Fixes #65067
Fixes #63675
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D159126
Summary:
A previous patch introduced a new object type for the GPU functions
implemented by an external vendor library. This was done so that we did
not attempt to run tests on functions which we did not implement;
however, this accidentally stopped them from being included in the actual
output. Fix this by checking the new type as well.
The long-term goal is to remove this vendor handling altogether, but it is
being used as a short-term solution to provide a math library on the
GPU, which currently lacks one.
Added a class to hold such common state. The goal is both to reduce the argument list of other utilities used by `computeImportForModule` (which will be brought in as members in a subsequent patch), and to make it easy to extend such state later.