The variables are all `constexpr`, which implies `inline`. Since they
aren't `constexpr` in C++03, they aren't `inline` there either, which is
why we currently define them out-of-line. Instead, we can use the C++17
extension of `inline` variables, which results in the same weak
definitions of the variables but without all the boilerplate.
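A minimal sketch of the difference (the names here are illustrative, not the actual libc++ variables):
```
// A static constexpr data member is implicitly inline in C++17, so the
// in-class initializer alone provides the (weak) definition.
struct __limits {
  static constexpr int __max_elements = 1024;
};

// Pre-C++17, odr-using __limits::__max_elements required this out-of-line
// definition boilerplate in exactly one translation unit:
// constexpr int __limits::__max_elements;
```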
This reverts commit 8a1ca6cad9cd0e972c322910cdfbbe9552c6c7ca.
I have fixed two things:
* The report is now sent via stdin, so we do not hit the limit on the
size of command-line arguments.
* The report is limited to 1MB in size; if we exceed that, we fall back
to listing only the totals, with a note telling you to check the full log.
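A rough POSIX sketch of the two fixes (the actual script and the `report-consumer` command are hypothetical stand-ins):
```
#include <cstddef>
#include <stdio.h>
#include <string>

constexpr std::size_t kMaxReportSize = 1024 * 1024; // the 1MB cap

// Feed the report to the consuming command via stdin rather than argv,
// so the kernel's limit on command-line size no longer applies.
bool sendReport(std::string report, const std::string &totalsOnly) {
  if (report.size() > kMaxReportSize)
    report = totalsOnly + "\nReport too large; check the full log for details.";
  FILE *pipe = popen("report-consumer", "w"); // hypothetical command
  if (!pipe)
    return false;
  fwrite(report.data(), 1, report.size(), pipe);
  return pclose(pipe) == 0;
}
```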
Related to PR #114423, this PR proposes to unify the naming of the
internal pointer members in `std::vector` and `std::__split_buffer` for
consistency and clarity.
Both `std::vector` and `std::__split_buffer` originally used a
`__compressed_pair<pointer, allocator_type>` member named `__end_cap_`
to store the internal capacity pointer and the allocator. However, the
naming has since diverged between the two classes:
- `std::vector` now uses `__cap_` and `__alloc_` for its internal
pointer and allocator members.
- In contrast, `std::__split_buffer` retains the name `__end_cap_` for
the capacity pointer, along with `__alloc_`.
This inconsistency between the names `__cap_` and `__end_cap_` has
caused confusion (especially for me when I was working on both
classes). I suggest unifying these names by renaming `__end_cap_` to
`__cap_` in `std::__split_buffer`.
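For reference, a simplified sketch of the resulting member layout (the real class uses compressed-pair-style storage and more machinery):
```
template <class _Tp, class _Allocator>
struct __split_buffer_sketch {
  _Tp *__first_;       // start of the allocation
  _Tp *__begin_;       // first constructed element
  _Tp *__end_;         // one past the last constructed element
  _Tp *__cap_;         // one past the end of the allocation (was __end_cap_)
  _Allocator __alloc_; // matches std::vector's __alloc_
};
```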
If the first operand is a physical register with no register class, take
the register class for the new virtual register from the second operand
of the sub in the genSubAdd2SubSub machine combine.
When retrieving the location of the function declaration, we were
dropping the file component on the floor, which resulted in an amusingly
confusing situation where we displayed the file containing the
implementation of the function but used the line number of the
declaration. This patch fixes that.
It required a small refactor of Function::GetStartLineSourceLineInfo to
return a SupportFile (instead of just the file spec), which in turn
necessitated changes in a couple of other places as well.
I've been contributing to the Clang Static Analyzer for a while now;
since around 2019, I think.
I've helped ensure the quality of the Static Analyzer for the last
~4-6 releases by testing, fixing, and backporting patches, and by
writing comprehensive release notes for each release.
I have a strong sense of ownership of the code I contribute.
I follow the issue tracker, and also try to follow and participate in
RFCs on Discourse if I'm not overloaded.
I also check Discord from time to time, but I rarely see anything there.
You can find the maintainer section of the LLVM DeveloperPolicy
[here](https://llvm.org/docs/DeveloperPolicy.html#maintainers) to read
more about the responsibilities.
- Add option (--report-failures-only) to generate a reduced report for
lit tests that only includes failing tests
- This is a continuation of proposed patches by @gregbedwell here:
- https://reviews.llvm.org/D143516
- https://reviews.llvm.org/D143519
---------
Co-authored-by: Greg Bedwell <greg.bedwell@sony.com>
Co-authored-by: James Henderson <James.Henderson@sony.com>
The structural equivalence check uses a cache to store already-found
non-equivalent values. This cache can be reused across calls (ASTImporter
does this). The value of `IgnoreTemplateParmDepth` can have an effect on
structural equivalence, so it is wrong to reuse the same cache for
checks with different values of `IgnoreTemplateParmDepth`. The current
change adds `IgnoreTemplateParmDepth` to the cache key to fix the
problem.
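A sketch of the shape of the fix (the types and names here are illustrative, not the actual StructuralEquivalenceContext code):
```
#include <set>
#include <tuple>

struct Decl; // stand-in for clang::Decl

// Before, the key was only the pair of declarations, so a non-equivalence
// result computed under one IgnoreTemplateParmDepth setting leaked into
// checks using the other setting. Adding the flag to the key keeps the
// cached results separate.
using NonEquivalentDeclsKey = std::tuple<Decl *, Decl *, bool>;

bool isCachedNonEquivalent(const std::set<NonEquivalentDeclsKey> &cache,
                           Decl *d1, Decl *d2,
                           bool ignoreTemplateParmDepth) {
  return cache.count({d1, d2, ignoreTemplateParmDepth}) != 0;
}
```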
On LoongArch64, the passing and returning of the type `complex16` is
similar to that of a structure type like `struct {fp128, fp128}`, meaning
values are passed and returned by reference. This behavior matches clang,
which makes it convenient to implement `iso_c_binding`.
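Conceptually, the ABI treats a `complex16` value like this C++ struct (illustrative only; `long double` is the IEEE quad `fp128` type on LoongArch64):
```
// Passed and returned indirectly (by reference) rather than in FP
// registers, mirroring struct { fp128, fp128 }.
struct Complex16 {
  long double re; // fp128 on LoongArch64
  long double im;
};
```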
Additionally, this patch fixes the failure in flang test
Integration/debug-complex-1.f90:
```
llvm-project/flang/lib/Optimizer/codeGen/Target.cpp:56:
not yet implemented: complex for this precision for return type
```
Since these `UndefValue::get` calls act as placeholders, I think it's
safe to replace them with poison values.
There are a lot of `UndefValue::get` calls in LLVM; I'll start by fixing
the ones in `unittests` while fixing the regression tests.
It's never set to true. Inheritable FDs are also dangerous, as they can
end up in processes which know nothing about them. It's better to
explicitly pass a specific FD to a specific subprocess, which we can
already mostly do using the ProcessLaunchInfo FileActions.
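A POSIX sketch of the policy (illustrative; in LLDB this is expressed through ProcessLaunchInfo file actions rather than raw syscalls):
```
#include <fcntl.h>
#include <unistd.h>

// Descriptors are opened close-on-exec, so no child inherits them by
// accident.
int openPrivateFd(const char *path) {
  return open(path, O_RDWR | O_CLOEXEC);
}

// A specific child gets a specific FD explicitly: dup2() in the child,
// between fork() and exec(), produces a descriptor without the
// close-on-exec flag, so only that child sees it.
void giveFdToChild(int fd, int child_fd) {
  dup2(fd, child_fd);
}
```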
This allows formatting large integers in a human-friendly way. Example:
"5321584" -> "5.32M".
Use it where such human-readable numbers are generated manually today.
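A sketch of the formatting rule implied by the example (the in-tree helper's name, suffixes, and rounding may differ):
```
#include <cstdio>
#include <string>

std::string humanize(unsigned long long n) {
  static const char *suffixes[] = {"", "K", "M", "G", "T", "P"};
  double value = static_cast<double>(n);
  unsigned i = 0;
  while (value >= 1000.0 && i + 1 < sizeof(suffixes) / sizeof(suffixes[0])) {
    value /= 1000.0;
    ++i;
  }
  char buf[32];
  if (i == 0)
    std::snprintf(buf, sizeof(buf), "%llu", n);
  else
    // e.g. 5321584 -> "5.32M"
    std::snprintf(buf, sizeof(buf), "%.2f%s", value, suffixes[i]);
  return buf;
}
```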
This PR re-introduces the functionality of
https://github.com/llvm/llvm-project/pull/113064, which was reverted in
0a68171b3c
due to memory lifetime issues.
Notice that I was not able to reproduce the ASan results myself, so I
have not been able to verify that this PR really fixes the issue.
---
Currently it is unsupported to:
1. Convert an MlirAttribute with type i1 to a numpy array
2. Convert a boolean numpy array to an MlirAttribute
Currently, the entire Python application crashes outright with a rather
poor error message (https://github.com/pybind/pybind11/issues/3336).
The complication in handling these conversions is that MlirAttribute
represents booleans as a bit-packed i1 type, whereas numpy represents
booleans as a byte array, with one byte (8 bits) per boolean.
This PR proposes the following approach:
1. When converting an i1-typed MlirAttribute to a numpy array, we cannot
directly use the underlying raw data backing the MlirAttribute as a
buffer to Python, as is done for other types. Instead, a copy of the data
is generated using numpy's unpackbits function, and the result is sent
back to Python.
2. When constructing an MlirAttribute from a numpy array, the Python
data is first read as uint8_t to convert it to the endianness used
internally in MLIR. Then the booleans are bit-packed using numpy's
packbits function, and the bit-packed array is saved as the MlirAttribute
representation.
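The core of both directions is plain bit packing and unpacking; a minimal C++ sketch (MSB-first order matches numpy's default `bitorder='big'`, but the order actually used must match MLIR's i1 packing):
```
#include <cstddef>
#include <cstdint>
#include <vector>

// One byte per boolean -> bit-packed, 8 booleans per byte (MSB first).
std::vector<uint8_t> packBits(const std::vector<uint8_t> &bools) {
  std::vector<uint8_t> packed((bools.size() + 7) / 8, 0);
  for (std::size_t i = 0; i < bools.size(); ++i)
    if (bools[i])
      packed[i / 8] |= static_cast<uint8_t>(1u << (7 - i % 8));
  return packed;
}

// Bit-packed -> one byte per boolean: the copy handed back to Python.
std::vector<uint8_t> unpackBits(const std::vector<uint8_t> &packed,
                                std::size_t numBools) {
  std::vector<uint8_t> bools(numBools);
  for (std::size_t i = 0; i < numBools; ++i)
    bools[i] = (packed[i / 8] >> (7 - i % 8)) & 1;
  return bools;
}
```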
Add a new helper function `isReachable` to `Block`. This function
traverses all successors of a block to determine if another block is
reachable from the current block.
This functionality has been reimplemented in multiple places in MLIR,
and possibly in additional copies in downstream projects. Therefore,
this commit moves it to a common place.
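A simplified sketch of the traversal (the real helper lives on `mlir::Block`; this stand-in `Block` models only the successor list, and whether a block counts as reachable from itself is a guess here):
```
#include <set>
#include <vector>

struct Block {
  std::vector<Block *> successors;
  bool isReachable(Block *other);
};

// Depth-first walk over the successor graph; the visited set makes the
// walk terminate on CFG cycles.
bool Block::isReachable(Block *other) {
  std::set<Block *> visited;
  std::vector<Block *> worklist;
  for (Block *succ : successors)
    if (visited.insert(succ).second)
      worklist.push_back(succ);
  while (!worklist.empty()) {
    Block *current = worklist.back();
    worklist.pop_back();
    if (current == other)
      return true;
    for (Block *succ : current->successors)
      if (visited.insert(succ).second)
        worklist.push_back(succ);
  }
  return false;
}
```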
In the case of Neon, if there exists an extractelement from a lane != 0
such that:
1. the extractelement does not necessitate a move from vector_reg -> GPR,
2. the extractelement result feeds into an fmul, and
3. the other operand of the fmul is a scalar or an extractelement from
lane 0 or a lane equivalent to 0,
then the extractelement can be merged with the fmul in the backend, and
it incurs no cost.
e.g.
```
define double @foo(<2 x double> %a) {
%1 = extractelement <2 x double> %a, i32 0
%2 = extractelement <2 x double> %a, i32 1
%res = fmul double %1, %2
ret double %res
}
```
`%2` and `%res` can be merged in the backend to generate:
`fmul d0, d0, v0.d[1]`
The change was tested with SPEC FP (C/C++) on Neoverse V2.
**Compile-time impact**: None.
**Performance impact**: Observing a 1.3-1.7% uplift on the lbm benchmark with -flto, depending upon the config.
This patch teaches extractCallsFromIR to recognize heap allocation
functions. Specifically, when we encounter a callee that is known to
be a heap allocation function like "new", we set the callee GUID to 0.
Note that I am planning to do the same for the caller-callee pairs
extracted from the profile. That is, when I encounter a frame that
does not have a callee, we assume that the frame is calling some heap
allocation function with GUID 0.
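A hedged sketch of the callee-GUID rule (not the actual extractCallsFromIR code; `isAllocationFn` is LLVM's existing helper from `llvm/Analysis/MemoryBuiltins.h`):
```
#include "llvm/Analysis/MemoryBuiltins.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"
#include <cstdint>

// GUID 0 is the sentinel for "some heap allocation function".
uint64_t calleeGUID(const llvm::CallBase &CB,
                    const llvm::TargetLibraryInfo &TLI) {
  if (llvm::isAllocationFn(&CB, &TLI))
    return 0;
  if (const llvm::Function *F = CB.getCalledFunction())
    return F->getGUID();
  return 0; // indirect call; also treated as unknown in this sketch
}
```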
Technically, I'm not recognizing enough functions in this patch.
TCMalloc is known to drop certain frames in the call stack immediately
above `new`. This patch is meant to lay the groundwork: setting up
GetTLI, plumbing it to extractCallsFromIR, and adjusting the unit
tests. I'll address the remaining issues in subsequent patches.
This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.
Initially landed in 3ed4b0b0efca7a9467ce83fc62de9413da38006d.
Reverted in 375d1925dbd0c051fe2d4a86fe98ed08f4a502c5 because the
[`load-store.ll`](https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/NVPTX/load-store.ll)
test was not updated after 5e75880165553e9afb721239689a9c79ec84a108.
5e75880165553e9afb721239689a9c79ec84a108 is now updated in
7a99f2322c324972f2c5091dddd7752fa21d5a78.
Multiple `func.return` ops inside a `func.func` op are now supported
during bufferization. This PR extends the code base in three places:
- When inferring function return types, `memref.cast` ops are folded
away only if all `func.return` ops have matching buffer types. (E.g., we
don't fold if two `return` ops have operands with different layout
maps.)
- The alias sets of all `func.return` ops are merged. That's because
aliasing is a "may be" property.
- The equivalence sets of all `func.return` ops are taken only if they
match. If different `func.return` ops have different equivalence sets
for their operands, the equivalence information is dropped. That's
because equivalence is a "must be" property.
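A minimal sketch of the two merge rules above (illustrative only, with values identified by plain integer IDs rather than the actual analysis state):
```
#include <optional>
#include <set>
#include <vector>

using AliasSet = std::set<int>;

// Aliasing is a "may be" property: the merged result must cover every
// possibility, so the per-return sets are unioned.
AliasSet mergeAliasSets(const std::vector<AliasSet> &perReturn) {
  AliasSet merged;
  for (const AliasSet &s : perReturn)
    merged.insert(s.begin(), s.end());
  return merged;
}

// Equivalence is a "must be" property: it is kept only if every
// func.return agrees; otherwise the information is dropped.
std::optional<int>
mergeEquivalence(const std::vector<std::optional<int>> &perReturn) {
  if (perReturn.empty())
    return std::nullopt;
  for (const std::optional<int> &e : perReturn)
    if (!e || *e != *perReturn.front())
      return std::nullopt;
  return perReturn.front();
}
```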
This commit is in preparation for removing the deprecated
`func-bufferize` pass, which can bufferize functions with multiple
`return` ops.
The biggest change is assigning vector crypto instructions to the
correct processor resource.
The majority of these changes are guided by our RVV-capable
llvm-exegesis.