my prove:
we can simple `(n * s) ceildiv a ceildiv s` to `n ceildiv a`
because `(n * s) ceildiv a ceildiv b` <=> `(n * s) ceildiv s ceildiv a`
<=> `n ceildiv a`
let's prove the `s floordiv a floor b` <=> `s floordiv b floor a`
let `s = ka +m (m < a)` so `s floordiv a` <=> `s / a - m / a`
similarly, it can be proven that:
`s floordiv a floordiv b` <=> `s / (a * b) - m / (a * b) - n / (b) constrain (n < b)`
<=> `s / (a * b) - (m + a*n) / (a*b)`
because `a* b - (m + a*n)` <=> `a*b - a*n - m` > `a - m` > `0`
so `s floordiv a floordiv b` <=> `[s / (a*b)]` <=> `s floordiv b floordiv a`
but if `s floordiv b` mutiply a factor above didn't always hold true.
Fixes https://github.com/llvm/llvm-project/issues/107508
Summary:
We currently use `llvm-mc` which is intended for internal testing and
not expected to be present in every installation. This patch changes
that to just use clang instead to get the `.o` from the HIP registration
code.
My preferred solution would be to use the new driver, but I still
haven't gotten the test suite to pass on this one weird OpenMP case.
Fixes: https://github.com/llvm/llvm-project/issues/112031
When `andn` is available, we should avoid switching `s &= ~(z & ~y);` into `s &= ~z | y;`
This patch turns this assembly from:
```
foo:
not rcx
and rsi, rdx
andn rax, rsi, rdi
or rcx, rdx
and rax, rcx
ret
```
into:
```
foo:
and rsi, rdx
andn rcx, rdx, rcx
andn rax, rsi, rdi
andn rax, rcx, rax
ret
```
Fixes#108731
A call to a function that has this attribute is not a source of
divergence, as used by UniformityAnalysis. That allows a front-end to
use known-name calls as an instruction extension mechanism (e.g.
https://github.com/GPUOpen-Drivers/llvm-dialects ) without such a call
being a source of divergence.
Currently, the library-internal feature test macros are only defined if
the feature is not available, and always have the prefix
`_LIBCPP_HAS_NO_`. This patch changes that, so that they are always
defined and have the prefix `_LIBCPP_HAS_` instead. This changes the
canonical use of these macros to `#if _LIBCPP_HAS_FEATURE`, which means
that using an undefined macro (e.g. due to a missing include) is
diagnosed now. While this is rather unlikely currently, a similar change
in `<__configuration/availability.h>` caught a few bugs. This also
improves readability, since it removes the double-negation of `#ifndef
_LIBCPP_HAS_NO_FEATURE`.
The current patch only touches the macros defined in `<__config>`. If
people are happy with this approach, I'll make a follow-up PR to also
change the macros defined in `<__config_site>`.
For a large binary with BAT section of size 38 MB with ~170k maps,
reduces writeMaps time from 70s down to 1s.
The inefficiency was in the use of std::distance with std::map::iterator
which doesn't provide random access. Use sorted vector for lookups.
Test Plan: NFC
Reviewers: maksfb, rafaelauler, dcci, ayermolo
Reviewed By: maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/112061
The `PGOInstrumentationGen` pass should preserve all analysis results
when nothing was actually instrumented. Currently, only modules that
contain at least a single function definition are instrumented. When a
module contains only function declarations and, optionally, global
variable definitions (a module for the regular-LTO phase for thin-LTO
when LTOUnit splitting is enabled, for example), such module is not
instrumented (yet?) and there is no reason to invalidate any analysis
results.
NFC.
Attempt to fit sporadic precommit test failures in
hip-partial-link.hip
The driver really shouldn't be using llvm-mc in the first place
though, filed #112031 to fix this.
* Convert `ReflowComments` from boolean into a new `enum` which can take
on the value `RCS_Never`, `RCS_IndentOnly`, or `RCS_Always`. The first
one is equivalent to the old `false`, the third one is `true`, and the
middle one means that multiline comments should only have their
indentation corrected, which is what Doxygen users will want.
* Preserve backward compatibility while parsing `ReflowComments`.
Add support for the following two patterns:
```
haystack.compare(haystack.length() - needle.length(), needle.length(), needle) == 0;
haystack.rfind(needle) == (haystack.size() - needle.size());
```
There are hard to debug leaks which look like
false.
In general, repeating leak checking should not
affect set of leaks significantly, especial
`at_exit` leak checking.
But if we see significant discrepancy, it may give
us a clue for investigation.
Needed for #110894
The strtointeger code was updated to support some bigint types in #85201
but not all. This patch finishes the cleanup so that it can work for a
wider range of types.
`parent_id` and `stack_id` represent location
where the thread was created, so it's reasonable
to keep them togeter.
For now, only Asan and MemProf use `stack_id`,
but it will be halpfull to print thread origin from
other sanitizers as well.
For #111948
ReplaceWithVeclib.cpp would construct overload name using all the
arguments in the intrinsic, but overloads should only be constructed
from arguments for which isVectorIntrinsicWithOverloadTypeAtArg returns
true, including the return type first (index -1).
Additionally,
- skip when `Intrinsic::not_intrinsic`, otherwise
`isVectorIntrinsicWithOverloadTypeAtArg` asserts for some
IntrinsicCalls.
Unblocks translation for pow and atan2 intrinsics.
Fixes#111093
This patch enables support for cloning in indirect callsites.
This is done by synthesizing callsite records for each virtual call
target from the profile metadata. In the thin link all the synthesized
records for a particular indirect callsite initially share the same
context node, but support is added to partition the callsites and
outgoing edges based on the callee function, creating a separate node
for each target.
In the LTO backend, when cloning is needed we first perform indirect
call promotion, then change the target of the new direct call to the
desired clone.
Note this is ThinLTO-specific, since for regular LTO indirect call
promotion should have already occurred.
a non-functional change
Update test script with update_mc_test_check script and sort the
testline to be alphabetic order. This helps to maintain the test file in
a clean state
When encountering an instruction like `if (p0) r0 = add(r0,##bar@GOT)`,
lld would fail with:
```
ld.lld: error: unrecognized instruction for 16_X type: 0x7400C000
```
This issue was encountered while building libreadline with clang 19.1.0.
Fixes: #111876
The API was already using top()/bottom() but internally we were still
using From/To. This patch fixes this.
Top/Bottom seems a better choice because implies program order, whereas
From/To does not.