These tests have always been flaky, which led us to using ALLOW_RETRIES
on them. However, while investigating #89083 (using Github provided
macOS builders), these tests surfaced as being basically unworkably
flaky in that environment.
This patch solves that problem by refactoring the tests to make them
succeed deterministically.
There were various places where we incorrectly handled exceptions in the
PSTL. Typical issues were missing `noexcept` and taking iterators by
value instead of by reference.
This patch fixes those inconsistent and incorrect instances, and adds
proper tests for all of those. Note that the previous tests were often
incorrectly turned into no-ops by the compiler due to copy ellision,
which doesn't happen with these new tests.
Libc++ has no separate C++98 support, it uses C++03 instead. This
removes some obsolete c++98 markers in the test.
Thanks to @StephanTLavavej for spotting this.
Since C++14 has been released for about nine years and most standard
libraries have implemented sized deallocation functions, it's time to
make this feature default again.
This is another try of https://reviews.llvm.org/D112921.
The original commit cf5a8b4 was reverted by 2e5035a due to some
failures (see #83774).
Fixes#60061
This PR implements [LWG2381](https://cplusplus.github.io/LWG/issue2381)
by rejecting `'i'`, `'I'`, `'n'`, `'N'` in FP parsing, as inf and NaN
are intendedly rejected by that LWG issue.
The source character array used for parsing is
`"0123456789abcdefABCDEFxX+-pPiInN"`, whose first 26 or 28 characters
are used for parsing integers or floating-point values respectively.
Previously, libc++ used 32 characters, including `'i'`, `'I'`, `'n'`,
`'N'`, for FP parsing, which was inconsistent with LWG2381. This PR also
replaces magic numbers 26 and 28 (formerly 32) with named constants.
Drive-by change: when the first character (possibly after the leading
`'+'` or `'-'`) is not a decimal digit but an acceptable character
(e.g., `'p'` or `'e'`), the character is not accumulated now (per Stage
2 in [facet.num.get.virtuals]/3).
#65168 may be rendered invalid, see
https://github.com/llvm/llvm-project/pull/65168#issuecomment-1868533342.
Apple back-deployment targets remain broken, likely due to dylib. XFAIL
is marked in related tests.
---------
Co-authored-by: Mark de Wever <koraq@xs4all.nl>
`remove_cv_t` and `remove_all_extents_t` are taken care of by the
built-in trait, so we don't need to use them directly.
---------
Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
## Abstract
This pull request implements LWG3672: `common_iterator::operator->()`
should return by value. The current implementation specifies that this
function should return the underlying pointer by reference (`T*
const&`), but it would be more intuitive to return it by value (`T*`).
## Reference
- [Draft C++ Standard:
[common.iter.access]](https://eel.is/c++draft/common.iter.access)
- [LWG3672](https://cplusplus.github.io/LWG/issue3672)
The `grouping` string for locale `en_US.UTF-8` and `fr_FR.UTF-8` on AIX
is `3`. This is different from Linux's `3;3` but is the same as Windows.
This patch removes `XFAIL: LIBCXX-AIX-FIXME` and changes to use the
`WIN32` code path.
The C locale on AIX uses `ISO-8859-1`, where `0xFB` is a valid
character. Widening char(-5) succeeds and produces L'\u00fb' the same as
on macOS, FreeBSD, and Windows. This patch removes `XFAIL:
LIBCXX-AIX-FIXME` and uses the macOS, FreeBSD, and WIN32 code path for
AIX.
The patch does several things:
- fixes module exports
- disables clang-tidy with Clang-17 due to known issues
- disabled clang-tidy on older libstdc++ versions since it lacks C++20
features used
- fixes the CMake dependency
The issue why clang-tidy was not used in the CI was the last issue; the
plugin was not a
dependency of the tests. Without a plugin the tests disable clang-tidy.
This was noticed while investigating
https://github.com/llvm/llvm-project/issues/89898
This pull request is the third iteration aiming to integrate short
string annotations. This commit includes:
- Enabling basic_string annotations for short strings.
- Setting a value of `__trivially_relocatable` in `std::basic_string` to
`false_type` when compiling with ASan (nothing changes when compiling
without ASan). Short string annotations make `std::basic_string` to not
be trivially relocatable, because memory has to be unpoisoned.
- Adding a `_LIBCPP_STRING_INTERNAL_MEMORY_ACCESS` modifier to two
functions.
- Creating a macro `_LIBCPP_ASAN_VOLATILE_WRAPPER` to prevent
problematic stack optimizations (the macro modifies code behavior only
when compiling with ASan).
Previously we had issues with compiler optimization, which we understand
thanks to @vitalybuka. This commit also addresses smaller changes in
short string, since previous upstream attempts.
Problematic optimization was loading two values in code similar to:
```
__is_long() ? __get_long_size() : __get_short_size();
```
We aim to resolve it with the volatile wrapper.
This commit is built on top of two previous attempts which descriptions
are below.
Additionally, in the meantime, annotations were updated (but it
shouldn't have any impact on anything):
- https://github.com/llvm/llvm-project/pull/79292
---
Previous PR: https://github.com/llvm/llvm-project/pull/79049
Reverted:
a16f81f5e3
Previous description:
Originally merged here: https://github.com/llvm/llvm-project/pull/75882
Reverted here: https://github.com/llvm/llvm-project/pull/78627
Reverted due to failing buildbots. The problem was not caused by the
annotations code, but by code in the `UniqueFunctionBase` class and in
the `JSON.h` file. That code caused the program to write to memory that
was already being used by string objects, which resulted in an ASan
error.
Fixes are implemented in:
- https://github.com/llvm/llvm-project/pull/79065
- https://github.com/llvm/llvm-project/pull/79066
Problematic code from `UniqueFunctionBase` for example:
```cpp
// In debug builds, we also scribble across the rest of the storage.
memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize);
```
---
Original description:
This commit turns on ASan annotations in `std::basic_string` for short
stings (SSO case).
Originally suggested here: https://reviews.llvm.org/D147680
String annotations added here:
https://github.com/llvm/llvm-project/pull/72677
Requires to pass CI without fails:
- https://github.com/llvm/llvm-project/pull/75845
- https://github.com/llvm/llvm-project/pull/75858
Annotating `std::basic_string` with default allocator is implemented in
https://github.com/llvm/llvm-project/pull/72677 but annotations for
short strings (SSO - Short String Optimization) are turned off there.
This commit turns them on. This also removes
`_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to
support turning on and off short string annotations.
Support in ASan API exists since
dd1b7b797a.
You can turn off annotations for a specific allocator based on changes
from
2fa1bec7a2.
This PR is a part of a series of patches extending AddressSanitizer C++
container overflow detection capabilities by adding annotations, similar
to those existing in `std::vector` and `std::deque` collections. These
enhancements empower ASan to effectively detect instances where the
instrumented program attempts to access memory within a collection's
internal allocation that remains unused. This includes cases where
access occurs before or after the stored elements in `std::deque`, or
between the `std::basic_string`'s size (including the null terminator)
and capacity bounds.
The introduction of these annotations was spurred by a real-world
software bug discovered by Trail of Bits, involving an out-of-bounds
memory access during the comparison of two strings using the
`std::equals` function. This function was taking iterators
(`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison,
using a custom comparison function. When the `iter1` object exceeded the
length of `iter2`, an out-of-bounds read could occur on the `iter2`
object. Container sanitization, upon enabling these annotations, would
effectively identify and flag this potential vulnerability.
If you have any questions, please email:
- advenam.tacet@trailofbits.com
- disconnect3d@trailofbits.com
Lit already has support for stopping LIT from parsing further test
directives. It is
// END.
After that directive, LIT will stop parsing.
This change removes the BLOCKLIT hack and replaces it with END.
Adjust some of the [rand.dist] critical values that are too strict
- Most critical values are determined empirically by running each test
51
times with a different PRNG seed and finding the smallest symmetric
interval
around the median that contains 90% of the sample means, variances, etc.
- For the Kolmogorov-Smirnov tests, the alpha=0.1 critical value for
large N
is 1.224/sqrt(N).
- For normally distributed variates, the sample kurtosis is distributed
as
Normal(0, 24/N). For N=1e5, this gives a 90% confidence interval of
0+/-0.0255. For Binomial(40, 0.25), which is approximately normal, the
kurtosis is -0.0167, so the relative 90% CI is large, on the order of
0.0255/0.0167 = 153%. In most cases the distribution of the sample
kurtosis
isn't known analytically, but similarly large relative tolerances can be
expected if the kurtosis is near zero.
Implement
- LWG4053 Unary call to `std::views::repeat` does not decay the argument
- LWG4054 Repeating a `repeat_view` should repeat the view
Signed-off-by: yronglin <yronglin777@gmail.com>
The previous patch implemented
- P2713R1 Escaping improvements in std::format
- LWG3965 Incorrect example in [format.string.escaped] p3 for formatting
of combining characters
These changes were correct, but had a size and performance penalty. This
patch improves the size and performance of the previous patch. The
performance is still worse than before since the lookups may require two
property lookups instead of one before implementing the paper. The
changes give a tighter coupling between the Unicode data and the
algorithm. Additional tests are added to notify about changes in future
Unicode updates.
Before
```
-----------------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------------
BM_ascii_escaped<char> 110704 ns 110696 ns 6206
BM_unicode_escaped<char> 101371 ns 101374 ns 6862
BM_cyrillic_escaped<char> 63329 ns 63327 ns 11013
BM_japanese_escaped<char> 41223 ns 41225 ns 16938
BM_emoji_escaped<char> 111022 ns 111021 ns 6304
BM_ascii_escaped<wchar_t> 112441 ns 112443 ns 6231
BM_unicode_escaped<wchar_t> 102776 ns 102779 ns 6813
BM_cyrillic_escaped<wchar_t> 58977 ns 58975 ns 11868
BM_japanese_escaped<wchar_t> 36885 ns 36886 ns 18975
BM_emoji_escaped<wchar_t> 115885 ns 115881 ns 6051
```
The first change is to manually encode the entire last area and make a
manual exception for the 240 excluded entries. This reduced the table
from 1077 to 729 entries and gave the following benchmark results.
```
-----------------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------------
BM_ascii_escaped<char> 104777 ns 104776 ns 6550
BM_unicode_escaped<char> 96980 ns 96982 ns 7238
BM_cyrillic_escaped<char> 60254 ns 60251 ns 11670
BM_japanese_escaped<char> 44452 ns 44452 ns 15734
BM_emoji_escaped<char> 104557 ns 104551 ns 6685
BM_ascii_escaped<wchar_t> 107456 ns 107454 ns 6505
BM_unicode_escaped<wchar_t> 96219 ns 96216 ns 7301
BM_cyrillic_escaped<wchar_t> 56921 ns 56904 ns 12288
BM_japanese_escaped<wchar_t> 39530 ns 39529 ns 17492
BM_emoji_escaped<wchar_t> 108494 ns 108496 ns 6408
```
An entry in the table can only contain 2048 code points. For larger
ranges there are multiple entries split in chunks with a maximum size of
2048 entries. To encode the entire Unicode code point range 21 bits are
required. The manual part starts at 0x323B0 this means all entries in
the table fit in 18 bits. This allows to allocate 3 additional bits for
the range. This allows entries to have 16384 elements. This range always
avoids splitting the range in multiple chunks.
This reduces the number of table elements from 729 to 711 and gives the
following benchmark results.
```
-----------------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------------
BM_ascii_escaped<char> 104289 ns 104289 ns 6619
BM_unicode_escaped<char> 96682 ns 96681 ns 7215
BM_cyrillic_escaped<char> 59673 ns 59673 ns 11732
BM_japanese_escaped<char> 41983 ns 41982 ns 16646
BM_emoji_escaped<char> 104119 ns 104120 ns 6683
BM_ascii_escaped<wchar_t> 104503 ns 104505 ns 6693
BM_unicode_escaped<wchar_t> 93426 ns 93423 ns 7489
BM_cyrillic_escaped<wchar_t> 54858 ns 54859 ns 12742
BM_japanese_escaped<wchar_t> 36385 ns 36384 ns 19259
BM_emoji_escaped<wchar_t> 105608 ns 105610 ns 6592
```
Primary motivation: is that after #84651 msan will
complain if fields accessed after ~__no_destroy.
My understanding of the https://eel.is/c++draft/basic.life#10
Static object with trivial destruction has program lifetime.
Static object with empty destuctor has implicit lifetime, and
accessing the object after lifetime is UB.
It was UB before #84651, it's just msan ignored union members.
Existing code with unions uses empty destructor, so accessing after
the main() can cause UB.
"placement new" version can have trivial destructor, so there is no end
of lifetime.
Secondary motivation: empty destructor will register __cxa_atexit with
-O0.
https://gcc.godbolt.org/z/hce587b65
We can not remove the destructor with union where
_Tp can have non-trivial destructor.
But we can remove destructor if we use in-place
new instead of union.
https://gcc.godbolt.org/z/Yqxx57eEd - empty even with -O0.
New test fails without the patch on
https://lab.llvm.org/buildbot/#/builders/sanitizer-x86_64-linux-bootstrap-msan
These macros are used by STL implementations to support implementation
of std::hardware_destructive_interference_size and
std::hardware_constructive_interference_size
Fixes#60174
---------
Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
This pull request implements LWG3984: ranges::to's recursion branch
may be ill-formed.
In the current implementation, ranges::to's recursion branch pipes the
range into a `views::transform(/* lambda */)`, which is a __range_adaptor_closure
object. In libc++, the pipe operator of __range_adaptor_closure requires a
viewable_range, so the following code won't compile, as the type of lvalue
`r` doesn't model viewable_range:
#include <ranges>
#include <vector>
#include <list>
int main() {
std::vector<std::vector<int>> v;
auto r = std::views::all(std::move(v));
auto l = std::ranges::to<std::list<std::list<int>>>(r);
}
Co-authored-by: A. Jiang <de34@live.cn>
Since C++14 has been released for about nine years and most standard
libraries have implemented sized deallocation functions, it's time to
make this feature default again.
This is another try of https://reviews.llvm.org/D112921.
Fixes#60061
The change increments the size of the lookup table considerably. The
table has an "upper boundary" check. The removal of the code units with
the property Grapheme_Extend=Yes removes the range E0100..E01EF. This
breaks the trailing large continuous section in two parts. This will be
improved in a followup patch.
Implements:
- P2713R1 Escaping improvements in std::format
- LWG3965 Incorrect example in [format.string.escaped] p3 for formatting
of combining characters
```
---------------------------------------------------------
Benchmark Before After
---------------------------------------------------------
BM_ascii_escaped<char> 95696 ns 110704 ns
BM_unicode_escaped<char> 89311 ns 101371 ns
BM_cyrillic_escaped<char> 58633 ns 63329 ns
BM_japanese_escaped<char> 44500 ns 41223 ns
BM_emoji_escaped<char> 99156 ns 111022 ns
BM_ascii_escaped<wchar_t> 92245 ns 112441 ns
BM_unicode_escaped<wchar_t> 80970 ns 102776 ns
BM_cyrillic_escaped<wchar_t> 51253 ns 58977 ns
BM_japanese_escaped<wchar_t> 37252 ns 36885 ns
BM_emoji_escaped<wchar_t> 96226 ns 115885 ns
```
This patch implements LWG4023 by adding explicit assertions for the
added preconditions and also fixes a few tests that were violating these
preconditions.
Testing with the get_info() returning a local_info revealed some issues
in the reverse lookup. This needed an additional quirk. Also the
skipping when not in the current continuation optimization was wrong. It
prevented merging two sys_info objects.
When trying to express a time before the epoch (e.g. "one nanosecond
before 00:01:40 on 1900-01-01")
the date would be shown as:
1900-01-01 00:01:39.-00000001
After this patch, that time would be correctly shown as:
1900-01-01 00:01:39.999999999
This patch finalizes the std::ranges::range_adaptor_closure
class template from https://wg21.link/P2387R3.
// [range.adaptor.object], range adaptor objects
template<class D>
requires is_class_v<D> && same_as<D, remove_cv_t<D>>
class range_adaptor_closure { };
The current implementation of __range_adaptor_closure was introduced
in ee44dd8062a26541808fc0d3fd5c6703e19f6016 and has served as the
foundation for the range adaptors in libc++ for a while. This patch
keeps its implementation, with the exception of the following changes:
- __range_adaptor_closure now includes the missing constraints
`is_class_v<D> && same_as<D, remove_cv_t<D>>` to restrict the
type of class that can inherit from it. (https://eel.is/c++draft/ranges.syn)
- The operator| of __range_adaptor_closure no longer requires its
first argument to model viewable_range. (https://eel.is/c++draft/range.adaptor.object#1)
- The _RangeAdaptorClosure concept is refined to exclude cases where
T models range or where T has base classes of type range_adaptor_closure<U>
for another type U. (https://eel.is/c++draft/range.adaptor.object#2)