The current implementation of `{std, ranges}::equal` fails to correctly
compare `vector<bool>`s when the underlying storage type is smaller than
`int` (e.g., `unsigned char`, `unsigned short`, `uint8_t` and
`uint16_t`). See [demo](https://godbolt.org/z/j4s87s6b3)). The problem
arises due to integral promotions on the intermediate bitwise
operations, leading to incorrect final equality comparison results. This
patch fixes the issue by ensuring that `{std, ranges}::equal` operate
properly for both aligned and unaligned bits.
Fixes#126369.
This PR optimizes the performance of `std::ranges::rotate` for
`vector<bool>::iterator`. The optimization yields a performance
improvement of up to 2096x.
Closes#64038.
This PR optimizes the performance of `std::ranges::equal` for
`vector<bool>::iterator`, addressing a subtask outlined in issue #64038.
The optimizations yield performance improvements of up to 188x for
aligned equality comparison and 82x for unaligned equality
comparison. Moreover, comprehensive tests covering up to 4 storage words
(256 bytes) with odd and even bit sizes are provided, which validate the
proposed optimizations in this patch.
This is technically not necessary in most cases to prevent issues with ADL,
but let's be consistent. This allows us to remove the libcpp-qualify-declval
clang-tidy check, which is now enforced by the robust-against-adl clang-tidy check.
This is to fix compile error with explicit Clang modules like
```
../../third_party/libc++/src/include/__vector/vector_bool.h:85:11: error: default argument of '__bit_iterator' must be imported from module 'std.bit_reference_fwd' before it is required
85 | typedef __bit_iterator<vector, false> pointer;
| ^
../../third_party/libc++/src/include/__fwd/bit_reference.h:23:68: note: default argument declared here is not reachable
23 | template <class _Cp, bool _IsConst, typename _Cp::__storage_type = 0>
| ^
```
This PR slightly simplifies the implementation of `vector<bool>::max_size`
and adds extensive tests for the `max_size()` function for both `vector<bool>`
and `vector<T>`. The main purposes of the new tests include:
- Verify correctness of `max_size()` under various `size_type` and
`difference_type` definitions: check that `max_size()` works properly
with allocators that have custom `size_type` and `difference_type`. This
is particularly useful for `vector<bool>`, as different `size_type` lead
to different `__storage_type` of different word lengths, resulting in
varying `max_size()` values for `vector<bool>`. Additionally, different
`difference_type` also sets different upper limit of `max_size()` for
both `vector<bool>` and `std::vector`. These tests were previously
missing.
- Eliminate incorrect implementations: Special tests are added to identify and
reject incorrect implementations of `vector<bool>::max_size` that unconditionally
return `std::min<size_type>(size-max, __internal_cap_to_external(allocator-max-size))`.
This can cause overflow in the `__internal_cap_to_external()` call and lead
to incorrect results. The new tests ensure that such incorrect
implementations are identified.
There's no reason not to, and it's easy enough to do using enable_if. As
a drive-by change, also add a missing _LIBCPP_NO_CFI attribute on
__add_alignment_assumption.
As a follow-up to #121013 (which focused on `std::ranges::copy`), this
PR optimizes the performance of `std::ranges::copy_backward` for
`vector<bool>::iterator`, addressing a subtask outlined in issue #64038.
The optimizations yield performance improvements of up to 2000x for
aligned copies and 60x for unaligned copies.
This patch simplifies the implementation of `__construct_at_end` in
`vector<bool>`, which currently contains duplicate initialization logic
across its two overloads.
Changes:
- Carve out sized but input-only ranges for C++23.
- Call `std::move` for related functions when the iterator is possibly input-only.
Fixes#115727
This PR addresses an issue where the `shrink_to_fit` function in
`vector<bool>` is effectively a no-op, meaning it will never shrink the
capacity.
Fixes#122502
In https://reviews.llvm.org/D136765 / https://reviews.llvm.org/D144155,
the asan annotations for `std::vector` were modified to unpoison freed
backing memory on destruction, instead of leaving it poisoned. However,
calling `__clear()` instead of `clear()` skips informing the asan runtime
of this decrease in the accessible container size, which breaks the
invariant that the value of `old_mid` should match the value of `new_mid`
from the previous call to `__sanitizer_annotate_contiguous_container`, which
can trip the sanity checks for the partial poison between [d1, d2) and the
container redzone between [d2, c), if enabled. To fix this, ensure that
`clear()` is called instead, as is already done by `__vdeallocate()`.
Also remove `__clear()`, since it is no longer called.
Missing information about begin and end pointers of std::vector can lead
to missed optimizations in LLVM.
This patch adds alignment assumptions at the point where the begin and
end pointers are loaded. If the pointers would not have the same
alignment, end might never get hit when incrementing begin.
See https://github.com/llvm/llvm-project/issues/101372 for a discussion
of missed range check optimizations in hardened mode.
Once https://github.com/llvm/llvm-project/pull/108958 lands, the created
`llvm.assume` calls for the alignment should be folded into the `load`
instructions, resulting in no extra instructions after InstCombine.
Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
As a follow-up to #113852, this PR optimizes the performance of the
`insert(const_iterator pos, InputIt first, InputIt last)` function for
`input_iterator`-pair inputs in `std::vector` for cases where
reallocation occurs during insertion. Additionally, this optimization
enhances exception safety by replacing the traditional `try-catch`
mechanism with a modern exception guard for the `insert` function.
The optimization targets cases where insertion trigger reallocation. In
scenarios without reallocation, the implementation remains unchanged.
Previous implementation
-----------------------
The previous implementation of `insert` is inefficient in reallocation
scenarios because it performs the following steps separately:
- `reserve()`: This leads to the first round of relocating old
elements to new memory;
- `rotate()`: This leads to the second round of reorganizing the
existing elements;
- Move-forward: Moves the elements after the insertion position to
their final positions.
- Insert: performs the actual insertion.
This approach results in a lot of redundant operations, requiring the
elements to undergo three rounds of relocations/reorganizations to be
placed in their final positions.
Proposed implementation
-----------------------
The proposed implementation jointly optimize the above 4 steps in the
previous implementation such that each element is placed in its final
position in just one round of relocation. Specifically, this
optimization reduces the total cost from 2 relocations + 1 std::rotate
call to just 1 relocation, without needing to call `std::rotate`,
thereby significantly improving overall performance.
This PR fixes the erroneous internal capacity evaluation in
`vector<bool>`, which caused a subsequent SIGSEGV error when calling
`flip()` on `vector<bool>`. By fixing the internal capacity evaluation,
the SIGSEGV is automatically resolved.
The `vector<bool>` implementation in libcxx contains a declaration of a
private `__append` function, which is neither defined nor used anywhere
in the codebase. This PR aims to remove this abandoned declaration, as
its presence is misleading and could lead to confusion during future
maintenance.
I have no idea why we have a declaration without a definition. My guess
is that the declaration might be inherited from the implementation of
`vector<T>`, where `__append` is both necessary and properly defined.
The declaration may have been inadvertently copied from `vector<T>` to
`vector<bool>` and subsequently abandoned, as `vector<bool>` never needs
it.
This PR optimizes the input iterator overload of `assign(_InputIterator,
_InputIterator)` in `std::vector<_Tp, _Allocator>` by directly assigning
to already initialized memory, rather than first destroying existing
elements and then constructing new ones. By eliminating unnecessary
destruction and construction, the proposed algorithm enhances the
performance by up to 2x for trivial element types (e.g.,
`std::vector<int>`), up to 2.6x for non-trivial element types like
`std::vector<std::string>`, and up to 3.4x for more complex non-trivial
types (e.g., `std::vector<std::vector<int>>`).
### Google Benchmarks
Benchmark tests (`libcxx/test/benchmarks/vector_operations.bench.cpp`)
were conducted for the `assign()` implementations before and after this
patch. The tests focused on trivial element types like
`std::vector<int>`, and non-trivial element types such as
`std::vector<std::string>` and `std::vector<std::vector<int>>`.
#### Before
```
-------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------------------------------------
BM_AssignInputIterIter/vector_int/1024/1024 1157 ns 1169 ns 608188
BM_AssignInputIterIter<32>/vector_string/1024/1024 14559 ns 14710 ns 47277
BM_AssignInputIterIter<32>/vector_vector_int/1024/1024 26846 ns 27129 ns 25925
```
#### After
```
-------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------------------------------------
BM_AssignInputIterIter/vector_int/1024/1024 561 ns 566 ns 1242251
BM_AssignInputIterIter<32>/vector_string/1024/1024 5604 ns 5664 ns 128365
BM_AssignInputIterIter<32>/vector_vector_int/1024/1024 7927 ns 8012 ns 88579
```
This PR simplifies the implementation of std::vector's move constructor
with an alternative allocator by invoking __init_with_size() instead of
calling assign(), which ultimately calls __assign_with_size(). The
advantage of using __init_with_size() lies in its internal use of
an exception guard, which simplifies the code. Furthermore, from a
semantic standpoint, it is more intuitive for a constructor to call
an initialization function than an assignment function.
Now that we don't use __compressed_pair anymore inside std::vector, we
can remove some unnecessary accessors. This is a mechanical replacement
of the __alloc() and __end_cap() accessors, and similar for
std::vector<bool>.
Note that I consistently used this->__alloc_ instead of just __alloc_
since most of the code in <vector> uses that pattern to access members.
I don't think this is necessary anymore (and I'm even less certain I
like this), but I went for consistency with surrounding code. If we want
to change that, we can do a follow-up mechanical change.
Related to PR #114423, this PR proposes to unify the naming of the
internal pointer members in `std::vector` and `std::__split_buffer` for
consistency and clarity.
Both `std::vector` and `std::__split_buffer` originally used a
`__compressed_pair<pointer, allocator_type>` member named `__end_cap_`
to store an internal capacity pointer and an allocator. However,
inconsistent naming changes have been made in both classes:
- `std::vector` now uses `__cap_` and `__alloc_` for its internal
pointer and allocator members.
- In contrast, `std::__split_buffer` retains the name `__end_cap_` for
the capacity pointer, along with `__alloc_`.
This inconsistency between the names `__cap_` and `__end_cap_` has
caused confusions (especially to myself when I was working on both
classes). I suggest unifying these names by renaming `__end_cap_` to
`__cap_` in `std::__split_buffer`.
`std::copy` doesn't use the `_AlgPolicy` for anything other than calling
itself with it, so we can just remove the argument. This also removes
the need in a few other algorithms which had an `_AlgPolicy` argument
only to call `copy`.
This helps with readability since the reader doesn't need to jump around
a bunch of times only to read functions that are implemented in 1 or 2
lines of code.
This PR refactors the `__split_buffer` class to eliminate code
duplication in the `push_back` and `push_front` member functions.
**Motivation:**
The lvalue and rvalue reference overloads of `push_back` share identical
logic, which coincides with that of `emplace_back`. Similarly, the
overloads of `push_front` also share identical logic but lack an
`emplace_front` member function, leading to an inconsistency. These
identical internal logics lead to significant code duplication, making
future maintenance more difficult.
**Summary of Refactor:**
This PR reduces code redundancy by:
1. Modifying both overloads of `push_back` to call `emplace_back`.
2. Introducing a new `emplace_front` member function that encapsulates
the logic of `push_front`, allowing both overloads of `push_front` to
call it (The addition of `emplace_front` also avoids the inconsistency
regarding the absence of `emplace_front`).
The refactoring results in reduced code duplication, improved
maintainability, and enhanced readability.
Added exception guard to the `vector(n, x, a)` constructor to enhance
exception safety. This change ensures that the `vector(n, x, a)`
constructor is consistent with other constructors, such as `vector(n)`,
`vector(n, x)`, `vector(n, a)`, in terms of exception safety.