This commit turns on ASan annotations in `std::basic_string` for short
stings (SSO case).
Originally suggested here: https://reviews.llvm.org/D147680
String annotations added here:
https://github.com/llvm/llvm-project/pull/72677
Requires to pass CI without fails:
- https://github.com/llvm/llvm-project/pull/75845
- https://github.com/llvm/llvm-project/pull/75858
Annotating `std::basic_string` with default allocator is implemented in
https://github.com/llvm/llvm-project/pull/72677 but annotations for
short strings (SSO - Short String Optimization) are turned off there.
This commit turns them on. This also removes
`_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to
support turning on and off short string annotations.
Support in ASan API exists since
dd1b7b797a.
You can turn off annotations for a specific allocator based on changes
from
2fa1bec7a2.
This PR is a part of a series of patches extending AddressSanitizer C++
container overflow detection capabilities by adding annotations, similar
to those existing in `std::vector` and `std::deque` collections. These
enhancements empower ASan to effectively detect instances where the
instrumented program attempts to access memory within a collection's
internal allocation that remains unused. This includes cases where
access occurs before or after the stored elements in `std::deque`, or
between the `std::basic_string`'s size (including the null terminator)
and capacity bounds.
The introduction of these annotations was spurred by a real-world
software bug discovered by Trail of Bits, involving an out-of-bounds
memory access during the comparison of two strings using the
`std::equals` function. This function was taking iterators
(`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison,
using a custom comparison function. When the `iter1` object exceeded the
length of `iter2`, an out-of-bounds read could occur on the `iter2`
object. Container sanitization, upon enabling these annotations, would
effectively identify and flag this potential vulnerability.
If you have any questions, please email:
advenam.tacet@trailofbits.comdisconnect3d@trailofbits.com
This commit turns on ASan annotations in `std::basic_string` for all
allocators by default.
Originally suggested here: https://reviews.llvm.org/D146214
String annotations added here:
https://github.com/llvm/llvm-project/pull/72677
This commit is part of our efforts to support container annotations with
(almost) every allocator. Annotating `std::basic_string` with default
allocator is implemented in
https://github.com/llvm/llvm-project/pull/72677.
Additionally it removes `__begin != nullptr` because `data()` should
never return a nullptr.
Support in ASan API exists since
1c5ad6d2c0.
This patch removes the check in std::basic_string annotation member
function (__annotate_contiguous_container) to support different
allocators.
You can turn off annotations for a specific allocator based on changes
from
2fa1bec7a2.
The motivation for a research and those changes was a bug, found by
Trail of Bits, in a real code where an out-of-bounds read could happen
as two strings were compared via a call to `std::equal` that took
`iter1_begin`, `iter1_end`, `iter2_begin` iterators (with a custom
comparison function). When object `iter1` was longer than `iter2`, read
out-of-bounds on `iter2` could happen. Container sanitization would
detect it.
If you have any questions, please email:
- advenam.tacet@trailofbits.com
- disconnect3d@trailofbits.com
This commit is a refactor (increases readability) and optimization fix.
This is a fixed commit of
https://github.com/llvm/llvm-project/pull/76200 First reverthed here:
1ea7a56057
Please, check original PR for details.
The difference is a return type of the lambda.
Original description:
This commit addresses optimization and instrumentation challenges
encountered within comma constructors.
1) _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS does not work in comma
constructors.
2) Code inside comma constructors is not always correctly optimized.
Problematic code examples:
- `: __r_(((__str.__is_long() ? 0 : (__str.__annotate_delete(), 0)),
std::move(__str.__r_))) {`
- `: __r_(__r_([&](){ if(!__s.__is_long()) __s.__annotate_delete();
return std::move(__s.__r_);}())) {`
However, lambda with argument seems to be correctly optimized. This
patch uses that fact.
Use of lambda based on idea from @ldionne.
This commit addresses optimization and instrumentation challenges
encountered within comma constructors.
1) _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS does not work in comma
constructors.
2) Code inside comma constructors is not always correctly optimized.
Problematic code examples:
- `: __r_(((__str.__is_long() ? 0 : (__str.__annotate_delete(), 0)),
std::move(__str.__r_))) {`
- `: __r_(__r_([&](){ if(!__s.__is_long()) __s.__annotate_delete();
return std::move(__s.__r_);}())) {`
However, lambda with argument seems to be correctly optimized. The patch employs this.
Use of lambda based on an idea from @ldionne.
This commit implements conditional compilation for ASan helper code.
As convey to me by @EricWF, string benchmarks with UBSan have been
experiencing significant performance hit after the commit with ASan
string annotations. This is likely due to the fact that no-op ASan code
is not optimized out with UBSan. To address this issue, this commit
conditionalizes the inclusion of ASan helper function bodies using
`#ifdef` directives. This approach allows us to selectively include only
the ASan code when it's actually required, thereby enhancing
optimizations and improving performance.
While issue was noticed in string benchmarks, I expect same overhead
(just less noticeable) in other containers, therefore `std::vector` and
`std::deque` have same changes.
To see impact of that change run `string.libcxx.out` with UBSan and
`--benchmark_filter=BM_StringAssign` or
`--benchmark_filter=BM_StringConstruct`.
This patch runs clang-format on all of libcxx/include and libcxx/src, in
accordance with the RFC discussed at [1]. Follow-up patches will format
the benchmarks, the test suite and remaining parts of the code. I'm
splitting this one into its own patch so the diff is a bit easier to
review.
This patch was generated with:
find libcxx/include libcxx/src -type f \
| grep -v 'module.modulemap.in' \
| grep -v 'CMakeLists.txt' \
| grep -v 'README.txt' \
| grep -v 'libcxx.imp' \
| grep -v '__config_site.in' \
| xargs clang-format -i
A Git merge driver is available in libcxx/utils/clang-format-merge-driver.sh
to help resolve merge and rebase issues across these formatting changes.
[1]: https://discourse.llvm.org/t/rfc-clang-formatting-all-of-libc-once-and-for-all
It's not that I have much love for C++03, but we should ensure that it
works. Some recent changes broke this configuration because slightly
older Clang versions don't support attribute syntax in C++03 mode.
This commit introduces basic annotations for `std::basic_string`,
mirroring the approach used in `std::vector` and `std::deque`.
Initially, only long strings with the default allocator will be
annotated. Short strings (_SSO - short string optimization_) and strings
with non-default allocators will be annotated in the near future, with
separate commits dedicated to enabling them. The process will be similar
to the workflow employed for enabling annotations in `std::deque`.
**Please note**: these annotations function effectively only when libc++
and libc++abi dylibs are instrumented (with ASan). This aligns with the
prevailing behavior of Memory Sanitizer.
To avoid breaking everything, this commit also appends
`_LIBCPP_INSTRUMENTED_WITH_ASAN` to `__config_site` whenever libc++ is
compiled with ASan. If this macro is not defined, string annotations are
not enabled. However, linking a binary that does **not** annotate
strings with a dynamic library that annotates strings, is not permitted.
Originally proposed here: https://reviews.llvm.org/D132769
Related patches on Phabricator:
- Turning on annotations for short strings:
https://reviews.llvm.org/D147680
- Turning on annotations for all allocators:
https://reviews.llvm.org/D146214
This PR is a part of a series of patches extending AddressSanitizer C++
container overflow detection capabilities by adding annotations, similar
to those existing in `std::vector` and `std::deque` collections. These
enhancements empower ASan to effectively detect instances where the
instrumented program attempts to access memory within a collection's
internal allocation that remains unused. This includes cases where
access occurs before or after the stored elements in `std::deque`, or
between the `std::basic_string`'s size (including the null terminator)
and capacity bounds.
The introduction of these annotations was spurred by a real-world
software bug discovered by Trail of Bits, involving an out-of-bounds
memory access during the comparison of two strings using the
`std::equals` function. This function was taking iterators
(`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison,
using a custom comparison function. When the `iter1` object exceeded the
length of `iter2`, an out-of-bounds read could occur on the `iter2`
object. Container sanitization, upon enabling these annotations, would
effectively identify and flag this potential vulnerability.
This Pull Request introduces basic annotations for `std::basic_string`.
Long strings exhibit structural similarities to `std::vector` and will
be annotated accordingly. Short strings are already implemented, but
will be turned on separately in a forthcoming commit. Look at [a
comment](https://github.com/llvm/llvm-project/pull/72677#issuecomment-1850554465)
below to read about SSO issues at current moment.
Due to the functionality introduced in
[D132522](dd1b7b797a),
the `__sanitizer_annotate_contiguous_container` function now offers
compatibility with all allocators. However, enabling this support will
be done in a subsequent commit. For the time being, only strings with
the default allocator will be annotated.
If you have any questions, please email:
- advenam.tacet@trailofbits.com
- disconnect3d@trailofbits.com
This is in preparation for clang-formatting the whole code base. These
annotations are required either to avoid clang-format bugs or because
the manually formatted code is significantly more readable than the
clang-formatted alternative. All in all, it seems like very few
annotations are required, which means that clang-format is doing a very
good job in most cases.
The intent of these particular functions, since their introduction, was
to NOT be inlinable.
However, the mechanism by which this was accomplished was non-obvious,
and stopped working when string is compiled for C++20.
A longstanding behavior specified by the C++ standard is that
instantiation of the body of a template function is suppressed by an
extern template declaration -- unless the function is explicitly marked
either constexpr or inline. Of course, if the body is not instantiated,
then it cannot possibly be inlined, and thus all the functions listed in
libcxx/include/__string/extern_template_lists.h were uninlineable.
But, in C++20 mode, string functions were annotated constexpr, which
means they _are_ instantiated, and do become inlineable. And, in fact,
they do get inlined, which has caused noticeable binary-size growth for
users.
For example, in C++17,
`std::string f(std::string *in) { return *in; }`
does not inline the copy-constructor call, and instead generates a call
to the exported function defined in the libc++ shared library.
I think we probably don't want to mark all functions that are currently
in the extern template list as noinline, as many of them really are
reasonable inlining candidates. Thus, I've restricted this change to
only the few functions that were clearly intended to be outlined.
See commits like b019c5c0372eb08800327efb5e7955ce918b75d1 (and some
others like it) for background, in which functions were removed from the
extern template list in the unstable ABI in order to allow the
short-string case to be inlined, while moving the long-string case to a
separate function, added to the extern template list.
If we know that index is larger than SSO size, we know that we can't be
in SSO case, and should access the pointer. This removes extra check
from operator[] for inputs known at compile time to be larger than SSO.
This allows smaller allocations to occur, closer to the actual
std::string's required size. This is particularly effective in
decreasing the allocation size upon initial construction (where
__recommend is called to determine the size).
Although the memory savings per-string are never more than 8 bytes per
string initially, this quickly adds up. And has lead to not insigficant
memory savings at Google.
Unfortunately, this change is ABI breaking because it changes the value
returned by max_size. So it has to be guarded.
Previously, libcxx forced all strings created during constant evaluation
to point to allocated memory. That was done due to implementation
difficultites, but it turns out not to be necessary. This patch permits
the use of SSO strings during constant evaluation, and also simplifies
the implementation.
This does have a downside in terms of enabling users to accidentally
write non-portable code, however, which I've documented in
UsingLibcxx.rst.
In particular, whether `constinit std::string x = "...";` will
successfully compile now depends on whether the string is smaller than
the SSO capacity -- in libc++, up to 22 bytes on 64-bit platforms, and
up to 10 bytes on 32-bit platforms. By comparison, libstdc++ and MSVC
have an SSO capacity of 15 bytes, except that in libstdc++,
constant-initialized strings cannot be used as function-locals because
the object contains a pointer to itself.
Closes#68434
Previously, assignment to a std::basic_string type with a _custom_
allocator could under certain conditions attempt to interpret part of
the target string's "short" string-content as if it was a "long" data
pointer, and attempt to deallocate a garbage value.
This is a serious bug, but code in which it might happen is rare. It
required:
1. the basic_string must be using a custom allocator type which sets the
propagate_on_container_copy_assignment trait to true (thus, it does not
affect the default allocator, nor most custom allocators).
2. the allocator for the target string must compare not equal to the
allocator for the source string (many allocators always compare equal).
3. the source of the copy must currently contain a "long" string, and
the assignment-target must currently contain a "short" string.
Finally, the issue would've typically been innocuous when the bytes
misinterpreted as a pointer were all zero, as deallocating a nullptr is
typically a no-op. This is why existing test cases did not exhibit an
issue: they were all zero-length strings, which do not have data in the
bytes interpreted as a pointer.
This commit deprecates `std::basic_string::__grow_by`, which is part of ABIv1. The function is replaced by `std::basic_string:__grow_by_without_replace`, which is not part of ABI.
- The original function `__grow_by` is deprecated because it does not set the string size, therefore it may not update the size when the size is changed, and it may also not set the size at all when the string was short initially. This leads to unpredictable size value. It is not removed or changed to avoid breaking the ABI.
- The commit adds `_LIBCPP_HIDE_FROM_ABI` guarded by `_LIBCPP_ABI_VERSION >= 2` to `__grow_by`. This allows the function to be used in the dylib in ABIv1 without raising the `[abi:v170000]` error and removes it from future ABIs. `_LIBCPP_HIDE_FROM_ABI_AFTER_V1` cannot be used.
- Additionally, `__grow_by` has been removed from `_LIBCPP_STRING_UNSTABLE_EXTERN_TEMPLATE_LIST` in `libcxx/include/__string/extern_template_lists.h`.
This bugfix is necessary to implement string ASan annotations, because it mitigates the problems encountered in D132769.
Reviewed By: ldionne, #libc, philnik
Differential Revision: https://reviews.llvm.org/D148693
Remove spaces between operator"" and identifier to suppress
-Wdeprecated-literal-operator, and between operator and ""
like how they are written in [string.view.literals] and [basic.string.literals].
Differential Revision: https://reviews.llvm.org/D155200
- add the `from_range_t` constructors and the related deduction guides;
- add the `insert_range`/`assign_range`/etc. member functions.
(Note: this patch is split from https://reviews.llvm.org/D142335)
Differential Revision: https://reviews.llvm.org/D149832
Replace most uses of `_LIBCPP_ASSERT` with
`_LIBCPP_ASSERT_UNCATEGORIZED`.
This is done as a prerequisite to introducing hardened mode to libc++.
The idea is to make enabling assertions an opt-in with (somewhat)
fine-grained controls over which categories of assertions are enabled.
The vast majority of assertions are currently uncategorized; the new
macro will allow turning on `_LIBCPP_ASSERT` (the underlying mechanism
for all kinds of assertions) without enabling all the uncategorized
assertions (in the future; this patch preserves the current behavior).
Differential Revision: https://reviews.llvm.org/D153816
These macros are always defined identically, so we can simplify the code a bit by merging them.
Reviewed By: ldionne, #libc
Spies: libcxx-commits, krytarowski, smeenai
Differential Revision: https://reviews.llvm.org/D152652
This checks whether a pointer is within a range, even during constant evaluation. This allows running optimized code paths during constant evaluation, instead of falling back to the general-purpose implementation all the time. This is also a central place for comparing unrelated pointers, which is technically UB.
Reviewed By: ldionne, #libc
Spies: libcxx-commits
Differential Revision: https://reviews.llvm.org/D143327
During the ISO C++ Committee meeting plenary session the C++23 Standard
has been voted as technical complete.
This updates the reference to c++2b to c++23 and updates the __cplusplus
macro.
Note since we use clang-tidy 16 a small work-around is needed. Clang
knows -std=c++23 but clang-tidy not so for now force the lit compiler
flag to use -std=c++2b instead of -std=c++23.
Reviewed By: #libc, philnik, jloser, ldionne
Differential Revision: https://reviews.llvm.org/D150795
We plan to add concepts for checking that iterators actually provide what they claim to. This is to avoid people thinking that these type traits actually check the iterator requirements in more detail.
Reviewed By: ldionne, #libc
Spies: Mordante, libcxx-commits, wenlei
Differential Revision: https://reviews.llvm.org/D150801
Makes sure the formatter for the vector<bool>::reference is enabled
when only the header <vector> is included. Before this change it
required <vector> and <format> to be included. This violated the
requirements in the Standard.
Fixes: https://llvm.org/PR61314
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D149543
We changed the `abort` calls when trying to throw exceptions in `-fno-exceptions` mode to `__verbose_abort` calls, which removes the dependency in most files.
Reviewed By: ldionne, #libc
Spies: dim, emaste, mikhail.ramalho, smeenai, libcxx-commits
Differential Revision: https://reviews.llvm.org/D146076
This allows the compiler to inline the constructors.
Reviewed By: ldionne, #libc
Spies: mikhail.ramalho, libcxx-commits
Differential Revision: https://reviews.llvm.org/D144580
This changes all `__enable_if`s inside `<string>` to a common pattern. Specifically, it's always inside the `template <>` and uses the `, int> = 0` style.
Reviewed By: ldionne, #libc
Spies: mikhail.ramalho, EricWF, libcxx-commits
Differential Revision: https://reviews.llvm.org/D144568
This patch also updates the moved code to the new style (i.e. formatted, replaced marcos and typedefs)
Reviewed By: ldionne, #libc
Spies: arichardson, libcxx-commits
Differential Revision: https://reviews.llvm.org/D145095
Avoids using operator& in basic_string since an evil char-like type can
hijack this operator. Added some more evil operators, this found a place
where equality was compared directly and not via the traits.
This adds a helper test string. This is now only used in a few tests,
but the intention is to use this in more tests for basic_string.
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D145257
Other macros that disable parts of the library are named `_LIBCPP_HAS_NO_WHATEVER`.
Reviewed By: ldionne, Mordante, #libc
Spies: libcxx-commits, smeenai
Differential Revision: https://reviews.llvm.org/D143163
This change is almost fully mechanical. The only interesting change is in `generate_feature_test_macro_components.py` to generate `_LIBCPP_STD_VER >=` instead. To avoid churn in the git-blame this commit should be added to the `.git-blame-ignore-revs` once committed.
Reviewed By: ldionne, var-const, #libc
Spies: jloser, libcxx-commits, arichardson, arphaman, wenlei
Differential Revision: https://reviews.llvm.org/D143962
This prepares the terrain for introducing a new type of bounded iterator
that can't be constructed like __wrap_iter. This reverts part of the
changes made to std::vector in 4eab04f84.
Differential Revision: https://reviews.llvm.org/D138036
I am starting to granularize debug-mode checks so they can be controlled
more individually. The goal is for vendors to eventually be able to select
which categories of checks they want embedded in their configuration of
the library with more granularity.
Note that this patch is a bit weird on its own because it does not touch
any of the containers that implement iterator bounds checking through the
__dereferenceable check of the legacy debug mode. However, I added TODOs
to string and vector to change that.
Differential Revision: https://reviews.llvm.org/D138033
This removes a lot of boilerplate.
Reviewed By: ldionne, #libc, EricWF
Spies: EricWF, libcxx-commits
Differential Revision: https://reviews.llvm.org/D137025
This doesn't affect our ABI because `std::string::substr()` isn't in the dylib and the mangling of `substr() const` and `substr() const&` are different.
Reviewed By: ldionne, Mordante, var-const, avogelsgesang, #libc
Spies: arphaman, huixie90, libcxx-commits
Differential Revision: https://reviews.llvm.org/D131668
We currently define the preferred names in multiple places. `basic_string` and `basic_string_view` also have a lot of aliases, which makes the declarations quite long. So let's only add the preferred names in forward-declaring headers to make the implementation more readable and have all the preferred names in one place.
Reviewed By: ldionne
Spies: EricWF, krytarowski, libcxx-commits
Differential Revision: https://reviews.llvm.org/D135824