mirror of
https://github.com/llvm/llvm-project.git
synced 2025-04-25 04:46:08 +00:00
[-Wunsafe-buffer-usage] Add user documentation. (#111624)
Originally: https://reviews.llvm.org/D136811
This commit is contained in:
parent
ad5e2bf6e9
commit
5000c688bf
585
clang/docs/SafeBuffers.rst
Normal file
@ -0,0 +1,585 @@
|
||||
================
|
||||
C++ Safe Buffers
|
||||
================
|
||||
|
||||
.. contents::
|
||||
:local:
|
||||
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
Clang can be used to harden your C++ code against buffer overflows, an otherwise
|
||||
common security issue with C-based languages.
|
||||
|
||||
The solution described in this document is an integrated programming model as
|
||||
it combines:
|
||||
|
||||
- a family of opt-in Clang warnings (``-Wunsafe-buffer-usage``) emitted
  during compilation to help you update your code to encapsulate and propagate
  the bounds information associated with pointers;
|
||||
- runtime assertions implemented as part of
  `libc++ hardening modes <https://libcxx.llvm.org/Hardening.html>`_
  that eliminate undefined behavior as long as the coding convention
  is followed and the bounds information is therefore available and correct.
|
||||
|
||||
The goal of this work is to enable development of bounds-safe C++ code. It is
|
||||
not a "push-button" solution; depending on your codebase's existing
|
||||
coding style, significant (even if largely mechanical) changes to your code
|
||||
may be necessary. However, it allows you to achieve valuable safety guarantees
|
||||
on security-critical parts of your codebase.
|
||||
|
||||
This solution is under active development. It is already useful for its purpose
|
||||
but more work is being done to improve ergonomics and safety guarantees
|
||||
and reduce adoption costs.
|
||||
|
||||
The solution aligns in spirit with the "Ranges" safety profile
|
||||
that was `proposed <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3274r0.pdf>`_
|
||||
by Bjarne Stroustrup for standardization alongside other C++ safety features.
|
||||
|
||||
|
||||
Pre-Requisites
|
||||
==============
|
||||
|
||||
In order to achieve bounds safety, your codebase needs to have access to
|
||||
well-encapsulated bounds-safe container, view, and iterator types.
|
||||
If your project uses libc++, standard container and view types such as
|
||||
``std::vector`` and ``std::span`` can be made bounds-safe by enabling
|
||||
the "fast" `hardening mode <https://libcxx.llvm.org/Hardening.html>`_
|
||||
(passing ``-D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_FAST`` to your
compiler) or any of the stricter hardening modes.
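
As a rough way to confirm which mode a translation unit was built with, the sketch below inspects the ``_LIBCPP_HARDENING_MODE`` macro. The function name ``hardening_mode_name`` is made up for this illustration; the macro names are the ones listed in the libc++ hardening documentation. On other standard libraries, or on libc++ versions that predate hardening modes, it falls back to a generic answer.

```cpp
#include <cstddef>  // any standard header pulls in the library's configuration macros

// Hypothetical helper (not part of libc++): reports the hardening mode that
// this translation unit was compiled with.
inline const char *hardening_mode_name() {
#if defined(_LIBCPP_HARDENING_MODE) && defined(_LIBCPP_HARDENING_MODE_FAST)
#  if _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_FAST
  return "fast";
#  elif _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_EXTENSIVE
  return "extensive";
#  elif _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
  return "debug";
#  else
  return "none";
#  endif
#else
  // Either not libc++, or a libc++ release without hardening modes.
  return "unknown";
#endif
}
```

Printing the result once at program start-up is usually enough to catch a build configuration that silently dropped the flag.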
|
||||
|
||||
In order to harden iterators, you'll need to also obtain a libc++ binary
|
||||
built with ``_LIBCPP_ABI_BOUNDED_ITERATORS`` -- which is a libc++ ABI setting
|
||||
that needs to be set for your entire target platform if you need to maintain
|
||||
binary compatibility with the rest of the platform.
|
||||
|
||||
A relatively fresh version of C++ is recommended. In particular, the very useful
|
||||
standard view class ``std::span`` requires C++20.
|
||||
|
||||
Other implementations of the C++ standard library may provide different
flags to enable such hardening.
|
||||
|
||||
If you're using custom containers and views, they will need to be hardened
|
||||
this way as well, but you don't necessarily need to do this ahead of time.
|
||||
|
||||
This approach can theoretically be applied to plain C codebases,
|
||||
assuming that safe primitives are developed to encapsulate all buffer accesses,
|
||||
acting as "hardened custom containers" to replace raw pointers.
|
||||
However, such an approach would be very unergonomic in C, and safety guarantees
would be lower due to the lack of good encapsulation technology. A better approach
|
||||
to bounds safety for non-C++ programs,
|
||||
`-fbounds-safety <https://clang.llvm.org/docs/BoundsSafety.html>`_,
|
||||
is currently in development.
|
||||
|
||||
Technically, safety guarantees cannot be provided without hardening
|
||||
the entire technology stack, including all of your dependencies.
|
||||
However, applying such hardening technology to even a small portion
|
||||
of your code may be significantly better than nothing.
|
||||
|
||||
|
||||
The Programming Model for C++
|
||||
=============================
|
||||
|
||||
Assuming that hardened container, view, and iterator classes are available,
|
||||
what remains is to make sure they are used consistently in your code.
|
||||
Below we define the specific coding convention that needs to be followed
|
||||
in order to guarantee safety and how the compiler technology
|
||||
around ``-Wunsafe-buffer-usage`` assists with that.
|
||||
|
||||
|
||||
Buffer operations should never be performed over raw pointers
|
||||
-------------------------------------------------------------
|
||||
|
||||
Every time a memory access is made, a bounds-safe program must guarantee
|
||||
that the range of accessed memory addresses falls into the boundaries
|
||||
of the memory allocated for the object that's being accessed.
|
||||
In order to establish such a guarantee, the information about such valid range
|
||||
of addresses -- the **bounds information** associated with the accessed address
|
||||
-- must be formally available every time a memory access is performed.
|
||||
|
||||
A raw pointer does not naturally carry any bounds information.
|
||||
The bounds information for the pointer may be available *somewhere*, but
|
||||
it is not associated with the pointer in a formal manner, so a memory access
|
||||
performed through a raw pointer cannot be automatically verified to be
|
||||
bounds-safe by the compiler.
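
To make the point concrete, the following sketch contrasts an array, whose extent is part of its type, with the pointer it decays to, which keeps only the address. The helper names ``extent_of`` and ``decay_to_pointer`` are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>

// Hypothetical helper: the array is taken by reference, so its extent N is
// still part of the type and can be recovered at compile time.
template <typename T, std::size_t N>
constexpr std::size_t extent_of(const T (&)[N]) {
  return N;
}

// Hypothetical helper: once the array decays to a pointer, only the address
// survives. Nothing in the type `const int *` says how many elements follow.
inline const int *decay_to_pointer(const int *pointer) {
  return pointer;
}
```

Any code that receives only the result of ``decay_to_pointer()`` has to learn the element count from somewhere else, which is exactly the informal channel the programming model tries to eliminate.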
|
||||
|
||||
That said, the Safe Buffers programming model does **not** try to eliminate
|
||||
**all** pointer usage. Instead it assumes that most pointers point to
|
||||
individual objects, not buffers, and therefore they typically aren't
|
||||
associated with buffer overflow risks. For that reason, in order to identify
|
||||
the code that requires manual intervention, it is desirable to initially shift
|
||||
the focus away from the pointers themselves, and instead focus on their
|
||||
**usage patterns**.
|
||||
|
||||
The compiler warning ``-Wunsafe-buffer-usage`` is built to assist you
|
||||
with this step of the process. A ``-Wunsafe-buffer-usage`` warning is
|
||||
emitted whenever one of the following **buffer operations** is performed
|
||||
on a raw pointer:
|
||||
|
||||
- array indexing with ``[]``,
|
||||
- pointer arithmetic,
|
||||
- bounds-unsafe standard C functions such as ``std::memcpy()``,
|
||||
- C++ smart pointer operations such as ``std::unique_ptr<T[]>::operator[]()``,
|
||||
which unfortunately cannot be made fully safe within the rules of
|
||||
the C++ standard (as of C++23).
|
||||
|
||||
This is sufficient for identifying each raw buffer pointer in the program at
|
||||
**at least one point** during its lifetime across your software stack.
|
||||
|
||||
For example, both of the following functions are flagged by
|
||||
``-Wunsafe-buffer-usage`` because ``pointer`` gets identified as an unsafe
|
||||
buffer pointer. Even though the second function does not directly access
|
||||
the buffer, the pointer arithmetic operation inside it may easily be
|
||||
the only formal "hint" in the program that the pointer does indeed point
|
||||
to a buffer of multiple objects::
|
||||
|
||||
  int get_last_element(int *pointer, size_t size) {
    return pointer[size - 1]; // warning: unsafe buffer access
  }

  int *get_last_element_ptr(int *pointer, size_t size) {
    return pointer + (size - 1); // warning: unsafe pointer arithmetic
  }
|
||||
|
||||
|
||||
All buffers need to be encapsulated into safe container and view types
|
||||
----------------------------------------------------------------------
|
||||
|
||||
It immediately follows from the previous requirement that once an unsafe pointer
|
||||
is identified at any point during its lifetime, it should be immediately wrapped
|
||||
into a safe container type (if the allocation site is "nearby") or a safe
|
||||
view type (if the allocation site is "far away"). Not only memory accesses,
|
||||
but also non-access operations such as pointer arithmetic need to be covered
|
||||
this way in order to benefit from the respective runtime bounds checks.
|
||||
|
||||
If a **container** type (``std::array``, ``std::vector``, ``std::string``)
|
||||
is used for allocating the buffer, this is the best-case scenario because
|
||||
the container naturally has access to the correct bounds information for the
|
||||
buffer, and the runtime bounds checks immediately kick in. Additionally,
|
||||
the container type may provide automatic lifetime management for the buffer
|
||||
(which may or may not be desirable).
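
A minimal sketch of the container case follows. The function ``make_squares`` is an illustrative name, not something from the Clang or libc++ sources. The point is that the allocation and every access go through the same ``std::vector``, so the size used by the checks is the real one.

```cpp
#include <cstddef>
#include <vector>

// Illustrative function: the vector owns the buffer, so every operator[]
// below can be checked against the real size when the standard library is
// built in a hardened mode.
inline std::vector<int> make_squares(std::size_t count) {
  std::vector<int> squares(count);  // the allocation site knows the bounds
  for (std::size_t i = 0; i < squares.size(); ++i) {
    squares[i] = static_cast<int>(i * i);
  }
  return squares;
}
```

No raw pointer or separate size variable appears anywhere, so there is nothing for ``-Wunsafe-buffer-usage`` to flag.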
|
||||
|
||||
If a **view** type is used (``std::span``, ``std::string_view``), this typically
|
||||
means that the bounds information for the "adopted" pointer needs to be passed
|
||||
to the view's constructor manually. This makes runtime checks immediately
|
||||
kick in with respect to the provided bounds information, which is an immediate
|
||||
improvement over the raw pointer. However, this situation is still fundamentally
|
||||
insufficient for security purposes, because **bounds information provided
|
||||
this way cannot be guaranteed to be correct**.
|
||||
|
||||
For example, the function ``get_last_element()`` we've seen in the previous
|
||||
section can be made **slightly** safer this way::
|
||||
|
||||
int get_last_element(int *pointer, size_t size) {
|
||||
std::span<int> sp(pointer, size);
|
||||
return sp[size - 1]; // warning addressed
|
||||
}
|
||||
|
||||
Here ``std::span`` eliminates the potential concern that the operation
``size - 1`` may overflow when ``size`` is equal to ``0``, leading to a buffer
"underrun". However, such a program does not provide a guarantee that
the variable ``size`` correctly represents the **actual** size of the buffer
pointed to by ``pointer``. The ``std::span`` constructed this way may be
ill-formed. It may fail to protect you from overrunning the original buffer.
|
||||
|
||||
The following example demonstrates one of the most dangerous anti-patterns
|
||||
of this nature::
|
||||
|
||||
void convert_data(int *source_buf, size_t source_size,
|
||||
int *target_buf, size_t target_size) {
|
||||
// Terrible: mismatched pointer / size.
|
||||
std::span<int> target_span(target_buf, source_size);
|
||||
// ...
|
||||
}
|
||||
|
||||
The second parameter of ``std::span`` should never be the **desired** size
|
||||
of the buffer. It should always be the **actual** size of the buffer.
|
||||
Such code often indicates that the original code has already contained
|
||||
a vulnerability -- and the use of a safe view class failed to prevent it.
|
||||
|
||||
If ``target_span`` actually needs to be of size ``source_size``, a significantly
|
||||
safer way to produce such a span would be to build it with the correct size
|
||||
first, and then resize it to the desired size by calling ``.first()``::
|
||||
|
||||
  void convert_data(int *source_buf, size_t source_size,
                    int *target_buf, size_t target_size) {
    // Safer.
    std::span<int> target_span =
        std::span<int>(target_buf, target_size).first(source_size);
    // ...
  }
|
||||
|
||||
However, these are still half-measures. This code still accepts the
|
||||
bounds information from the caller in an **informal** manner, and such bounds
|
||||
information cannot be guaranteed to be correct.
|
||||
|
||||
In order to mitigate problems of this nature in their entirety,
|
||||
the third guideline is imposed.
|
||||
|
||||
|
||||
Encapsulation of bounds information must be respected continuously
|
||||
------------------------------------------------------------------
|
||||
|
||||
The allocation site of the object is the only reliable source of bounds
|
||||
information for that object. For objects with long lifespans across
|
||||
multiple functions or even libraries in the software stack, it is essential
|
||||
to formally preserve the original bounds information as it's being passed
|
||||
from one piece of code to another.
|
||||
|
||||
Standard container and view classes are designed to preserve bounds information
|
||||
correctly **by construction**. However, they offer a number of ways to "break"
|
||||
encapsulation, which may cause you to temporarily lose track of the correct
|
||||
bounds information:
|
||||
|
||||
- The two-parameter constructor ``std::span(ptr, size)`` allows you to
|
||||
assemble an ill-formed ``std::span``;
|
||||
- Conversely, you can unwrap a container or a view object into a raw pointer
|
||||
and a raw size by calling its ``.data()`` and ``.size()`` methods.
|
||||
- The overloaded ``operator&()`` found on container and iterator classes
|
||||
acts similarly to ``.data()`` in this regard; operations such as
|
||||
``&span[0]`` and ``&*span.begin()`` are effectively unsafe.
|
||||
|
||||
Additional ``-Wunsafe-buffer-usage`` warnings are emitted when encapsulation
|
||||
of **standard** containers is broken in this manner. If you're using
|
||||
non-standard containers, you can achieve a similar effect with facilities
|
||||
described in the next section: :ref:`customization`.
|
||||
|
||||
For example, our previous attempt to address the warning in
|
||||
``get_last_element()`` has actually introduced a new warning along the way,
|
||||
that notifies you about the potentially incorrect bounds information
|
||||
passed into the two-parameter constructor of ``std::span``::
|
||||
|
||||
int get_last_element(int *pointer, size_t size) {
|
||||
std::span<int> sp(pointer, size); // warning: unsafe constructor
|
||||
return sp[size - 1];
|
||||
}
|
||||
|
||||
In order to address this warning, you need to make the function receive
|
||||
the bounds information from the allocation site in a formal manner.
|
||||
The function doesn't necessarily need to know where the allocation site is;
|
||||
it simply needs to be able to accept bounds information **when** it's available.
|
||||
You can achieve this by refactoring the function to accept a ``std::span``
|
||||
as a parameter::
|
||||
|
||||
  int get_last_element(std::span<int> sp) {
    return sp[sp.size() - 1];
  }
|
||||
|
||||
This solution puts the responsibility for making sure the span is well-formed
|
||||
on the **caller**. They should do the same, so that eventually the
|
||||
responsibility is placed on the allocation site!
|
||||
|
||||
Such a definition is also very ergonomic as it naturally accepts arbitrary
|
||||
standard containers without any additional code at the call site::
|
||||
|
||||
void use_last_element() {
|
||||
std::vector<int> vec { 1, 2, 3 };
|
||||
int x = get_last_element(vec); // x = 3
|
||||
}
|
||||
|
||||
Such code is naturally bounds-safe because bounds information is passed down
|
||||
from the allocation site to the buffer access site. Only safe operations
|
||||
are performed on container types. The containers are never "unforged" into
|
||||
raw pointer-size pairs and never "reforged" again. This is what ideal
|
||||
bounds-safe C++ code looks like.
|
||||
|
||||
|
||||
.. _customization:
|
||||
|
||||
Backwards Compatibility, Interoperation with Unsafe Code, Customization
|
||||
=======================================================================
|
||||
|
||||
Some of the code changes described above can be somewhat intrusive.
|
||||
For example, changing a function that previously accepted a pointer and a size
|
||||
separately, to accept a ``std::span`` instead, may require you to update
|
||||
every call site of the function. This is often undesirable and sometimes
|
||||
completely unacceptable when backwards compatibility is required.
|
||||
|
||||
In order to facilitate **incremental adoption** of the coding convention
|
||||
described above, as well as to handle various unusual situations, the compiler
|
||||
provides two additional facilities to give the user more control over
|
||||
``-Wunsafe-buffer-usage`` diagnostics:
|
||||
|
||||
- ``#pragma clang unsafe_buffer_usage`` to mark code as unsafe and **suppress**
|
||||
``-Wunsafe-buffer-usage`` warnings in that code.
|
||||
- ``[[clang::unsafe_buffer_usage]]`` to annotate potential sources of
|
||||
discontinuity of bounds information -- thus introducing
|
||||
**additional** ``-Wunsafe-buffer-usage`` warnings.
|
||||
|
||||
In this section we describe these facilities in detail and show how they can
|
||||
help you with various unusual situations.
|
||||
|
||||
Suppress unwanted warnings with ``#pragma clang unsafe_buffer_usage``
|
||||
---------------------------------------------------------------------
|
||||
|
||||
If you really need to write unsafe code, you can always suppress all
|
||||
``-Wunsafe-buffer-usage`` warnings in a section of code by surrounding
|
||||
that code with the ``unsafe_buffer_usage`` pragma. For example, if you don't
|
||||
want to address the warning in our example function ``get_last_element()``,
|
||||
here is how you can suppress it::
|
||||
|
||||
  int get_last_element(int *pointer, size_t size) {
  #pragma clang unsafe_buffer_usage begin
    return pointer[size - 1]; // warning suppressed
  #pragma clang unsafe_buffer_usage end
  }
|
||||
|
||||
This behavior is analogous to ``#pragma clang diagnostic`` (`documentation
<https://clang.llvm.org/docs/UsersManual.html#controlling-diagnostics-via-pragmas>`_).
|
||||
However, ``#pragma clang unsafe_buffer_usage`` is specialized and recommended
|
||||
over ``#pragma clang diagnostic`` for a number of technical and non-technical
|
||||
reasons. Most importantly, ``#pragma clang unsafe_buffer_usage`` is more
|
||||
suitable for security audits because it is significantly simpler and
|
||||
describes unsafe code in a more formal manner. On the contrary,
|
||||
``#pragma clang diagnostic`` comes with a push/pop syntax (as opposed to
|
||||
the begin/end syntax) and it offers ways to suppress warnings without
|
||||
mentioning them by name (such as ``-Weverything``), which can make it
|
||||
difficult to determine at a glance whether the warning is suppressed
|
||||
on any given line of code.
|
||||
|
||||
There are a few natural reasons to use this pragma:
|
||||
|
||||
- In implementations of safe custom containers. You need this because ultimately
|
||||
``-Wunsafe-buffer-usage`` cannot help you verify that your custom container
|
||||
is safe. It will naturally remind you to audit your container's implementation
|
||||
to make sure it has all the necessary runtime checks, but ultimately you'll
|
||||
need to suppress it once the audit is complete.
|
||||
- In performance-critical code where bounds-safety-related runtime checks
|
||||
cause an unacceptable performance regression. The compiler can theoretically
|
||||
  optimize them away (e.g. replace a repeated bounds check in a loop with
|
||||
a single check before the loop) but it is not guaranteed to do that.
|
||||
- For incremental adoption purposes. If you want to adopt the coding convention
|
||||
gradually, you can always surround an entire file with the
|
||||
``unsafe_buffer_usage`` pragma and then "make holes" in it whenever
|
||||
you address warnings on specific portions of the code.
|
||||
- In code that interoperates with unsafe code. This may be code that
  will never follow the programming model (such as plain C code that will
  never be converted to C++) or code that simply hasn't been converted
  yet.
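
The first case in the list above, a safe custom container, might look roughly like the following. ``checked_view`` is a hypothetical, deliberately minimal type written for this illustration; a real container would need more constructors, iterators, and a better failure handler than ``std::abort()``.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical minimal view type, for illustration only.
template <typename T>
class checked_view {
public:
  // Bounds come from the array type itself, so they cannot be wrong.
  template <std::size_t N>
  checked_view(T (&array)[N]) : data_(array), size_(N) {}

  std::size_t size() const { return size_; }

  T &operator[](std::size_t index) const {
    if (index >= size_) {
      std::abort();  // runtime bounds check: stop instead of overflowing
    }
    // Audited: `index < size_` was verified above, so the raw access is in
    // bounds and the warning is suppressed for this statement only.
#pragma clang unsafe_buffer_usage begin
    return data_[index];
#pragma clang unsafe_buffer_usage end
  }

private:
  T *data_;
  std::size_t size_;
};
```

Keeping the suppressed region down to the single audited statement makes it easier for a reviewer to see what exactly was exempted and why.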
|
||||
|
||||
Interoperation with unsafe code may require a lot of suppressions.
|
||||
You are encouraged to introduce "unsafe wrapper functions" for various unsafe
|
||||
operations that you need to perform regularly.
|
||||
|
||||
For example, if you regularly receive pointer/size pairs from unsafe code,
|
||||
you may want to introduce a wrapper function for the unsafe span constructor::
|
||||
|
||||
#pragma clang unsafe_buffer_usage begin
|
||||
|
||||
template <typename T>
|
||||
std::span<T> unsafe_forge_span(T *pointer, size_t size) {
|
||||
return std::span(pointer, size);
|
||||
}
|
||||
|
||||
#pragma clang unsafe_buffer_usage end
|
||||
|
||||
Such a wrapper function can be used to suppress warnings about unsafe span
|
||||
constructor usage in a more ergonomic manner::
|
||||
|
||||
void use_unsafe_c_struct(unsafe_c_struct *s) {
|
||||
// No warning here.
|
||||
std::span<int> sp = unsafe_forge_span(s->pointer, s->size);
|
||||
// ...
|
||||
}
|
||||
|
||||
The code remains unsafe but it also continues to be nicely readable, and it
|
||||
proves that ``-Wunsafe-buffer-usage`` has done its best to notify you about
|
||||
the potential unsafety. A security auditor will need to keep an eye on such
|
||||
unsafe wrappers. **It is still up to you to confirm that the bounds information
|
||||
passed into the wrapper is correct.**
|
||||
|
||||
|
||||
Flag bounds information discontinuities with ``[[clang::unsafe_buffer_usage]]``
|
||||
-------------------------------------------------------------------------------
|
||||
|
||||
The clang attribute ``[[clang::unsafe_buffer_usage]]``
|
||||
(`attribute documentation
|
||||
<https://clang.llvm.org/docs/AttributeReference.html#unsafe-buffer-usage>`_)
|
||||
allows the user to annotate various objects, such as functions or member
|
||||
variables, as incompatible with the Safe Buffers programming model.
|
||||
You are encouraged to do that for arbitrary reasons, but typically the main
|
||||
reason to do that is when an unsafe function needs to be provided for
|
||||
backwards compatibility.
|
||||
|
||||
For example, in the previous section we've seen how the example function
|
||||
``get_last_element()`` needed to have its parameter types changed in order
|
||||
to preserve the continuity of bounds information when receiving a buffer pointer
|
||||
from the caller. However, such a change breaks both API and ABI compatibility.
|
||||
The code that previously used this function will no longer compile, nor link,
|
||||
until every call site of that function is updated. You can reclaim the
|
||||
backwards compatibility -- in terms of both API and ABI -- by adding
|
||||
a "compatibility overload"::
|
||||
|
||||
  int get_last_element(std::span<int> sp) {
    return sp[sp.size() - 1];
  }

  [[clang::unsafe_buffer_usage]] // Please use the new function.
  int get_last_element(int *pointer, size_t size) {
    // Avoid code duplication - simply invoke the safe function!
    // The pragma suppresses the unsafe constructor warning.
  #pragma clang unsafe_buffer_usage begin
    return get_last_element(std::span(pointer, size));
  #pragma clang unsafe_buffer_usage end
  }
|
||||
|
||||
|
||||
Such an overload allows the surrounding code to continue to work.
|
||||
It is both source-compatible and binary-compatible. It is also strictly safer
|
||||
than the original function because the unsafe buffer access through raw pointer
|
||||
is replaced with a safe ``std::span`` access no matter how it's called. However,
|
||||
because it requires the caller to pass the pointer and the size separately,
|
||||
it violates our "bounds information continuity" principle. This means that
|
||||
the callers who care about bounds safety need to be encouraged to use the
|
||||
``std::span``-based overload instead. Luckily, the attribute
|
||||
``[[clang::unsafe_buffer_usage]]`` causes a ``-Wunsafe-buffer-usage`` warning
|
||||
to be displayed at every call site of the compatibility overload in order to
|
||||
remind the callers to update their code::
|
||||
|
||||
  void use_last_element() {
    std::vector<int> vec { 1, 2, 3 };

    // no warning
    int x = get_last_element(vec);

    // warning: this overload introduces unsafe buffer manipulation
    int y = get_last_element(vec.data(), vec.size());
  }
|
||||
|
||||
The compatibility overload can be further simplified with the help of the
|
||||
``unsafe_forge_span()`` wrapper as described in the previous section --
|
||||
and it even makes the pragmas unnecessary::
|
||||
|
||||
[[clang::unsafe_buffer_usage]] // Please use the new function.
|
||||
int get_last_element(int *pointer, size_t size) {
|
||||
// Avoid code duplication - simply invoke the safe function!
|
||||
return get_last_element(unsafe_forge_span(pointer, size));
|
||||
}
|
||||
|
||||
Notice how the attribute ``[[clang::unsafe_buffer_usage]]`` does **not**
|
||||
suppress the warnings within the function on its own. Similarly, functions whose
|
||||
entire definitions are covered by ``#pragma clang unsafe_buffer_usage`` do
|
||||
**not** become automatically annotated with the attribute
|
||||
``[[clang::unsafe_buffer_usage]]``. They serve two different purposes:
|
||||
|
||||
- The pragma says that the function isn't safely **written**;
|
||||
- The attribute says that the function isn't safe to **use**.
|
||||
|
||||
Also notice how we've made an **unsafe** wrapper for a **safe** function.
|
||||
This is significantly better than making a **safe** wrapper for an **unsafe**
|
||||
function. In other words, the following solution is significantly more unsafe
|
||||
and undesirable than the previous solution::
|
||||
|
||||
int get_last_element(std::span<int> sp) {
|
||||
// You've just added that attribute, and now you need to
|
||||
// immediately suppress the warning that comes with it?
|
||||
#pragma clang unsafe_buffer_usage begin
|
||||
return get_last_element(sp.data(), sp.size());
|
||||
#pragma clang unsafe_buffer_usage end
|
||||
}
|
||||
|
||||
|
||||
  [[clang::unsafe_buffer_usage]]
  int get_last_element(int *pointer, size_t size) {
    // This access is still completely unchecked. What's the point of having
    // perfect bounds information if you aren't performing runtime checks?
  #pragma clang unsafe_buffer_usage begin
    return pointer[size - 1];
  #pragma clang unsafe_buffer_usage end
  }
|
||||
|
||||
**Structs and classes**, unlike functions, cannot be overloaded. If a struct
|
||||
contains an unsafe buffer (in the form of a nested array or a pointer/size pair)
|
||||
then it is typically impossible to replace it with a safe container (such as
|
||||
``std::array`` or ``std::span`` respectively) without breaking the layout
|
||||
of the struct and introducing both source and binary incompatibilities with
|
||||
the surrounding client code.
|
||||
|
||||
Additionally, member variables of a class cannot be naturally "hidden" from
|
||||
client code. If a class needs to be used by clients who haven't updated to
|
||||
C++20 yet, you cannot use the C++20-specific ``std::span`` as a member variable
|
||||
type. If the definition of a struct is shared with plain C code that manipulates
|
||||
member variables directly, you cannot use any C++-specific types for these
|
||||
member variables.
|
||||
|
||||
In such cases there's usually no backwards-compatible way to use safe types
|
||||
directly. The best option is usually to discourage the clients from using
|
||||
member variables directly by annotating the member variables with the attribute
|
||||
``[[clang::unsafe_buffer_usage]]``, and then to change the interface
|
||||
of the class to provide safe "accessors" to the unsafe data.
|
||||
|
||||
For example, let's assume the worst-case scenario: ``struct foo`` is an unsafe
|
||||
struct type fully defined in a header shared between plain C code and C++ code::
|
||||
|
||||
struct foo {
|
||||
int *pointer;
|
||||
size_t size;
|
||||
};
|
||||
|
||||
In this case you can achieve safety in C++ code by annotating the member
|
||||
variables as unsafe and encapsulating them into safe accessor methods::
|
||||
|
||||
struct foo {
|
||||
[[clang::unsafe_buffer_usage]]
|
||||
int *pointer;
|
||||
[[clang::unsafe_buffer_usage]]
|
||||
size_t size;
|
||||
|
||||
// Avoid showing this code to clients who are unable to digest it.
|
||||
#if __cplusplus >= 202002L
|
||||
std::span<int> get_pointer_as_span() {
|
||||
#pragma clang unsafe_buffer_usage begin
|
||||
return std::span(pointer, size);
|
||||
#pragma clang unsafe_buffer_usage end
|
||||
}
|
||||
|
||||
void set_pointer_from_span(std::span<int> sp) {
|
||||
#pragma clang unsafe_buffer_usage begin
|
||||
pointer = sp.data();
|
||||
size = sp.size();
|
||||
#pragma clang unsafe_buffer_usage end
|
||||
}
|
||||
|
||||
// Potentially more utility functions.
|
||||
#endif
|
||||
};
|
||||
|
||||
Future Work
|
||||
===========
|
||||
|
||||
The ``-Wunsafe-buffer-usage`` technology is in active development. The warning
|
||||
is largely ready for everyday use but it is continuously improved to reduce
|
||||
unnecessary noise as well as cover some of the trickier unsafe operations.
|
||||
|
||||
Fix-It Hints for ``-Wunsafe-buffer-usage``
|
||||
------------------------------------------
|
||||
|
||||
A code transformation tool is in development that can semi-automatically
|
||||
transform large bodies of code to follow the C++ Safe Buffers programming model.
|
||||
It can currently be accessed by passing the experimental flag
|
||||
``-fsafe-buffer-usage-suggestions`` in addition to ``-Wunsafe-buffer-usage``.
|
||||
|
||||
Fixits produced this way currently assume the default approach described
|
||||
in this document as they suggest standard containers and views (most notably
|
||||
``std::span`` and ``std::array``) as replacements for raw buffer pointers.
|
||||
This also additionally requires libc++ hardening in order to make the runtime
|
||||
bounds checks actually happen.
|
||||
|
||||
Static Analysis to Identify Suspicious Sources of Bounds Information
|
||||
--------------------------------------------------------------------
|
||||
|
||||
The unsafe constructor ``span(pointer, size)`` is often a necessary evil
|
||||
when it comes to interoperation with unsafe code. However, passing the
|
||||
correct bounds information to such a constructor is often difficult.
|
||||
In order to detect those ``span(target_pointer, source_size)`` anti-patterns,
|
||||
path-sensitive analysis performed by `the clang static analyzer
|
||||
<https://clang-analyzer.llvm.org>`_ can be taught to identify situations
|
||||
when the pointer and the size are coming from "suspiciously different" sources.
|
||||
|
||||
Such analysis will be able to identify the source of information with
|
||||
significantly higher precision than that of the compiler, making it much better
|
||||
at identifying incorrect bounds information in your code while producing
|
||||
significantly fewer warnings. It will also need to bypass
|
||||
``#pragma clang unsafe_buffer_usage`` suppressions and "see through"
|
||||
unsafe wrappers such as ``unsafe_forge_span`` -- something that
|
||||
the static analyzer is naturally capable of doing.
|
@ -25,6 +25,7 @@ Using Clang as a Compiler
|
||||
CrossCompilation
|
||||
ClangStaticAnalyzer
|
||||
ThreadSafetyAnalysis
|
||||
SafeBuffers
|
||||
DataFlowAnalysisIntro
|
||||
AddressSanitizer
|
||||
ThreadSanitizer
|
||||