11 Commits

Author SHA1 Message Date
Nikolas Klauser
e99c4906e4
[libc++] Granularize <cstddef> includes (#108696) 2024-10-31 02:20:10 +01:00
Louis Dionne
348e74139a [libc++][NFC] Run clang-format on libcxx/include
This re-formats a few headers that had become out-of-sync with respect
to formatting since we ran clang-format on the whole codebase. There's
surprisingly few instances of it.
2024-08-30 12:09:36 -04:00
Eisuke Kawashima
88184e5060
[libc++] Fix invalid escape sequences in Python comments (#94032) 2024-06-10 09:38:31 -04:00
AngryLoki
ae858b5123
[libc++] Fix SyntaxWarning messages from python 3.12 (#93637)
This fixes "SyntaxWarning: invalid escape sequence" and "SyntaxWarning: `is` with int literal".
transitive_includes.gen.py was also reformatted with darker per the style guide.

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
2024-06-05 10:31:03 -04:00
Mark de Wever
e3dea5e341
[libc++][format] Improves escaping performance. (#88533)
The previous patch implemented
- P2713R1 Escaping improvements in std::format
- LWG3965 Incorrect example in [format.string.escaped] p3 for formatting
of combining characters

These changes were correct, but had a size and performance penalty. This
patch improves the size and performance of the previous patch. The
performance is still worse than before since the lookups may require two
property lookups instead of one before implementing the paper. The
changes give a tighter coupling between the Unicode data and the
algorithm. Additional tests are added to notify about changes in future
Unicode updates.

Before
```
-----------------------------------------------------------------------
Benchmark                             Time             CPU   Iterations
-----------------------------------------------------------------------
BM_ascii_escaped<char>           110704 ns       110696 ns         6206
BM_unicode_escaped<char>         101371 ns       101374 ns         6862
BM_cyrillic_escaped<char>         63329 ns        63327 ns        11013
BM_japanese_escaped<char>         41223 ns        41225 ns        16938
BM_emoji_escaped<char>           111022 ns       111021 ns         6304
BM_ascii_escaped<wchar_t>        112441 ns       112443 ns         6231
BM_unicode_escaped<wchar_t>      102776 ns       102779 ns         6813
BM_cyrillic_escaped<wchar_t>      58977 ns        58975 ns        11868
BM_japanese_escaped<wchar_t>      36885 ns        36886 ns        18975
BM_emoji_escaped<wchar_t>        115885 ns       115881 ns         6051
```

The first change is to manually encode the entire last area and make a
manual exception for the 240 excluded entries. This reduced the table
from 1077 to 729 entries and gave the following benchmark results.
```
-----------------------------------------------------------------------
Benchmark                             Time             CPU   Iterations
-----------------------------------------------------------------------
BM_ascii_escaped<char>           104777 ns       104776 ns         6550
BM_unicode_escaped<char>          96980 ns        96982 ns         7238
BM_cyrillic_escaped<char>         60254 ns        60251 ns        11670
BM_japanese_escaped<char>         44452 ns        44452 ns        15734
BM_emoji_escaped<char>           104557 ns       104551 ns         6685
BM_ascii_escaped<wchar_t>        107456 ns       107454 ns         6505
BM_unicode_escaped<wchar_t>       96219 ns        96216 ns         7301
BM_cyrillic_escaped<wchar_t>      56921 ns        56904 ns        12288
BM_japanese_escaped<wchar_t>      39530 ns        39529 ns        17492
BM_emoji_escaped<wchar_t>        108494 ns       108496 ns         6408
```

An entry in the table can only contain 2048 code points. For larger
ranges there are multiple entries split in chunks with a maximum size of
2048 entries. To encode the entire Unicode code point range 21 bits are
required. The manual part starts at 0x323B0 this means all entries in
the table fit in 18 bits. This allows to allocate 3 additional bits for
the range. This allows entries to have 16384 elements. This range always
avoids splitting the range in multiple chunks.

This reduces the number of table elements from 729 to 711 and gives the
following benchmark results.
```
-----------------------------------------------------------------------
Benchmark                             Time             CPU   Iterations
-----------------------------------------------------------------------
BM_ascii_escaped<char>           104289 ns       104289 ns         6619
BM_unicode_escaped<char>          96682 ns        96681 ns         7215
BM_cyrillic_escaped<char>         59673 ns        59673 ns        11732
BM_japanese_escaped<char>         41983 ns        41982 ns        16646
BM_emoji_escaped<char>           104119 ns       104120 ns         6683
BM_ascii_escaped<wchar_t>        104503 ns       104505 ns         6693
BM_unicode_escaped<wchar_t>       93426 ns        93423 ns         7489
BM_cyrillic_escaped<wchar_t>      54858 ns        54859 ns        12742
BM_japanese_escaped<wchar_t>      36385 ns        36384 ns        19259
BM_emoji_escaped<wchar_t>        105608 ns       105610 ns         6592
```
2024-04-28 12:15:25 +02:00
Mark de Wever
ad76a85954
[libc++][format] Improves escaping. (#88283)
The change increments the size of the lookup table considerably. The
table has an "upper boundary" check. The removal of the code units with
the property Grapheme_Extend=Yes removes the range E0100..E01EF. This
breaks the trailing large continuous section in two parts. This will be
improved in a followup patch.

Implements:
- P2713R1 Escaping improvements in std::format
- LWG3965 Incorrect example in [format.string.escaped] p3 for formatting
of combining characters

```
---------------------------------------------------------
Benchmark                           Before          After    
---------------------------------------------------------
BM_ascii_escaped<char>            95696 ns      110704 ns
BM_unicode_escaped<char>          89311 ns      101371 ns
BM_cyrillic_escaped<char>         58633 ns       63329 ns
BM_japanese_escaped<char>         44500 ns       41223 ns
BM_emoji_escaped<char>            99156 ns      111022 ns
BM_ascii_escaped<wchar_t>         92245 ns      112441 ns
BM_unicode_escaped<wchar_t>       80970 ns      102776 ns
BM_cyrillic_escaped<wchar_t>      51253 ns       58977 ns
BM_japanese_escaped<wchar_t>      37252 ns       36885 ns
BM_emoji_escaped<wchar_t>         96226 ns      115885 ns
```
2024-04-25 17:16:41 +02:00
Mark de Wever
d179176f3e
[libc++][format] Adds ABI tags to inline constexpr variables. (#86293)
This uses the macro on record types and inline constexpr variables. The
tagged declarations are very likely to change in future versions of
libc++:
- __fields are internal types used to control the formatter's parse
functions which fields to expect. Newer formatters may add new fields.
For example the filesystem::path formatter accepted in the recent Tokyo
meeting added a new 'g' flag, which differs from the 'g' type.
- The Unicode tables. The number of entries in these table likely differ
between Unicode versions. The tables contain only a part of all Unicode
properties. Typically they are stored in a 32-bit entry where some bits
contain the properties and other bits the size of the range. Changes in
the Unicode or C++ algorithms may require more properties to be
available in C++. This may affect the number of bits available in the
range. If needed, other declarations get the macro. This is mainly a
first time to review this approach.

This was originally https://reviews.llvm.org/D143494 where a new macro
_LIBCPP_HIDE_FROM_ABI_TYPE was defined. Testing revealed the existing
macro _LIBCPP_HIDE_FROM_ABI could be used. The "parts" of the macro that
do not affect records are not harmful. Based on this information the
existing macro was used and additional documentation was written.
2024-03-25 18:33:30 +01:00
Louis Dionne
b18a46e35d
[libc++][NFC] Add a few clang-format annotations (#74352)
This is in preparation for clang-formatting the whole code base. These
annotations are required either to avoid clang-format bugs or because
the manually formatted code is significantly more readable than the
clang-formatted alternative. All in all, it seems like very few
annotations are required, which means that clang-format is doing a very
good job in most cases.
2023-12-04 15:17:31 -05:00
Stephan T. Lavavej
0d3c40b82b
[libc++] Remove unused Python imports (#73724)
VSCode's Pylance extension informed me, and text searching confirmed,
that these imports are unused. I believe we should be able to remove
them harmlessly.
2023-11-29 09:25:06 -05:00
Nikolas Klauser
4f15267d3d [libc++][NFC] Replace _LIBCPP_STD_VER > x with _LIBCPP_STD_VER >= x
This change is almost fully mechanical. The only interesting change is in `generate_feature_test_macro_components.py` to generate `_LIBCPP_STD_VER >=` instead. To avoid churn in the git-blame this commit should be added to the `.git-blame-ignore-revs` once committed.

Reviewed By: ldionne, var-const, #libc

Spies: jloser, libcxx-commits, arichardson, arphaman, wenlei

Differential Revision: https://reviews.llvm.org/D143962
2023-02-15 16:52:25 +01:00
Mark de Wever
a48007355a [libc++][format] Implements string escaping.
Implements parts of
- P2286R8 Formatting Ranges

Reviewed By: #libc, tahonermann

Differential Revision: https://reviews.llvm.org/D134036
2022-10-20 17:29:34 +02:00