llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-27 00:56:06 +00:00

Author	SHA1	Message	Date
Louis Dionne	b3a4bf9d8f	[libc++] Refactor and add benchmarks from [alg.nonmodifying] (#128206 )	2025-03-19 00:08:46 -04:00
Alexander Kornienko	d1156fcb56	Revert "[libc++] Optimize num_put integral functions" (#131613 ) Reverts llvm/llvm-project#120859 This change breaks formatting of `0` with `std::showbase` + `std::hex` or `std::oct`, as well as `+0` with `std::showpos`. I believe the new behavior is violating the standard. See https://github.com/llvm/llvm-project/pull/120859#issuecomment-2723970242 and later comments for details and explanation.	2025-03-18 02:05:51 +01:00
Louis Dionne	24e88b0e6b	[libc++] Add remaining benchmarks from [alg.modifying.operations] (#127354 ) This patch adds benchmarks for all the remaining algorithms in [alg.modifying.operations] that we didn't already have a benchmark for.	2025-03-17 15:11:13 -04:00
Peng Liu	4baf1c03fa	[libc++] Optimize ranges::rotate for vector<bool>::iterator (#121168 ) This PR optimizes the performance of `std::ranges::rotate` for `vector<bool>::iterator`. The optimization yields a performance improvement of up to 2096x. Closes #64038.	2025-03-13 14:07:23 -04:00
Nikolas Klauser	15edf8725a	[libc++] Optimize num_put integral functions (#120859 ) ``` ------------------------------------------------------- Benchmark old new ------------------------------------------------------- BM_num_put<bool> 76.2 ns 32.0 ns BM_num_put<long> 76.9 ns 33.1 ns BM_num_put<long long> 77.9 ns 34.2 ns BM_num_put<unsigned long> 78.4 ns 33.1 ns BM_num_put<unsigned long long> 78.0 ns 34.4 ns BM_num_put<double> 224 ns 228 ns BM_num_put<long double> 239 ns 230 ns BM_num_put<const void*> 68.7 ns 35.1 ns ``` Fixes #40109.	2025-03-05 14:18:49 +01:00
Peng Liu	a12744ff05	[libc++] Optimize ranges::swap_ranges for vector<bool>::iterator (#121150 ) This PR optimizes the performance of `std::ranges::swap_ranges` for `vector<bool>::iterator`, addressing a subtask outlined in issue #64038. The optimizations yield performance improvements of up to 611x for aligned range swap and 78x for unaligned range swap comparison. Additionally, comprehensive tests covering up to 4 storage words (256 bytes) with odd and even bit sizes are provided, which validate the proposed optimizations in this patch.	2025-03-04 17:15:36 -05:00
Peng Liu	7717a549e9	[libc++] Optimize ranges::equal for vector<bool>::iterator (#121084 ) This PR optimizes the performance of `std::ranges::equal` for `vector<bool>::iterator`, addressing a subtask outlined in issue #64038. The optimizations yield performance improvements of up to 188x for aligned equality comparison and 82x for unaligned equality comparison. Moreover, comprehensive tests covering up to 4 storage words (256 bytes) with odd and even bit sizes are provided, which validate the proposed optimizations in this patch.	2025-02-26 12:18:25 -05:00
Louis Dionne	8feb5bac32	[libc++] Add benchmarks for copy algorithms (#127328 ) This patch adds benchmarks for the copy family of algorithms (copy, copy_n, copy_if, copy_backward).	2025-02-20 08:24:32 -05:00
Peng Liu	ab3d793982	[libc++] Optimize ranges::move{,_backward} for vector<bool>::iterator (#121109 ) As a follow-up to #121013 (which optimized `ranges::copy`) and #121026 (which optimized `ranges::copy_backward`), this PR enhances the performance of `std::ranges::{move, move_backward}` for `vector<bool>::iterator`, addressing a subtask outlined in issue #64038. The optimizations bring performance improvements analogous to those achieved for the `{copy, copy_backward}` algorithms: up to 2000x for aligned moves and 60x for unaligned moves. Moreover, comprehensive tests covering up to 4 storage words (256 bytes) with odd and even bit sizes are provided, which validate the proposed optimizations in this patch.	2025-02-19 11:36:45 -05:00
Louis Dionne	a6093d3034	[libc++] Explicitly mention vector_bool in the name of benchmarks (#127313 ) We have some benchmarks that were benchmarking very specific functionality, namely the optimizations in vector<bool>::iterator. Call this out in the benchmarks by renaming them appropriately. In the future we will also increase the coverage of these benchmarks to test other containers.	2025-02-15 14:56:19 +01:00
Louis Dionne	27598aba49	[libc++] Further refactor sequence container benchmarks (#126129 ) This patch does not significantly change how the sequence container benchmarks are done, but it adopts the same style as the associative container benchmarks. This commit does adjust how we were benchmarking push_back, where we never really measured the overhead of the slow path of push_back (when we need to reallocate).	2025-02-07 09:56:45 -05:00
Louis Dionne	1d319dfe7d	[libc++] Implement generic associative container benchmarks (#123663 ) This patch implements generic associative container benchmarks for containers with unique keys. In doing so, it replaces the existing std::map benchmarks which were based on the cartesian product infrastructure and were too slow to execute. These new benchmarks aim to strike a balance between exhaustive coverage of all operations in the most interesting case, while executing fairly rapidly (~40s on my machine). This bumps the requirement for the map benchmarks from C++17 to C++20 because the common header that provides associative container benchmarks requires support for C++20 concepts.	2025-02-06 16:08:55 -05:00
Louis Dionne	c7c7eabc7f	[libc++] Add a benchmark for std::reverse (#125262 )	2025-02-03 12:10:47 -05:00
Louis Dionne	439bef9751	[libc++] Refactor the sequence container benchmarks (#119763 ) Rewrite the sequence container benchmarks to only rely on the actual operations specified in SequenceContainer requirements and add benchmarks for std::list, which is also considered a sequence container. One of the major goals of this refactoring is also to make these container benchmarks run faster so that they can be run more frequently. The existing benchmarks have the significant problem that they take so long to run that they must basically be run overnight. This patch reduces the size of inputs such that the rewritten benchmarks each take at most a minute to run. This patch doesn't touch the string benchmarks, which were not using the generic container benchmark functions previously.	2025-01-30 15:02:34 -05:00
Peng Liu	edc3dc6abd	[libc++] Optimize ranges::copy_backward for vector<bool>::iterator (#121026 ) As a follow-up to #121013 (which focused on `std::ranges::copy`), this PR optimizes the performance of `std::ranges::copy_backward` for `vector<bool>::iterator`, addressing a subtask outlined in issue #64038. The optimizations yield performance improvements of up to 2000x for aligned copies and 60x for unaligned copies.	2025-01-30 14:55:05 -05:00
Peng Liu	5b65896ad6	[libc++] Optimize ranges::copy{, _n} for vector<bool>::iterator (#121013 ) This PR optimizes the performance of `std::ranges::copy` and `std::ranges::copy_n` specifically for `vector<bool>::iterator`, addressing a subtask outlined in issue #64038. The optimizations yield performance improvements of up to 2000x for aligned copies and 60x for unaligned copies. Additionally, new tests have been added to validate these enhancements. - Aligned source-destination bits ranges::copy ``` -------------------------------------------------------------------------- Benchmark Before After Improvement -------------------------------------------------------------------------- bm_ranges_copy_vb_aligned/8 10.8 ns 1.42 ns 8x bm_ranges_copy_vb_aligned/64 88.5 ns 2.28 ns 39x bm_ranges_copy_vb_aligned/512 709 ns 1.95 ns 364x bm_ranges_copy_vb_aligned/4096 5568 ns 5.01 ns 1111x bm_ranges_copy_vb_aligned/32768 44754 ns 38.7 ns 1156x bm_ranges_copy_vb_aligned/65536 91092 ns 73.2 ns 1244x bm_ranges_copy_vb_aligned/102400 139473 ns 127 ns 1098x bm_ranges_copy_vb_aligned/106496 189004 ns 81.5 ns 2319x bm_ranges_copy_vb_aligned/110592 153647 ns 71.1 ns 2161x bm_ranges_copy_vb_aligned/114688 159261 ns 70.2 ns 2269x bm_ranges_copy_vb_aligned/118784 181910 ns 73.5 ns 2475x bm_ranges_copy_vb_aligned/122880 174117 ns 76.5 ns 2276x bm_ranges_copy_vb_aligned/126976 176020 ns 82.0 ns 2147x bm_ranges_copy_vb_aligned/131072 180757 ns 137 ns 1319x bm_ranges_copy_vb_aligned/135168 190342 ns 158 ns 1205x bm_ranges_copy_vb_aligned/139264 192831 ns 103 ns 1872x bm_ranges_copy_vb_aligned/143360 199627 ns 89.4 ns 2233x bm_ranges_copy_vb_aligned/147456 203881 ns 88.6 ns 2301x bm_ranges_copy_vb_aligned/151552 213345 ns 88.4 ns 2413x bm_ranges_copy_vb_aligned/155648 216892 ns 92.9 ns 2335x bm_ranges_copy_vb_aligned/159744 222751 ns 96.4 ns 2311x bm_ranges_copy_vb_aligned/163840 225995 ns 173 ns 1306x bm_ranges_copy_vb_aligned/167936 235230 ns 202 ns 1165x bm_ranges_copy_vb_aligned/172032 244093 ns 131 ns 1863x bm_ranges_copy_vb_aligned/176128 244434 ns 111 ns 2202x bm_ranges_copy_vb_aligned/180224 249570 ns 108 ns 2311x bm_ranges_copy_vb_aligned/184320 254538 ns 108 ns 2357x bm_ranges_copy_vb_aligned/188416 261817 ns 113 ns 2317x bm_ranges_copy_vb_aligned/192512 269923 ns 125 ns 2159x bm_ranges_copy_vb_aligned/196608 273494 ns 210 ns 1302x bm_ranges_copy_vb_aligned/200704 280035 ns 269 ns 1041x bm_ranges_copy_vb_aligned/204800 293102 ns 231 ns 1269x ``` ranges::copy_n ``` -------------------------------------------------------------------------- Benchmark Before After Improvement -------------------------------------------------------------------------- bm_ranges_copy_n_vb_aligned/8 11.8 ns 0.89 ns 13x bm_ranges_copy_n_vb_aligned/64 91.6 ns 2.06 ns 44x bm_ranges_copy_n_vb_aligned/512 718 ns 2.45 ns 293x bm_ranges_copy_n_vb_aligned/4096 5750 ns 5.02 ns 1145x bm_ranges_copy_n_vb_aligned/32768 45824 ns 40.9 ns 1120x bm_ranges_copy_n_vb_aligned/65536 92267 ns 73.8 ns 1250x bm_ranges_copy_n_vb_aligned/102400 143267 ns 125 ns 1146x bm_ranges_copy_n_vb_aligned/106496 148625 ns 82.4 ns 1804x bm_ranges_copy_n_vb_aligned/110592 154817 ns 72.0 ns 2150x bm_ranges_copy_n_vb_aligned/114688 157953 ns 70.4 ns 2244x bm_ranges_copy_n_vb_aligned/118784 162374 ns 71.5 ns 2270x bm_ranges_copy_n_vb_aligned/122880 168638 ns 72.9 ns 2313x bm_ranges_copy_n_vb_aligned/126976 175596 ns 76.6 ns 2292x bm_ranges_copy_n_vb_aligned/131072 181164 ns 135 ns 1342x bm_ranges_copy_n_vb_aligned/135168 184697 ns 157 ns 1176x bm_ranges_copy_n_vb_aligned/139264 191395 ns 104 ns 1840x bm_ranges_copy_n_vb_aligned/143360 194954 ns 88.3 ns 2208x bm_ranges_copy_n_vb_aligned/147456 208917 ns 86.1 ns 2426x bm_ranges_copy_n_vb_aligned/151552 211101 ns 87.2 ns 2421x bm_ranges_copy_n_vb_aligned/155648 213175 ns 89.0 ns 2395x bm_ranges_copy_n_vb_aligned/159744 218988 ns 86.7 ns 2526x bm_ranges_copy_n_vb_aligned/163840 225263 ns 156 ns 1444x bm_ranges_copy_n_vb_aligned/167936 230725 ns 184 ns 1254x bm_ranges_copy_n_vb_aligned/172032 235795 ns 119 ns 1981x bm_ranges_copy_n_vb_aligned/176128 241145 ns 101 ns 2388x bm_ranges_copy_n_vb_aligned/180224 250680 ns 99.5 ns 2519x bm_ranges_copy_n_vb_aligned/184320 262954 ns 99.7 ns 2637x bm_ranges_copy_n_vb_aligned/188416 258584 ns 103 ns 2510x bm_ranges_copy_n_vb_aligned/192512 267190 ns 125 ns 2138x bm_ranges_copy_n_vb_aligned/196608 270821 ns 213 ns 1271x bm_ranges_copy_n_vb_aligned/200704 279532 ns 262 ns 1067x bm_ranges_copy_n_vb_aligned/204800 283412 ns 222 ns 1277x ``` - Unaligned source-destination bits ``` -------------------------------------------------------------------------------- Benchmark Before After Improvement -------------------------------------------------------------------------------- bm_ranges_copy_vb_unaligned/8 12.8 ns 8.59 ns 1.5x bm_ranges_copy_vb_unaligned/64 98.2 ns 8.24 ns 12x bm_ranges_copy_vb_unaligned/512 755 ns 18.1 ns 42x bm_ranges_copy_vb_unaligned/4096 6027 ns 102 ns 59x bm_ranges_copy_vb_unaligned/32768 47663 ns 774 ns 62x bm_ranges_copy_vb_unaligned/262144 378981 ns 6455 ns 59x bm_ranges_copy_vb_unaligned/1048576 1520486 ns 25942 ns 59x bm_ranges_copy_n_vb_unaligned/8 11.3 ns 8.22 ns 1.4x bm_ranges_copy_n_vb_unaligned/64 97.3 ns 7.89 ns 12x bm_ranges_copy_n_vb_unaligned/512 747 ns 18.1 ns 41x bm_ranges_copy_n_vb_unaligned/4096 5932 ns 99.0 ns 60x bm_ranges_copy_n_vb_unaligned/32768 47776 ns 749 ns 64x bm_ranges_copy_n_vb_unaligned/262144 378802 ns 6576 ns 58x bm_ranges_copy_n_vb_unaligned/1048576 1547234 ns 26229 ns 59x ```	2025-01-30 17:26:26 +01:00
Mark de Wever	0cd794d486	[libc++][chrono] implements UTC clock. (#90393 ) While implementing this feature and its associated LWG issues it turns out - LWG3316 Correctly define epoch for utc_clock / utc_timepoint only added non-normative wording to the standard. Implements parts of: - P0355 Extending <chrono> to Calendars and Time Zones - P1361 Integration of chrono with text formatting - LWG3359 <chrono> leap second support should allow for negative leap seconds	2025-01-24 18:56:02 +01:00
Hui	699f196055	[libc++] remove yield from atomic::wait (#120012 ) This is to address the issue where `yield` can cause the thread to be assigned to the lowest priority. I have done lots of experiments: see the comments here: https://github.com/llvm/llvm-project/pull/84471#issuecomment-2522723549 And for this patch, the benchmark has been performed on a 16 core M4 MAX CPU MacBook Pro. dylib compiled with Release mode and the test compiled with optimization=speed ``` Comparing ../../../build_atomic_yield2/ref_new2.json to ../../../build_atomic_yield2/no_yield_new2.json Benchmark Time CPU Time Old Time New CPU Old CPU New ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- BM_1_atomic_1_waiter_1_notifier<KeepNotifying, NumHighPrioTasks<0>>/262144 +0.0460 +0.0392 14949926 15637503 13633314 14167327 BM_1_atomic_1_waiter_1_notifier<KeepNotifying, NumHighPrioTasks<0>>/524288 +0.0299 +0.0290 24369327 25099004 24367214 25073900 BM_1_atomic_1_waiter_1_notifier<KeepNotifying, NumHighPrioTasks<0>>/1048576 +0.0648 +0.0640 48149060 51268517 48144857 51226733 BM_1_atomic_1_waiter_1_notifier<NotifyEveryNus<50>, NumHighPrioTasks<0>>/4096 +0.0000 -0.8765 204815500 204823427 204514333 25265071 BM_1_atomic_1_waiter_1_notifier<NotifyEveryNus<50>, NumHighPrioTasks<0>>/8192 +0.0000 -0.8747 409637520 409640821 408997500 51228071 BM_1_atomic_1_waiter_1_notifier<NotifyEveryNus<50>, NumHighPrioTasks<0>>/16384 +0.0001 -0.8737 819244417 819351256 817022000 103217000 BM_1_atomic_1_waiter_1_notifier<NotifyEveryNus<100>, NumHighPrioTasks<0>>/4096 +0.0000 -0.9029 409607694 409624937 271866333 26410600 BM_1_atomic_1_waiter_1_notifier<NotifyEveryNus<100>, NumHighPrioTasks<0>>/8192 +0.0001 -0.9017 819168417 819269339 542784000 53352429 BM_1_atomic_1_waiter_1_notifier<NotifyEveryNus<100>, NumHighPrioTasks<0>>/16384 +0.0001 -0.9012 1638361750 1638522929 1089486000 107684571 BM_1_atomic_1_waiter_1_notifier<KeepNotifying, NumHighPrioTasks<4>>/262144 +0.3178 +0.3068 12777744 16838266 12764732 16681233 BM_1_atomic_1_waiter_1_notifier<KeepNotifying, NumHighPrioTasks<4>>/524288 +0.2231 +0.2225 26889415 32887842 26864138 32840550 BM_1_atomic_1_waiter_1_notifier<KeepNotifying, NumHighPrioTasks<4>>/1048576 +0.1809 +0.1799 56103004 66251660 56048000 66129583 BM_1_atomic_1_waiter_1_notifier<NotifyEveryNus<50>, NumHighPrioTasks<4>>/4096 -0.0029 -0.8708 205509986 204906011 204277333 26399538 BM_1_atomic_1_waiter_1_notifier<NotifyEveryNus<50>, NumHighPrioTasks<4>>/8192 +0.0001 -0.8711 410286709 410314199 408608000 52667692 BM_1_atomic_1_waiter_1_notifier<NotifyEveryNus<50>, NumHighPrioTasks<4>>/16384 -0.0019 -0.8713 821042916 819476441 816274000 105077000 BM_1_atomic_1_waiter_1_notifier<NotifyEveryNus<100>, NumHighPrioTasks<4>>/4096 -0.0005 -0.9015 409825792 409638429 273145333 26896400 BM_1_atomic_1_waiter_1_notifier<NotifyEveryNus<100>, NumHighPrioTasks<4>>/8192 -0.0027 -0.9014 821528125 819285433 545661000 53775308 BM_1_atomic_1_waiter_1_notifier<NotifyEveryNus<100>, NumHighPrioTasks<4>>/16384 -0.0041 -0.9014 1645204459 1638538077 1091726000 107647000 BM_1_atomic_1_waiter_1_notifier<KeepNotifying, NumHighPrioTasks<7>>/16 -0.4835 -0.4836 1609 831 1609 831 BM_1_atomic_1_waiter_1_notifier<KeepNotifying, NumHighPrioTasks<7>>/32 -0.4398 -0.4399 3167 1774 3166 1773 BM_1_atomic_1_waiter_1_notifier<KeepNotifying, NumHighPrioTasks<7>>/64 -0.4705 -0.4705 6323 3348 6323 3348 BM_1_atomic_1_waiter_1_notifier<NotifyEveryNus<50>, NumHighPrioTasks<7>>/8 +0.0005 -0.8683 400109 400314 399256 52575 BM_1_atomic_1_waiter_1_notifier<NotifyEveryNus<50>, NumHighPrioTasks<7>>/16 +0.0005 -0.8683 800055 800483 798797 105165 BM_1_atomic_1_waiter_1_notifier<NotifyEveryNus<50>, NumHighPrioTasks<7>>/32 +0.0003 -0.8680 1600058 1600585 1597266 210903 BM_1_atomic_1_waiter_1_notifier<NotifyEveryNus<100>, NumHighPrioTasks<7>>/8 +0.0004 -0.8976 800006 800365 531802 54441 BM_1_atomic_1_waiter_1_notifier<NotifyEveryNus<100>, NumHighPrioTasks<7>>/16 +0.0005 -0.8982 1599965 1600765 1064885 108429 BM_1_atomic_1_waiter_1_notifier<NotifyEveryNus<100>, NumHighPrioTasks<7>>/32 +0.0005 -0.8993 3199905 3201437 2129243 214343 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<3>, NumHighPrioTasks<0>>/16384 -0.0226 -0.0261 972539 950519 971198 945828 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<3>, NumHighPrioTasks<0>>/32768 -0.0198 -0.0221 1933294 1895054 1930720 1888094 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<3>, NumHighPrioTasks<0>>/65536 -0.0031 -0.0039 3835138 3823094 3827785 3812836 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<7>, NumHighPrioTasks<0>>/4096 +0.4380 +0.4294 571762 822185 570245 815115 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<7>, NumHighPrioTasks<0>>/8192 +0.0735 +0.0680 1223881 1313880 1221350 1304439 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<7>, NumHighPrioTasks<0>>/16384 +0.1222 +0.1205 2442071 2740519 2433105 2726274 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<15>, NumHighPrioTasks<0>>/1024 +0.1527 +1.2188 196081 226031 62647 139001 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<15>, NumHighPrioTasks<0>>/2048 +0.0757 +0.4838 387858 417228 129250 191780 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<15>, NumHighPrioTasks<0>>/4096 -0.0355 -0.2443 812827 784003 378109 285722 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<3>, NumHighPrioTasks<0>>/1024 +0.0002 -0.0873 51202059 51211089 51135714 46670867 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<3>, NumHighPrioTasks<0>>/2048 +0.0001 -0.0864 102424970 102432359 102287571 93452000 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<3>, NumHighPrioTasks<0>>/4096 +0.0000 -0.0865 204828250 204834229 204528667 186845250 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<7>, NumHighPrioTasks<0>>/256 +0.0003 -0.1681 12801752 12805016 12786382 10636485 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<7>, NumHighPrioTasks<0>>/512 +0.0001 -0.1686 25601940 25604893 25565481 21254515 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<7>, NumHighPrioTasks<0>>/1024 +0.0000 -0.1569 51210789 51211539 51150143 43122500 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<15>, NumHighPrioTasks<0>>/64 +0.0064 -0.3503 3210430 3230869 2856780 1856063 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<15>, NumHighPrioTasks<0>>/128 +0.0034 -0.3534 6410529 6432308 5704792 3688942 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<15>, NumHighPrioTasks<0>>/256 +0.0011 -0.3600 12821419 12835646 11455934 7331250 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<3>, NumHighPrioTasks<0>>/256 +0.0003 +0.0034 25600089 25608062 24375034 24457172 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<3>, NumHighPrioTasks<0>>/512 +0.0002 -0.0000 51203798 51211795 48859857 48858500 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<3>, NumHighPrioTasks<0>>/1024 +0.0003 +0.0008 102411321 102437524 97694429 97777286 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<7>, NumHighPrioTasks<0>>/64 +0.0002 -0.0464 6399846 6401009 6070487 5789091 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<7>, NumHighPrioTasks<0>>/128 +0.0002 -0.0457 12799914 12802544 12069966 11518836 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<7>, NumHighPrioTasks<0>>/256 +0.0001 -0.0513 25599724 25602105 24202862 22962032 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<15>, NumHighPrioTasks<0>>/16 -0.0060 +0.2575 1611779 1602148 956236 1202492 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<15>, NumHighPrioTasks<0>>/32 -0.0064 +0.2964 3221485 3200918 1883540 2441728 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<15>, NumHighPrioTasks<0>>/64 -0.0046 +0.3087 6432692 6403368 3701725 4844611 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<3>, NumHighPrioTasks<4>>/256 -0.0536 -0.0592 27458 25988 27402 25780 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<3>, NumHighPrioTasks<4>>/512 -0.0469 -0.0527 54745 52175 54628 51750 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<3>, NumHighPrioTasks<4>>/1024 -0.0297 -0.0340 108312 105095 108047 104378 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<7>, NumHighPrioTasks<4>>/64 -0.2445 -0.2722 15109 11414 14711 10708 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<7>, NumHighPrioTasks<4>>/128 -0.3132 -0.3515 32494 22317 32063 20794 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<7>, NumHighPrioTasks<4>>/256 -0.1397 -0.1834 52801 45424 52170 42602 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<15>, NumHighPrioTasks<4>>/16 +0.1679 +1.0248 28973 33837 13243 26814 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<15>, NumHighPrioTasks<4>>/32 -0.0481 +0.7901 39155 37273 16072 28771 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<15>, NumHighPrioTasks<4>>/64 -0.2075 +0.7568 57547 45606 19582 34402 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<3>, NumHighPrioTasks<4>>/256 -0.0001 -0.0807 12802693 12800886 12775327 11744119 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<3>, NumHighPrioTasks<4>>/512 -0.0021 -0.0867 25655056 25601315 25590407 23371667 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<3>, NumHighPrioTasks<4>>/1024 -0.0007 -0.0832 51238801 51201975 51099071 46845733 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<7>, NumHighPrioTasks<4>>/64 +0.0016 -0.2411 3200714 3205846 3176841 2410756 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<7>, NumHighPrioTasks<4>>/128 +0.0008 -0.2373 6404239 6409102 6359649 4850544 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<7>, NumHighPrioTasks<4>>/256 +0.0000 -0.2286 12805839 12806032 12713018 9806653 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<15>, NumHighPrioTasks<4>>/16 +0.0272 +0.0563 811198 833264 482220 509345 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<15>, NumHighPrioTasks<4>>/32 +0.0097 +0.0454 1617205 1632962 957801 1001264 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<15>, NumHighPrioTasks<4>>/64 +0.0050 +0.0389 3217997 3234130 1927921 2002868 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<3>, NumHighPrioTasks<4>>/256 +0.0000 -0.0009 25599763 25601039 24520071 24497071 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<3>, NumHighPrioTasks<4>>/512 +0.0001 -0.0017 51200354 51203628 49086786 49005500 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<3>, NumHighPrioTasks<4>>/1024 +0.0001 +0.0013 102400369 102409744 97931143 98060857 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<7>, NumHighPrioTasks<4>>/64 -0.0017 +0.0128 6410821 6400104 5529150 5600008 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<7>, NumHighPrioTasks<4>>/128 -0.0011 +0.0215 12817263 12803569 11025889 11263032 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<7>, NumHighPrioTasks<4>>/256 -0.0005 +0.0193 25612704 25600332 22089065 22515677 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<15>, NumHighPrioTasks<4>>/16 -0.0164 +0.7969 1627422 1600798 665736 1196236 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<15>, NumHighPrioTasks<4>>/32 -0.0095 +0.8362 3231500 3200840 1290017 2368789 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<15>, NumHighPrioTasks<4>>/64 -0.0050 +0.7319 6433401 6401180 2747936 4759115 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<3>, NumHighPrioTasks<7>>/16 +0.0155 +0.0092 1177 1195 1171 1181 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<3>, NumHighPrioTasks<7>>/32 -0.0135 -0.0145 2103 2074 2095 2064 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<3>, NumHighPrioTasks<7>>/64 +0.0022 +0.0009 3832 3841 3820 3823 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<7>, NumHighPrioTasks<7>>/8 +13.9131 +9.5298 2074 30931 2041 21495 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<7>, NumHighPrioTasks<7>>/16 +5.9980 +3.9816 3168 22172 3124 15563 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<7>, NumHighPrioTasks<7>>/32 +3.8681 +2.3515 5412 26348 5321 17833 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<15>, NumHighPrioTasks<7>>/4 +0.1312 +0.4845 31938 36127 12666 18803 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<15>, NumHighPrioTasks<7>>/8 -0.0475 +0.0775 39196 37336 18078 19479 BM_1_atomic_multi_waiter_1_notifier<KeepNotifying, NumWaitingThreads<15>, NumHighPrioTasks<7>>/16 -0.3146 -0.3853 57548 39441 31743 19513 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<3>, NumHighPrioTasks<7>>/8 -0.0012 -0.0916 400610 400149 399248 362679 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<3>, NumHighPrioTasks<7>>/16 -0.0032 -0.0904 802940 800342 798964 726744 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<3>, NumHighPrioTasks<7>>/32 -0.0030 -0.0911 1604860 1600044 1598235 1452647 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<7>, NumHighPrioTasks<7>>/4 +0.0348 -0.3515 202073 209107 199452 129352 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<7>, NumHighPrioTasks<7>>/8 -0.0004 -0.3628 406727 406545 400942 255464 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<7>, NumHighPrioTasks<7>>/16 -0.0176 -0.3705 821725 807256 803722 505959 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<15>, NumHighPrioTasks<7>>/2 +0.0575 +0.0699 138530 146498 79463 85020 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<15>, NumHighPrioTasks<7>>/4 -0.2307 -0.4182 327417 251885 222502 129448 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<50>, NumWaitingThreads<15>, NumHighPrioTasks<7>>/8 -0.4166 -0.5733 765495 446598 535265 228384 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<3>, NumHighPrioTasks<7>>/8 +0.0001 +0.0022 800108 800227 759501 761200 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<3>, NumHighPrioTasks<7>>/16 +0.0002 +0.0052 1599998 1600327 1515336 1523162 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<3>, NumHighPrioTasks<7>>/32 -0.0004 +0.0029 3201730 3200529 3037191 3045996 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<7>, NumHighPrioTasks<7>>/4 -0.0063 +0.3625 402752 400231 231304 315156 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<7>, NumHighPrioTasks<7>>/8 -0.0029 +0.5760 802313 799998 401474 632716 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<7>, NumHighPrioTasks<7>>/16 -0.0014 +0.4607 1602184 1600012 877859 1282310 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<15>, NumHighPrioTasks<7>>/2 -0.0492 +0.3586 212875 202398 100437 136457 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<15>, NumHighPrioTasks<7>>/4 -0.0927 +0.4432 444857 403606 181089 261350 BM_1_atomic_multi_waiter_1_notifier<NotifyEveryNus<100>, NumWaitingThreads<15>, NumHighPrioTasks<7>>/8 -0.0704 +0.8210 861808 801099 318774 580489 BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<2>, NumHighPrioTasks<0>>/4096 -0.0730 -0.0762 333804 309427 333180 307803 BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<2>, NumHighPrioTasks<0>>/8192 -0.0775 -0.0795 701228 646853 700065 644381 BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<2>, NumHighPrioTasks<0>>/16384 +0.0245 +0.0229 1328777 1361291 1326360 1356745 BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<3>, NumHighPrioTasks<0>>/1024 -0.0541 -0.0562 201559 190662 201259 189940 BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<3>, NumHighPrioTasks<0>>/2048 -0.1959 -0.1986 416092 334584 415412 332927 BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<3>, NumHighPrioTasks<0>>/4096 -0.1699 -0.1710 811966 674040 810157 671584 BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<5>, NumHighPrioTasks<0>>/1024 +0.1383 +0.1301 379893 432426 377756 426885 BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<5>, NumHighPrioTasks<0>>/2048 +0.0396 +0.0339 822384 854937 818110 845866 BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<5>, NumHighPrioTasks<0>>/4096 +0.2499 +0.2451 1350161 1687588 1345121 1674845 BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<7>, NumHighPrioTasks<0>>/256 +0.0042 +0.0101 213598 214487 199282 201303 BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<7>, NumHighPrioTasks<0>>/512 -0.1034 -0.1065 428033 383755 409546 365945 BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<7>, NumHighPrioTasks<0>>/1024 -0.0972 -0.1064 833189 752165 810146 723952 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<50>, NumberOfAtomics<2>, NumHighPrioTasks<0>>/1024 +0.0001 -0.1103 51201684 51204581 51124714 45485867 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<50>, NumberOfAtomics<2>, NumHighPrioTasks<0>>/2048 -0.0000 -0.1202 102409167 102405120 102243857 89953750 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<50>, NumberOfAtomics<2>, NumHighPrioTasks<0>>/4096 +0.0000 -0.1166 204807125 204813833 204453333 180618500 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<50>, NumberOfAtomics<3>, NumHighPrioTasks<0>>/256 +0.0002 -0.1623 12803624 12806161 12778727 10704806 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<50>, NumberOfAtomics<3>, NumHighPrioTasks<0>>/512 -0.0002 -0.1414 25607327 25603223 25551852 21939152 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<50>, NumberOfAtomics<3>, NumHighPrioTasks<0>>/1024 -0.0002 -0.1653 51212196 51202776 51126643 42673625 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<50>, NumberOfAtomics<5>, NumHighPrioTasks<0>>/256 -0.0002 -0.0709 12805016 12802157 12784636 11878785 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<50>, NumberOfAtomics<5>, NumHighPrioTasks<0>>/512 -0.0002 -0.1182 25611565 25606346 25560815 22540033 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<50>, NumberOfAtomics<5>, NumHighPrioTasks<0>>/1024 -0.0002 -0.0813 51220762 51208122 51121071 46963571 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<50>, NumberOfAtomics<7>, NumHighPrioTasks<0>>/64 +0.0012 -0.2125 3219858 3223858 3194027 2515373 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<50>, NumberOfAtomics<7>, NumHighPrioTasks<0>>/128 -0.0370 -0.2643 6668396 6421601 6563402 4828970 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<50>, NumberOfAtomics<7>, NumHighPrioTasks<0>>/256 -0.0288 -0.2220 13220067 12839487 13073964 10172062 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<100>, NumberOfAtomics<2>, NumHighPrioTasks<0>>/256 -0.0000 -0.0105 25602159 25600917 24178138 23923759 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<100>, NumberOfAtomics<2>, NumHighPrioTasks<0>>/512 +0.0000 -0.0175 51201819 51203125 48569867 47718143 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<100>, NumberOfAtomics<2>, NumHighPrioTasks<0>>/1024 +0.0001 -0.0118 102404155 102414482 96908714 95760857 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<100>, NumberOfAtomics<3>, NumHighPrioTasks<0>>/256 +0.0000 -0.0574 25599943 25600621 25326679 23871733 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<100>, NumberOfAtomics<3>, NumHighPrioTasks<0>>/512 +0.0001 -0.0813 51200525 51206978 50459500 46355867 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<100>, NumberOfAtomics<3>, NumHighPrioTasks<0>>/1024 +0.0001 -0.0774 102400405 102409875 101483571 93631000 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<100>, NumberOfAtomics<5>, NumHighPrioTasks<0>>/128 -0.0002 +0.0456 12802792 12800864 11731131 12265881 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<100>, NumberOfAtomics<5>, NumHighPrioTasks<0>>/256 -0.0000 +0.0649 25601667 25601070 22686065 24157862 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<100>, NumberOfAtomics<5>, NumHighPrioTasks<0>>/512 -0.0005 +0.0513 51224453 51200650 45549867 47885533 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<100>, NumberOfAtomics<7>, NumHighPrioTasks<0>>/64 -0.0014 +0.2205 6408711 6400039 4698868 5734934 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<100>, NumberOfAtomics<7>, NumHighPrioTasks<0>>/128 +0.0155 +0.2459 12810413 13009276 9163080 11416117 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<100>, NumberOfAtomics<7>, NumHighPrioTasks<0>>/256 +0.0081 +0.2304 25603646 25811111 18779784 23106867 BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<2>, NumHighPrioTasks<4>>/128 -0.1103 -0.1108 24307 21625 24256 21568 BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<2>, NumHighPrioTasks<4>>/256 +0.0637 +0.0574 45588 48491 45498 48112 BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<2>, NumHighPrioTasks<4>>/512 -0.0519 -0.0539 90764 86054 90527 85648 BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<3>, NumHighPrioTasks<4>>/128 +0.1161 +0.1083 28810 32155 28722 31832 BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<3>, NumHighPrioTasks<4>>/256 +0.1152 +0.1094 64670 72123 64461 71512 BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<3>, NumHighPrioTasks<4>>/512 -0.0804 -0.0993 125916 115796 125476 113010 BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<5>, NumHighPrioTasks<4>>/64 +0.2682 -0.2446 53787 68210 51896 39203 BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<5>, NumHighPrioTasks<4>>/128 +0.5732 -0.4832 103915 163474 100825 52105 BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<5>, NumHighPrioTasks<4>>/256 +0.1283 -0.4606 211518 238645 203852 109957 BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<7>, NumHighPrioTasks<4>>/16 -0.1526 +0.1523 59673 50567 23275 26819 BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<7>, NumHighPrioTasks<4>>/32 -0.0492 +0.7075 82796 78719 24187 41298 BM_N_atomics_N_waiter_N_notifier<KeepNotifying, NumberOfAtomics<7>, NumHighPrioTasks<4>>/64 -0.0712 +0.0764 150268 139570 55304 59527 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<50>, NumberOfAtomics<2>, NumHighPrioTasks<4>>/128 -0.0004 -0.0828 6402859 6400308 6380145 5851557 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<50>, NumberOfAtomics<2>, NumHighPrioTasks<4>>/256 -0.0002 -0.0370 12802978 12801020 12769107 12296293 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<50>, NumberOfAtomics<2>, NumHighPrioTasks<4>>/512 -0.0028 -0.0799 25674170 25601862 25612667 23566586 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<50>, NumberOfAtomics<3>, NumHighPrioTasks<4>>/128 -0.0004 -0.0672 6402990 6400344 6382100 5953248 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<50>, NumberOfAtomics<3>, NumHighPrioTasks<4>>/256 -0.0004 -0.0841 12806197 12801334 12765891 11691661 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<50>, NumberOfAtomics<3>, NumHighPrioTasks<4>>/512 -0.0006 -0.0574 25615708 25601085 25533250 24067828 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<50>, NumberOfAtomics<5>, NumHighPrioTasks<4>>/32 -0.0163 -0.2801 1645647 1618805 1614735 1162471 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<50>, NumberOfAtomics<5>, NumHighPrioTasks<4>>/64 -0.0211 -0.2501 3285234 3216045 3217295 2412509 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<50>, NumberOfAtomics<5>, NumHighPrioTasks<4>>/128 -0.0502 -0.2956 6755976 6416549 6653264 4686407 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<50>, NumberOfAtomics<7>, NumHighPrioTasks<4>>/8 -0.0815 -0.2227 534476 490942 337482 262341 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<50>, NumberOfAtomics<7>, NumHighPrioTasks<4>>/16 +0.0973 -0.0629 1071127 1175390 664897 623053 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<50>, NumberOfAtomics<7>, NumHighPrioTasks<4>>/32 -0.2263 -0.3717 2297477 1777444 1488023 934861 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<100>, NumberOfAtomics<2>, NumHighPrioTasks<4>>/64 -0.0000 +0.0183 6400348 6400261 6145171 6257342 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<100>, NumberOfAtomics<2>, NumHighPrioTasks<4>>/128 +0.0000 +0.0194 12800545 12800759 12279474 12517804 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<100>, NumberOfAtomics<2>, NumHighPrioTasks<4>>/256 +0.0001 +0.0111 25601568 25602976 24636179 24909821 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<100>, NumberOfAtomics<3>, NumHighPrioTasks<4>>/64 -0.0000 +0.0545 6400600 6400444 5795288 6111077 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<100>, NumberOfAtomics<3>, NumHighPrioTasks<4>>/128 +0.0001 +0.0474 12800507 12801355 11566729 12114860 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<100>, NumberOfAtomics<3>, NumHighPrioTasks<4>>/256 +0.0000 +0.0423 25601503 25601760 23281967 24267276 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<100>, NumberOfAtomics<5>, NumHighPrioTasks<4>>/32 +0.0005 +0.2842 3201968 3203421 2175379 2793615 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<100>, NumberOfAtomics<5>, NumHighPrioTasks<4>>/64 -0.0003 +0.3807 6402555 6400496 4052465 5595309 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<100>, NumberOfAtomics<5>, NumHighPrioTasks<4>>/128 -0.0003 +0.3827 12804155 12800925 8114370 11219400 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<100>, NumberOfAtomics<7>, NumHighPrioTasks<4>>/8 +0.0262 +0.1272 821954 843475 503297 567320 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<100>, NumberOfAtomics<7>, NumHighPrioTasks<4>>/16 +0.0298 +0.3134 1634476 1683172 901978 1184619 BM_N_atomics_N_waiter_N_notifier<NotifyEveryNus<100>, NumberOfAtomics<7>, NumHighPrioTasks<4>>/32 +0.0147 +0.2925 3244262 3291994 1721000 2224350 OVERALL_GEOMEAN +0.0185 -0.1876 0 0 0 0 ``` --------- Co-authored-by: Hui Xie <huixie@Huis-MacBook-Pro.local> Co-authored-by: Hui Xie <huixie@Mac.broadband>	2025-01-18 14:50:53 +00:00
Louis Dionne	65dc0d4447	[libc++] Remove string benchmark for internal function We strive to keep our benchmarks portable, so we should only benchmark standard APIs.	2025-01-16 17:46:12 -05:00
Peng Liu	0298e58c7d	[libc++] Optimize input_iterator-pair `insert` for std::vector (#113768 ) As a follow-up to #113852, this PR optimizes the performance of the `insert(const_iterator pos, InputIt first, InputIt last)` function for `input_iterator`-pair inputs in `std::vector` for cases where reallocation occurs during insertion. Additionally, this optimization enhances exception safety by replacing the traditional `try-catch` mechanism with a modern exception guard for the `insert` function. The optimization targets cases where insertion trigger reallocation. In scenarios without reallocation, the implementation remains unchanged. Previous implementation ----------------------- The previous implementation of `insert` is inefficient in reallocation scenarios because it performs the following steps separately: - `reserve()`: This leads to the first round of relocating old elements to new memory; - `rotate()`: This leads to the second round of reorganizing the existing elements; - Move-forward: Moves the elements after the insertion position to their final positions. - Insert: performs the actual insertion. This approach results in a lot of redundant operations, requiring the elements to undergo three rounds of relocations/reorganizations to be placed in their final positions. Proposed implementation ----------------------- The proposed implementation jointly optimize the above 4 steps in the previous implementation such that each element is placed in its final position in just one round of relocation. Specifically, this optimization reduces the total cost from 2 relocations + 1 std::rotate call to just 1 relocation, without needing to call `std::rotate`, thereby significantly improving overall performance.	2025-01-14 11:40:29 -05:00
Louis Dionne	15f30e70eb	[libc++] Fix the batch size used in the std::gcd benchmark (#120618 ) Since that benchmark is testing n*n inputs, the batch size reported to GoogleBenchmark should be that amount. Otherwise, GoogleBenchmark reports the timing for calling std::gcd on the whole sequence, which is misleading.	2025-01-06 15:34:33 -05:00
Nikolas Klauser	b84218526d	[libc++] Mark num_get.bench.cpp as unsupported in C++03	2024-12-23 00:55:14 +01:00
Nikolas Klauser	c5492e3c65	[libc++] Add a benchmark for std::num_get	2024-12-22 20:44:16 +01:00
Nikolas Klauser	fd784726db	[libc++] Rewrite minmax_element benchmark The benchmark currently uses makeCartesianProductBenchmark, which doesn't make a ton of sense, since minmax_element always goes through every element one by one. The runtime doesn't depend on the values of the elements. Fixes #120758	2024-12-21 20:07:28 +01:00
Louis Dionne	ef42e9c59a	[libc++] Remove allocation.bench.cpp (#120767 ) That benchmark isn't really useful, since it doesn't benchmark anything from libc++ (besides `operator new`). The implementation of the benchmark also has serious problems like the fact that it allocates an unknown amount of memory without deallocating it.	2024-12-20 15:33:52 -05:00
Louis Dionne	6a9279ca40	[libc++] Slight reorganization of the benchmarks (#119625 ) Move various container benchmarks to the same subdirectory, and regroup some format-related benchmarks.	2024-12-12 08:14:50 -05:00
Louis Dionne	6cb339f9c1	[libc++] Refactor tests for aligned allocation and sized deallocation (#117915 ) This patch refactors the tests around aligned allocation and sized deallocation to avoid relying on passing the -fsized-deallocation or -faligned-allocation flags by default. Since both of these features are enabled by default in >= C++14 mode, it now makes sense to make that assumption in the test suite. A notable exception is MinGW and some older compilers, where sized deallocation is still not enabled by default. We treat that as a "bug" in the test suite and we work around it by explicitly adding -fsized-deallocation, but only under those configurations.	2024-12-06 16:12:30 -05:00
Peng Liu	056153f36e	Optimize vector::assign for InputIterator-only pair inputs (#113852 ) This PR optimizes the input iterator overload of `assign(_InputIterator, _InputIterator)` in `std::vector<_Tp, _Allocator>` by directly assigning to already initialized memory, rather than first destroying existing elements and then constructing new ones. By eliminating unnecessary destruction and construction, the proposed algorithm enhances the performance by up to 2x for trivial element types (e.g., `std::vector<int>`), up to 2.6x for non-trivial element types like `std::vector<std::string>`, and up to 3.4x for more complex non-trivial types (e.g., `std::vector<std::vector<int>>`). ### Google Benchmarks Benchmark tests (`libcxx/test/benchmarks/vector_operations.bench.cpp`) were conducted for the `assign()` implementations before and after this patch. The tests focused on trivial element types like `std::vector<int>`, and non-trivial element types such as `std::vector<std::string>` and `std::vector<std::vector<int>>`. #### Before ``` ------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations ------------------------------------------------------------------------------------------------- BM_AssignInputIterIter/vector_int/1024/1024 1157 ns 1169 ns 608188 BM_AssignInputIterIter<32>/vector_string/1024/1024 14559 ns 14710 ns 47277 BM_AssignInputIterIter<32>/vector_vector_int/1024/1024 26846 ns 27129 ns 25925 ``` #### After ``` ------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations ------------------------------------------------------------------------------------------------- BM_AssignInputIterIter/vector_int/1024/1024 561 ns 566 ns 1242251 BM_AssignInputIterIter<32>/vector_string/1024/1024 5604 ns 5664 ns 128365 BM_AssignInputIterIter<32>/vector_vector_int/1024/1024 7927 ns 8012 ns 88579 ```	2024-11-28 20:52:59 +01:00
Nikolas Klauser	9ebc6f5d6d	[libc++] Include headers in <thread> conditionally (#116539 )	2024-11-20 23:07:20 +01:00
Petr Hosek	012dd8be4b	[libcxx] Passthrough the necessary CMake variables to benchmarks (#116644 ) This addresses the issue uncovered by #115361. Previously, we weren't building benchmarks in many cases due to the following block: `e58949632e/libcxx/CMakeLists.txt (L162-L172)` We need to passthrough the necessary variables into the benchmarks subbuild and use correct syntax.	2024-11-19 14:23:29 -08:00
Louis Dionne	3a3517c5e9	[libc++] Improve the tests for vector::erase (#116265 ) In particular, test everything with both a normal and a min_allocator, add tests for a few corner cases and add tests with types that are trivially relocatable. Also add tests that count the number of assignments performed by vector::erase, since that is mandated by the Standard. This patch is a preparation for optimizing vector::erase.	2024-11-19 00:49:22 +01:00
Peng Liu	c25e09e238	[libc++][test] Speed up input generating functions for benchmark tests (#115544 ) The input generating functions for benchmark tests in the GenerateInput.h file can be slightly improved by invoking vector::reserve before calling vector::push_back. This slight performance improvement could potentially speed-up all benchmark tests for containers and algorithms that use these functions as inputs.	2024-11-18 16:32:54 +01:00
Peng Liu	3af4c2e16e	[libc++][test] Clean up code in GenerateInput.h for benchmark testing (#115560 ) This PR refines the code in `GenerateInput.h` used for benchmark testing by implementing the following changes: - Replaced all unqualified usages of `size_t` with `std::size_t`. - Removed unnecessary curly braces `{}` from for loops that contain simple single-statement bodies, in accordance with LLVM coding standards.	2024-11-11 10:02:56 -05:00
Peng Liu	7844257fc2	[libc++] Use explicit #include instead of transitive #include (#115420 ) This benchmark test currently uses `std::unique_ptr` without explicitly `#include <memory>`. I think we should not rely on transitive inclusion.	2024-11-08 16:07:51 +01:00
Louis Dionne	e236a52a88	[libc++] Unify the benchmarks with the test suite (#101399 ) Instead of building the benchmarks separately via CMake and running them separately from the test suite, this patch merges the benchmarks into the test suite and handles both uniformly. As a result: - It is now possible to run individual benchmarks like we run tests (e.g. using libcxx-lit), which is a huge quality-of-life improvement. - The benchmarks will be run under exactly the same configuration as the rest of the tests, which is a nice simplification. This does mean that one has to be careful to enable the desired optimization flags when running benchmarks, but that is easy with e.g. `libcxx-lit <...> --param optimization=speed`. - Benchmarks can use the same annotations as the rest of the test suite, such as `// UNSUPPORTED` & friends. When running the tests via `check-cxx`, we only compile the benchmarks because running them would be too time consuming. This introduces a bit of complexity in the testing setup, and instead it would be better to allow passing a --dry-run flag to GoogleBenchmark executables, which is the topic of https://github.com/google/benchmark/issues/1827. I am not really satisfied with the layering violation of adding the %{benchmark_flags} substitution to cmake-bridge, however I believe this can be improved in the future.	2024-11-07 09:07:50 -05:00
Nikolas Klauser	c6f3b7bcd0	[libc++] Refactor the configuration macros to being always defined (#112094 ) This is a follow-up to #89178. This updates the `<__config_site>` macros.	2024-11-06 10:39:19 +01:00
Louis Dionne	b2d2494731	[libc++] Make benchmarks forward-compatible with the test suite (#114502 ) This patch fixes warnings and errors that come up when running the benchmarks as part of the test suite. It also adds the necessary Lit annotations to make it pass in various configurations and increases the portability of the benchmarks.	2024-11-05 09:08:00 -05:00
Louis Dionne	d1b311d7d2	[libc++] Split std::hash benchmark out of std::unordered_set benchmark (#114448 ) As a drive-by, remove unused functor inside the unordered_set benchmark. That benchmark still isn't very exhaustive, but that can be addressed separately.	2024-10-31 22:53:06 -04:00
Mark de Wever	d54b1cfa38	[libc++][format][1/3] Adds more benchmarks. (#101803 ) This patch is the start of a series to improve the speed of std::format, std::format_to, std::format_to_n, and std::formatted_size. This is mostly achieved by changing the __output_buffer class. This new __output_buffer class also makes it easier to implement buffering for P3107R5 "Permit an efficient implementation of std::print"	2024-10-06 18:18:57 +02:00
Louis Dionne	953af0e7f1	[libc++][NFC] Increase consistency for namespace closing comments	2024-09-05 12:41:20 -04:00
Nikolas Klauser	5e19e317c0	[libc++][NFC] Canonicalize the benchmark suite a bit This replaces `BENCHMARK_TEMPLATE` with `BENCHMARK` and uses `BENCHMARK_MAIN()` when possible.	2024-09-03 13:22:01 +02:00
Nikolas Klauser	d07fdf9779	[libc++] Optimize lexicographical_compare (#65279 ) If the comparison operation is equivalent to < and that is a total order, we know that we can use equality comparison on that type instead to extract some information. Furthermore, if equality comparison on that type is trivial, the user can't observe that we're calling it. So instead of using the user-provided total order, we use std::mismatch, which uses equality comparison (and is vertorized). Additionally, if the type is trivially lexicographically comparable, we can go one step further and use std::memcmp directly instead of calling std::mismatch. Benchmarks: ``` ------------------------------------------------------------------------------------- Benchmark old new ------------------------------------------------------------------------------------- bm_lexicographical_compare<unsigned char>/1 1.17 ns 2.34 ns bm_lexicographical_compare<unsigned char>/2 1.64 ns 2.57 ns bm_lexicographical_compare<unsigned char>/3 2.23 ns 2.58 ns bm_lexicographical_compare<unsigned char>/4 2.82 ns 2.57 ns bm_lexicographical_compare<unsigned char>/5 3.34 ns 2.11 ns bm_lexicographical_compare<unsigned char>/6 3.94 ns 2.21 ns bm_lexicographical_compare<unsigned char>/7 4.56 ns 2.11 ns bm_lexicographical_compare<unsigned char>/8 5.25 ns 2.11 ns bm_lexicographical_compare<unsigned char>/16 9.88 ns 2.11 ns bm_lexicographical_compare<unsigned char>/64 38.9 ns 2.36 ns bm_lexicographical_compare<unsigned char>/512 317 ns 6.54 ns bm_lexicographical_compare<unsigned char>/4096 2517 ns 41.4 ns bm_lexicographical_compare<unsigned char>/32768 20052 ns 488 ns bm_lexicographical_compare<unsigned char>/262144 159579 ns 4409 ns bm_lexicographical_compare<unsigned char>/1048576 640456 ns 20342 ns bm_lexicographical_compare<signed char>/1 1.18 ns 2.37 ns bm_lexicographical_compare<signed char>/2 1.65 ns 2.60 ns bm_lexicographical_compare<signed char>/3 2.23 ns 2.83 ns bm_lexicographical_compare<signed char>/4 2.81 ns 3.06 ns bm_lexicographical_compare<signed char>/5 3.35 ns 3.30 ns bm_lexicographical_compare<signed char>/6 3.90 ns 3.99 ns bm_lexicographical_compare<signed char>/7 4.56 ns 3.78 ns bm_lexicographical_compare<signed char>/8 5.20 ns 4.02 ns bm_lexicographical_compare<signed char>/16 9.80 ns 6.21 ns bm_lexicographical_compare<signed char>/64 39.0 ns 3.16 ns bm_lexicographical_compare<signed char>/512 318 ns 7.58 ns bm_lexicographical_compare<signed char>/4096 2514 ns 47.4 ns bm_lexicographical_compare<signed char>/32768 20096 ns 504 ns bm_lexicographical_compare<signed char>/262144 156617 ns 4146 ns bm_lexicographical_compare<signed char>/1048576 624265 ns 19810 ns bm_lexicographical_compare<int>/1 1.15 ns 2.12 ns bm_lexicographical_compare<int>/2 1.60 ns 2.36 ns bm_lexicographical_compare<int>/3 2.21 ns 2.59 ns bm_lexicographical_compare<int>/4 2.74 ns 2.83 ns bm_lexicographical_compare<int>/5 3.26 ns 3.06 ns bm_lexicographical_compare<int>/6 3.81 ns 4.53 ns bm_lexicographical_compare<int>/7 4.41 ns 4.72 ns bm_lexicographical_compare<int>/8 5.08 ns 2.36 ns bm_lexicographical_compare<int>/16 9.54 ns 3.08 ns bm_lexicographical_compare<int>/64 37.8 ns 4.71 ns bm_lexicographical_compare<int>/512 309 ns 24.6 ns bm_lexicographical_compare<int>/4096 2422 ns 204 ns bm_lexicographical_compare<int>/32768 19362 ns 1947 ns bm_lexicographical_compare<int>/262144 155727 ns 19793 ns bm_lexicographical_compare<int>/1048576 623614 ns 80180 ns bm_ranges_lexicographical_compare<unsigned char>/1 1.07 ns 2.35 ns bm_ranges_lexicographical_compare<unsigned char>/2 1.72 ns 2.13 ns bm_ranges_lexicographical_compare<unsigned char>/3 2.46 ns 2.12 ns bm_ranges_lexicographical_compare<unsigned char>/4 3.17 ns 2.12 ns bm_ranges_lexicographical_compare<unsigned char>/5 3.86 ns 2.12 ns bm_ranges_lexicographical_compare<unsigned char>/6 4.55 ns 2.12 ns bm_ranges_lexicographical_compare<unsigned char>/7 5.25 ns 2.12 ns bm_ranges_lexicographical_compare<unsigned char>/8 5.95 ns 2.13 ns bm_ranges_lexicographical_compare<unsigned char>/16 11.7 ns 2.13 ns bm_ranges_lexicographical_compare<unsigned char>/64 45.5 ns 2.36 ns bm_ranges_lexicographical_compare<unsigned char>/512 366 ns 6.35 ns bm_ranges_lexicographical_compare<unsigned char>/4096 2886 ns 40.9 ns bm_ranges_lexicographical_compare<unsigned char>/32768 23054 ns 489 ns bm_ranges_lexicographical_compare<unsigned char>/262144 185302 ns 4339 ns bm_ranges_lexicographical_compare<unsigned char>/1048576 741576 ns 19430 ns bm_ranges_lexicographical_compare<signed char>/1 1.10 ns 2.12 ns bm_ranges_lexicographical_compare<signed char>/2 1.66 ns 2.35 ns bm_ranges_lexicographical_compare<signed char>/3 2.23 ns 2.58 ns bm_ranges_lexicographical_compare<signed char>/4 2.82 ns 2.82 ns bm_ranges_lexicographical_compare<signed char>/5 3.34 ns 3.06 ns bm_ranges_lexicographical_compare<signed char>/6 3.92 ns 3.99 ns bm_ranges_lexicographical_compare<signed char>/7 4.64 ns 4.10 ns bm_ranges_lexicographical_compare<signed char>/8 5.21 ns 4.61 ns bm_ranges_lexicographical_compare<signed char>/16 9.79 ns 7.42 ns bm_ranges_lexicographical_compare<signed char>/64 38.9 ns 2.93 ns bm_ranges_lexicographical_compare<signed char>/512 317 ns 7.31 ns bm_ranges_lexicographical_compare<signed char>/4096 2500 ns 47.5 ns bm_ranges_lexicographical_compare<signed char>/32768 19940 ns 496 ns bm_ranges_lexicographical_compare<signed char>/262144 159166 ns 4393 ns bm_ranges_lexicographical_compare<signed char>/1048576 638206 ns 19786 ns bm_ranges_lexicographical_compare<int>/1 1.10 ns 2.12 ns bm_ranges_lexicographical_compare<int>/2 1.64 ns 3.04 ns bm_ranges_lexicographical_compare<int>/3 2.23 ns 2.58 ns bm_ranges_lexicographical_compare<int>/4 2.81 ns 2.81 ns bm_ranges_lexicographical_compare<int>/5 3.35 ns 3.05 ns bm_ranges_lexicographical_compare<int>/6 3.94 ns 4.60 ns bm_ranges_lexicographical_compare<int>/7 4.60 ns 4.81 ns bm_ranges_lexicographical_compare<int>/8 5.19 ns 2.35 ns bm_ranges_lexicographical_compare<int>/16 9.85 ns 2.87 ns bm_ranges_lexicographical_compare<int>/64 38.9 ns 4.70 ns bm_ranges_lexicographical_compare<int>/512 318 ns 24.5 ns bm_ranges_lexicographical_compare<int>/4096 2494 ns 202 ns bm_ranges_lexicographical_compare<int>/32768 20000 ns 1939 ns bm_ranges_lexicographical_compare<int>/262144 160433 ns 19730 ns bm_ranges_lexicographical_compare<int>/1048576 642636 ns 80760 ns ```	2024-08-04 10:02:43 +02:00
Louis Dionne	6a54dfbfe5	[libc++][NFC] Add missing license headers Also standardize the license comment in several files where it was different from what we normally do.	2024-07-31 12:58:09 -04:00
Louis Dionne	78b4b5cccb	[libc++] Move the benchmarks under libcxx/test (#99371 ) This is an intermediate and fairly mechanical step towards unifying the benchmarks with the rest of the test suite. Moving this around requires a few changes, notably making sure we don't throw a wrench into the discovery process of the normal test suite. This won't be a problem anymore once benchmarks are taken into account by the test setup out of the box.	2024-07-31 11:18:32 -04:00

44 Commits