
.. _printf_behavior:

====================================
Printf Behavior Under All Conditions
====================================

Introduction:
=============
On the "defining undefined behavior" page, I said you should write down your
decisions regarding undefined behavior in your functions. This is that document
for my printf implementation.

Unless otherwise specified, the functionality described is aligned with the ISO
C standard and POSIX standard. If any behavior is not mentioned here, it should
be assumed to follow the behavior described in those standards.

The LLVM-libc codebase is under active development, and may change. This
document was last updated [January 8, 2024] by [michaelrj] and may
not be accurate after this point.

The behavior of LLVM-libc's printf is heavily influenced by compile-time flags.
Make sure to check what flags are defined before filing a bug report. This
document is also not relevant to any other libc's implementation of printf,
which may or may not share the same behavior.

This document assumes familiarity with the definition of the printf function and
is intended as a reference, not a replacement for the original standards.

--------------
General Flags:
--------------
These compile-time flags will change the behavior of LLVM-libc's printf when it
is compiled. Combinations of flags that are incompatible will be marked.

LIBC_COPT_STDIO_USE_SYSTEM_FILE
-------------------------------
When set, this flag changes fprintf and printf to use the FILE API from the
system's libc, instead of LLVM-libc's internal FILE API. This is set by default
when LLVM-libc is built in overlay mode.

LIBC_COPT_PRINTF_DISABLE_INDEX_MODE
-----------------------------------
When set, this flag disables support for the POSIX "%n$" format, hereafter
referred to as "index mode"; conversions using the index mode format will be
treated as invalid. This reduces code size.
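
For reference, a minimal sketch of what index mode looks like from the caller's
side. The helper name ``format_swapped`` is hypothetical and only the POSIX
"%n$" syntax is assumed. With LIBC_COPT_PRINTF_DISABLE_INDEX_MODE set, both
conversions in this format string would be treated as invalid instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of LLVM-libc: writes the two words into buf
   in swapped order using POSIX index mode. */
int format_swapped(char *buf, size_t size, const char *first,
                   const char *second) {
  assert(buf != NULL && size > 0);
  /* "%2$s" selects the second variadic argument, "%1$s" the first. */
  return snprintf(buf, size, "%2$s %1$s", first, second);
}
```

Calling ``format_swapped(buf, sizeof(buf), "world", "hello")`` on a libc with
index mode support leaves "hello world" in ``buf``.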

LIBC_COPT_PRINTF_INDEX_ARR_LEN
------------------------------
This flag takes a positive integer value, defaulting to 128. This flag
determines the number of entries the parser's type descriptor array has. This is
used in index mode to avoid re-parsing the format string to determine types when
an index lower than the previously specified one is requested. This has no
effect when index mode is disabled.

LIBC_COPT_PRINTF_DISABLE_WRITE_INT
----------------------------------
When set, this flag disables support for the C Standard "%n" conversion; any
"%n" conversion will be treated as invalid. This is set by default to improve
security.
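
As an illustration of what this flag turns off, the following sketch uses the
standard "%n" conversion to record how many characters had been produced so
far. The helper name ``count_with_write_int`` is hypothetical. On a libc that
honors "%n" the count becomes 3; with the flag set, this document's rules for
invalid conversions apply and the integer is never written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats a fixed string and uses %n to capture the
   number of characters produced before the conversion. */
int count_with_write_int(char *buf, size_t size) {
  int count = -1; /* stays -1 if the libc does not perform the %n write */
  assert(buf != NULL && size > 0);
  snprintf(buf, size, "abc%ndef", &count);
  return count;
}
```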

LIBC_COPT_PRINTF_DISABLE_FLOAT
------------------------------
When set, this flag disables support for floating point numbers and all their
conversions (%a, %f, %e, %g); any floating point number conversion will be
treated as invalid. This reduces code size.

LIBC_COPT_PRINTF_DISABLE_FIXED_POINT
------------------------------------
When set, this flag disables support for fixed point numbers and all their
conversions (%r, %R, %k, %K); any fixed point number conversion will be treated
as invalid. This reduces code size. This has no effect if the current compiler
does not support fixed point numbers.

LIBC_COPT_PRINTF_NO_NULLPTR_CHECKS
----------------------------------
When set, this flag disables the nullptr checks in %n and %s.

LIBC_COPT_PRINTF_CONV_ATLAS
---------------------------
When set, this flag changes the include path for the "converter atlas", which
is a header that includes all the files containing the conversion functions.
Setting this flag is not recommended without careful consideration.

LIBC_COPT_PRINTF_HEX_LONG_DOUBLE
--------------------------------
When set, this flag replaces all decimal long double conversions (%Lf, %Le, %Lg)
with hexadecimal long double conversions (%La). This will improve performance
significantly, but may cause some tests to fail. This has no effect when float
conversions are disabled.
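
The hexadecimal form is exact, which is why it can stand in for the decimal
conversions without losing information. A small sketch of that property,
assuming only the standard "%La" conversion and ``strtold``; the helper name
``hex_round_trip`` is hypothetical and the sketch does not depend on the flag
itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: prints value with %La into buf, then parses the text
   back. With no precision given, %La emits enough hexadecimal digits to
   represent the value exactly, so the parsed result should equal the input. */
long double hex_round_trip(long double value, char *buf, size_t size) {
  assert(buf != NULL && size > 0);
  snprintf(buf, size, "%La", value);
  return strtold(buf, NULL);
}
```

The exact digits produced differ between platforms, since the layout of long
double is not the same everywhere, so tests that compare against decimal text
are the ones expected to fail when this flag is set.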

--------------------------------
Float Conversion Internal Flags:
--------------------------------
The following floating point conversion flags are provided for reference, but
are not recommended to be adjusted except by persons familiar with the Printf
Ryu Algorithm. Additionally, they have no effect when float conversions are
disabled.

LIBC_COPT_FLOAT_TO_STR_NO_SPECIALIZE_LD
---------------------------------------
This flag disables the separate long double conversion implementation. That
implementation is not based on the Ryu algorithm; instead it generates the
digits by multiplying/dividing the written-out number by 10^9 to get blocks. It
is significantly faster than INT_CALC, only about 10x slower than MEGA_TABLE,
and is small in binary size. Its downside is that it always calculates all of
the digits above the decimal point, making it slightly inefficient for %e calls
with large exponents. The specialization is used by default and overrides the
other flags, so this flag must be set for the other flags to affect the long
double behavior.

LIBC_COPT_FLOAT_TO_STR_USE_MEGA_LONG_DOUBLE_TABLE
-------------------------------------------------
When set, the float to string decimal conversion algorithm will use a larger
table to accelerate long double conversions. This larger table is around 5MB in
size when compiled.

LIBC_COPT_FLOAT_TO_STR_USE_DYADIC_FLOAT
---------------------------------------
When set, the float to string decimal conversion algorithm will use dyadic
floats instead of a table when performing floating point conversions. This
results in ~50 digits of accuracy in the result, then zeroes for the remaining
values. This may improve performance but may also cause some tests to fail. The
variant of this flag ending in _LD behaves the same, but only applies to long
double decimal conversions.

LIBC_COPT_FLOAT_TO_STR_USE_INT_CALC
-----------------------------------
When set, the float to string decimal conversion algorithm will use wide
integers instead of a table when performing floating point conversions. This
gives the same results as the table, but is very slow at the extreme ends of
the long double range.

LIBC_COPT_FLOAT_TO_STR_NO_TABLE
-------------------------------
When set, the float to string decimal conversion algorithm will not use either
the mega table or the normal table for any conversions. Instead it will set
algorithmic constants to improve performance when using calculation algorithms.
If this flag is set without any calculation algorithm flag set, an error will
occur.

--------
Parsing:
--------

When printf encounters an invalid conversion specification, the entire
conversion specification will be passed literally to the output string.
As an example, printf("%Z") would display "%Z".
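
The pass-through rule can be sketched with a toy formatter. This is not
LLVM-libc's parser: the function ``toy_format`` is hypothetical, understands
only "%d", and copies any other "%" sequence to the output unchanged, which is
the behavior described above for invalid specifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy formatter, NOT LLVM-libc's parser: handles "%d" and passes every other
   two-character "%<char>" sequence through literally. */
size_t toy_format(char *out, size_t size, const char *fmt, int value) {
  size_t used = 0;
  assert(out != NULL && size > 0 && fmt != NULL);
  while (*fmt != '\0' && used + 1 < size) {
    if (fmt[0] == '%' && fmt[1] == 'd') {
      int n = snprintf(out + used, size - used, "%d", value);
      if (n < 0)
        break;
      /* snprintf reports the untruncated length, so cap it to the buffer. */
      used = ((size_t)n < size - used) ? used + (size_t)n : size - 1;
      fmt += 2;
    } else if (fmt[0] == '%' && fmt[1] != '\0' && used + 2 < size) {
      /* Unrecognized specification such as "%Z": copy it through as-is. */
      out[used++] = *fmt++;
      out[used++] = *fmt++;
    } else {
      /* Ordinary text. */
      out[used++] = *fmt++;
    }
  }
  out[used] = '\0';
  return used;
}
```

With this sketch, ``toy_format(buf, sizeof(buf), "n=%d %Z", 42)`` leaves
"n=42 %Z" in ``buf``.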

If an index mode conversion is requested for index "n" and there exists a number
in [1,n) that does not have a conversion specified in the format string, then
the conversion for index "n" is considered invalid.
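
A sketch of the gap check, under the assumption that the parser keeps one
entry per index recording whether the format string specifies a conversion for
it. The function ``index_is_valid`` and its table layout are hypothetical. For
a format string such as "%1$d %3$d", index 2 has no conversion, so index 3 is
rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check, not LLVM-libc's code. specified[i] is true when the
   format string contains a conversion for index i + 1. Index n is usable only
   if every index in [1, n) is specified. */
bool index_is_valid(const bool *specified, size_t count, size_t n) {
  assert(specified != NULL);
  if (n == 0 || n - 1 > count)
    return false; /* index 0 does not exist; beyond the table is unknown */
  for (size_t i = 0; i + 1 < n; ++i) {
    if (!specified[i])
      return false; /* gap: the type of argument i + 1 is not known */
  }
  return true;
}
```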

If a non-index mode (also referred to as sequential mode) conversion is
specified after an index mode conversion, the next argument will be read but the
current index will not be incremented. From this point on, the arguments
selected by each conversion may or may not be correct. This is considered
dangerously undefined and may change without warning.

If a conversion specification is given an invalid length modifier, that length
modifier will be ignored, and the default type for that conversion will be used.
In the case of the length modifier "L" and integer conversions, it will be
treated as if it were "ll" (lowercase LL). For this purpose the list of integer
conversions is d, i, u, o, x, X, n.

If a conversion specification ending in % has any options that consume arguments
(e.g. "%*.*%") those arguments will be consumed as normal, but their values will
be ignored.

If a conversion specification ends in a null byte ('\0') then it shall be
treated as an invalid conversion followed by a null byte.

If a number passed as a min width or precision value is out of range for an int,
then it will be treated as the largest or smallest value in the int range
(e.g. "%-999999999999.999999999999s" is the same as "%-2147483648.2147483647s").
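
The upper clamp can be sketched as a saturating digit parser. The helper
``parse_saturating`` is hypothetical and only illustrates the idea of stopping
at INT_MAX instead of overflowing; it does not handle signs.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: parses a run of decimal digits, saturating at INT_MAX
   instead of overflowing. Parsing stops at the first non-digit. */
int parse_saturating(const char *digits) {
  int result = 0;
  assert(digits != NULL);
  for (; *digits >= '0' && *digits <= '9'; ++digits) {
    int digit = *digits - '0';
    if (result > (INT_MAX - digit) / 10)
      return INT_MAX; /* the next step would overflow, so clamp */
    result = result * 10 + digit;
  }
  return result;
}
```

With this sketch, ``parse_saturating("999999999999")`` returns INT_MAX, while
``parse_saturating("42")`` returns 42.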

If a number passed as a bit width is less than or equal to zero, the conversion
is considered invalid. If the provided bit width is larger than the width of
uintmax_t, it will be clamped to the width of uintmax_t.
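
The two bit width rules can be sketched together. The function
``resolve_bit_width`` is hypothetical; it reports an invalid conversion for
non-positive widths and otherwise clamps to the number of bits in uintmax_t.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns false when the requested bit width makes the
   conversion invalid; otherwise stores the clamped width in *width_out. */
bool resolve_bit_width(long requested, int *width_out) {
  const long max_width = (long)(sizeof(uintmax_t) * CHAR_BIT);
  assert(width_out != NULL);
  if (requested <= 0)
    return false; /* zero or negative: the conversion is invalid */
  *width_out = (int)(requested > max_width ? max_width : requested);
  return true;
}
```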

----------
Conversion
----------
Any conversion specification that contains a flag or option that it does not
have defined behavior for will ignore that flag or option (e.g. %.5c is the same
as %c).

If a conversion specification ends in %, then it will be treated as if it is
"%%", ignoring all options.

If a null pointer is passed to a %s conversion specification and null pointer
checks are enabled, it will be treated as if the provided string is "(null)".

If a null pointer is passed to a %n conversion specification and null pointer
checks are enabled, the conversion will fail and printf will return a negative
value.

If a null pointer is passed to a %p conversion specification, the string
"(nullptr)" will be written instead of an integer value.

The %p conversion will display any non-null pointer as if it were a uintptr_t
value passed to a "%#tx" conversion, with all other options remaining the same
as the original conversion.

The %p conversion will display a null pointer as if it were the string
"(nullptr)" passed to a "%s" conversion, with all other options remaining the
same as the original conversion.
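
A sketch of the resulting %p output for the case with no extra options. The
helper ``format_pointer`` is hypothetical, and it swaps in the portable
``PRIxPTR`` macro in place of the "%#tx" conversion named above so that the
argument type matches uintptr_t on every platform.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper mirroring the described %p output: "(nullptr)" for a
   null pointer, otherwise alternate-form lowercase hexadecimal. */
int format_pointer(char *buf, size_t size, const void *ptr) {
  assert(buf != NULL && size > 0);
  if (ptr == NULL)
    return snprintf(buf, size, "%s", "(nullptr)");
  /* '#' adds the "0x" prefix; PRIxPTR is the conversion for uintptr_t. */
  return snprintf(buf, size, "%#" PRIxPTR, (uintptr_t)ptr);
}
```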

The %r, %R, %k, and %K fixed point number format specifiers are accepted as
defined in ISO/IEC TR 18037 (the fixed point number extension). These are
available when the compiler is detected as having support for fixed point
numbers and the LIBC_COPT_PRINTF_DISABLE_FIXED_POINT flag is not set.