In `clang-scan-deps`, we're creating lots of `Module` instances.
Allocating them all in a bump-pointer allocator reduces the number of
retired instructions by 1-1.5% on my workload.
This patch adds an IsText parameter to the following getBufferForFile,
getBufferForFileImpl. We introduce a new virtual function
openFileForReadBinary which defaults to openFileForRead except in
RealFileSystem which uses the OF_None flag instead of OF_Text.
The default is set to OF_Text instead of OF_None, this change in value
does not affect any other platforms other than z/OS. Setting this
parameter correctly is required to open files on z/OS in the correct
encoding. The IsText parameter is based on the context of where we open
files, for example, in the ASTReader, HeaderMap requires that files
always be opened in binary even though they might be tagged as text.
`__has_builtin` was relying on reversible identifiers and string
matching to recognize builtin-type traits, leading to some newer type
traits not being recognized.
Fixes#111477
Recently, Solaris bootstrap got broken because Solaris uses a
non-standard mangling of `std::tm` and a few others. This was fixed with
a hack in PR #100724. The Solaris ABI requires mangling `std::tm` as
`tm` and similarly for `std::div_t`, `std::ldiv_t`, and `std::lconv`,
which is what this patch implements. The hack needs to stay in place to
allow building with older versions of `clang`.
Tested on `amd64-pc-solaris2.11`, `sparcv9-sun-solaris2.11` (2-stage
builds with both `clang-19` and `gcc-14` as build compiler), and
`x86_64-pc-linux-gnu`.
Some `FileManager` APIs still return `{File,Directory}Entry` instead of
the preferred `{File,Directory}EntryRef`. These are documented to be
deprecated, but don't have the attribute that warns on their usage. This
PR marks them as such with `LLVM_DEPRECATED()` and replaces their usage
with the recommended counterparts. NFCI.
This implements the logic of the `common_type` base template as a
builtin alias. If there should be no `type` member, an empty class is
returned. Otherwise a specialization of a `type_identity`-like class is
returned. The base template (i.e. `std::common_type`) as well as the
empty class and `type_identity`-like struct are given as arguments to
the builtin.
OpenCL has a reserved operator (^^), the use of which was diagnosed as
an error (735c6cdebdcd4292928079cb18a90f0dd5cd65fb). However, OpenCL
also encourages working with the blocks language extension. This token
has a parsing ambiguity as a result. Consider:
unsigned x=0;
unsigned y=x^^{return 0;}();
This should result in y holding the value zero (0^0) through an
immediately invoked block call as the right-hand side of the xor
operator. However, it causes errors instead because of this reserved
token: https://godbolt.org/z/navf7jTv1
This token is still reserved in OpenCL 3.0, so we still wish to issue a
diagnostic for its use. However, we do not need to create a token for an
extension point that's been unused for about a decade. So this patch
moves the diagnostic from a parsing diagnostic to a lexing diagnostic
and no longer forms a single token. The diagnostic behavior is slightly
worse as a result, but still seems acceptable.
Part of the reason this is coming up is because WG21 is considering
using ^^ as a token for reflection, so this token may come back in the
future.
This patch stops adjustments of the module cache path beyond what is
done in `ParseHeaderSearchArgs` (making it absolute and removing dots).
This enables more efficient implementation of the caching VFS in
https://github.com/llvm/llvm-project/pull/88800.
The utilities we use for lexing string and character literals can be run
in a mode where we pass a null pointer for the diagnostics engine. This
mode is used by the format string checkers, for example. However, there
were two places that failed to account for a null diagnostic engine
pointer: `\o{}` and `\x{}`.
This patch adds a check for a null pointer and correctly handles
fallback behavior.
Fixes#102218
Consider the following:
```
template<typename T>
struct A { };
template<typename T>
int A<T>::B::* f(); // error: no member named 'B' in 'A<T>'
```
Although this is clearly valid, clang rejects it because the
_nested-name-specifier_ `A<T>::` is parsed as-if it was declarative,
meaning, we parse it as-if it was the _nested-name-specifier_ in a
redeclaration/specialization. However, we don't (and can't) know whether
the _nested-name-specifier_ is declarative until we see the '`*`' token,
but at that point we have already complained that `A` has no member
named `B`! This patch addresses this bug by adding support for _fully_
unannotated _and_ unbounded tentative parsing, which allows for us to
parse past tokens without having to cache them until we reach a point
where we can guarantee to be past the construct we are disambiguating.
I don't know where the approach taken here is ideal -- alternatives are
welcome. However, the performance impact (as measured by
llvm-compile-time-tracker (https://llvm-compile-time-tracker.com/?config=Overview&stat=instructions%3Au&remote=sdkrystian)
is quite minimal (0.09%, which I plan to further improve).
The introduction of `std::put_time` in
fad17b43dbc09ac7e0a95535459845f72c2b739a broke the Solaris build:
```
Undefined first referenced
symbol in file
_ZNKSt8time_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE3putES3_RSt8ios_basecPKSt2tmPKcSB_ lib/libclangLex.a(PPMacroExpansion.cpp.o)
ld: fatal: symbol referencing errors
```
As discussed in PR #99075, the problem is that GCC mangles `std::tm` as
`tm` on Solaris for binary compatility, while `clang` doesn't (Issue
#33114).
As a stop-gap measure, this patch introduces an `asm` level alias to the
same effect.
Tested on `sparcv9-sun-solaris2.11`, `amd64-pc-solaris2.11`, and
`x86_64-pc-linux-gnu`.
In `clang/lib/Lex/PPMacroExpansion.cpp`, replaced the usage of the
obsolete `asctime` function with `std::put_time` for generating
timestamp strings.
Fixes: https://github.com/llvm/llvm-project/issues/98724
Such searches can be costly and non-intuitive. We've seen complaints
from developers that they don't expect clang to find modules on their
own and not in search paths that developers provide. Keeping the search
of modulemaps in subdirectories for code completion as it provides
better user experience.
If you are defining module "UsefulCode" in
"include/UnrelatedName/module.modulemap", it is recommended to rename
the directory "UnrelatedName" to "UsefulCode". If you cannot do so, you
can add to "include/module.modulemap" a line like `extern module
UsefulCode "UnrelatedName/module.modulemap"`, so clang can find module
"UsefulCode" without checking each subdirectory in "include/".
rdar://106677321
---------
Co-authored-by: Jan Svoboda <jan@svoboda.ai>
Follow-up to `34ab855826b8cb0c3b46c770b83390bd1fe95c64`:
* Don't fail the scan with an include with missing filename, it may be
inside a skipped preprocessor block. Let the compilation provide any
related error.
* Fix an issue where the lexer was skipping through the next directive,
after ignoring the include with missing filename.
Previously source input like `#import ` resulted in infinite calls
append the same token into `CurDirTokens`. This patch now ignores those
directive lines if they won't actually end up being compiled. (e.g.
macro guarded)
resolves: rdar://121247565
This PR implement [P3034R1 Module Declarations Shouldn’t be
Macros](https://wg21.link/P3034R1), and refactor the convoluted state
machines in module name lexical analysis.
---------
Signed-off-by: yronglin <yronglin777@gmail.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
1. Dead code in `LookupEmbedFile`. The loop always exited on the first
iteration. This was also causing a bug of not checking all directories
provided by `--embed-dir`.
2. Use of uninitialized variable `CurTok` in `LexEmbedParameters`. It
was used to initialize the field which seems to be unused. Removed
unused field, this way `CurTok` should be initialized by Lex method.
`import:(type)name` is a method argument decl in ObjC, but the C++20
preprocessing rules say this is a preprocessing line.
Because the dependency directive scanner is not language dependent, this
patch extends the C++20 rule to exclude `module :` (which is never a
valid module decl anyway), and `import :` that is not followed by an
identifier.
This is ok to do because in C++20 mode the compiler will later error on
lines like this anyway, and the dependencies the scanner returns are
still correct.
This enables raw R"" string literals in C in some language modes
and adds an option to disable or enable them explicitly as an
extension.
Background: GCC supports raw string literals in C in `-gnuXY` modes
starting with gnu99. This pr both enables raw string literals in gnu99
mode and later in C and adds an `-f[no-]raw-string-literals` flag to override
this behaviour. The decision not to enable raw string literals in gnu89
mode, according to the GCC devs, is intentional as that mode is supposed
to be used for ‘old code’ that they don’t want to break; we’ve decided to
match GCC’s behaviour here as well.
The `-fraw-string-literals` flag can additionally be used to enable raw string
literals in modes where they aren’t enabled by default (such as c99—as
opposed to gnu99—or even e.g. C++03); conversely, the negated flag can
be used to disable them in any gnuXY modes that *do* provide them by
default, or to override a previous flag. However, we do *not* support
disabling raw string literals (or indeed either of these two options) in
C++11 mode and later, because we don’t want to just start supporting
disabling features that are actually part of the language in the general case.
This fixes#85703.
This commit implements the entirety of the now-accepted [N3017
-Preprocessor
Embed](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm) and
its sister C++ paper [p1967](https://wg21.link/p1967). It implements
everything in the specification, and includes an implementation that
drastically improves the time it takes to embed data in specific
scenarios (the initialization of character type arrays). The mechanisms
used to do this are used under the "as-if" rule, and in general when the
system cannot detect it is initializing an array object in a variable
declaration, will generate EmbedExpr AST node which will be expanded by
AST consumers (CodeGen or constant expression evaluators) or expand
embed directive as a comma expression.
This reverts commit
682d461d5a.
---------
Co-authored-by: The Phantom Derpstorm <phdofthehouse@gmail.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
Co-authored-by: H. Vetinari <h.vetinari@gmx.com>
This patch is helpful to reduce 32 bits for HeaderFileInfo by combining
a uint32_t and pointer into a tagged pointer.
This is reviewed as part of
https://github.com/llvm/llvm-project/pull/92085 and required to be split
as a separate commit
HeaderSearch::MarkFileModuleHeader is no longer properly checking for
no-changes, and so sets the HeaderFileInfo for every `textual header` to
non-external.
The commit adds serialization and de-serialization implementations for
the stored regions. Basically, the serialized representation of the
regions of a PP is a (ordered) sequence of source location encodings.
For de-serialization, regions from loaded files are stored by their ASTs.
When later one queries if a loaded location L is in an opt-out
region, PP looks up the regions of the loaded AST where L is at.
(Background if helps: a pair of `#pragma clang unsafe_buffer_usage begin/end` pragmas marks a
warning-opt-out region. The begin and end locations (opt-out regions)
are stored in preprocessor instances (PP) and will be queried by the
`-Wunsafe-buffer-usage` analyzer.)
The reported issue at upstream: https://github.com/llvm/llvm-project/issues/90501
rdar://124035402
This commit implements the entirety of the now-accepted [N3017 -
Preprocessor
Embed](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm) and
its sister C++ paper [p1967](https://wg21.link/p1967). It implements
everything in the specification, and includes an implementation that
drastically improves the time it takes to embed data in specific
scenarios (the initialization of character type arrays). The mechanisms
used to do this are used under the "as-if" rule, and in general when the
system cannot detect it is initializing an array object in a variable
declaration, will generate EmbedExpr AST node which will be expanded
by AST consumers (CodeGen or constant expression evaluators) or
expand embed directive as a comma expression.
---------
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
Co-authored-by: H. Vetinari <h.vetinari@gmx.com>
Co-authored-by: Podchishchaeva, Mariya <mariya.podchishchaeva@intel.com>
This commit fixes https://github.com/llvm/llvm-project/issues/88896 by
passing LangOpts from the CompilerInstance to
DependencyScanningWorker so that the original LangOpts are
preserved/respected.
This makes for more accurate parsing/lexing when certain language
versions or features specific to versions are to be used.
Conversion of floating-point literal to binary representation must be
made using constant rounding mode, which can be changed using pragma
FENV_ROUND. For example, the literal "0.1F" should be representes by
either 0.099999994 or 0.100000001 depending on the rounding direction.
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
24 under clang/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
`Module::Requirement` was defined as a `std::pair<std::string, bool>`.
This required a comment to explain what the data members mean and makes
the usage harder to understand. Replace this with a struct with two
members, `FeatureName` and `RequiredState`.
---------
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
When writing out a PCM, we skip serializing headers' `HeaderFileInfo`
struct whenever this condition evaluates to `true`:
```c++
!HFI || (HFI->isModuleHeader && !HFI->isCompilingModuleHeader)
```
However, when Clang parses a module map file, each textual header gets a
`HFI` with `isModuleHeader=false`, `isTextualModuleHeader=true` and
`isCompilingModuleHeader=false`. This means the condition evaluates to
`false` even if the header was never included and the module map did not
affect the compilation. Each PCM file that happened to parse such module
map then contains a copy of the `HeaderFileInfo` struct for all textual
headers, and considers the containing module map affecting.
This patch makes it so that we skip headers that have not been included,
essentially removing the virality of textual headers when it comes to
PCM serialization.
This exposes _BitInt literal suffixes __wb and u__wb as an extension
in C++. There is a new Extension warning, and the tests are
essentially the same as the existing _BitInt literal tests for C but
with a few additional cases.
Fixes#85223
Clang uses the `HeaderFileInfo` struct to track bits of information on
header files, which gets used throughout the compiler. We also use this
to compute the set of affecting module maps in `ASTWriter` and in the
end serialize the information into the `HEADER_SEARCH_TABLE` record of a
PCM file, allowing clients to learn about headers from the module. In
doing so, Clang asks for existing `HeaderFileInfo` for all known
`FileEntries`. Note that this asks the loaded PCM files for the
information they have on each header file in question. This seems
unnecessary: we only want to serialize information on header files that
either belong to the current module or that got included textually.
Loaded PCM files can't provide us with any useful information.
For explicit modules with lazy loading (using `-fmodule-map-file=<path>`
with `-fmodule-file=<name>=<path>`) the compiler knows about header
files listed in the module map files on the command-line. This can be a
large number.
Asking for existing `HeaderFileInfo` can trigger deserialization of
`HEADER_SEARCH_TABLE` from loaded PCM files. Keys of the on-disk hash
table consist of the header file size and modification time. However,
with explicit modules Clang zeroes out the modification time. Moreover,
if you import lots of modules, some of their header files end up having
identical sizes. This means lots of hash collisions that can only be
resolved by running the serialized filename through `FileManager` and
comparing equality of the `FileEntry`. This ends up being super
expensive, essentially re-stating lots of the transitively loaded SDK
header files.
This patch cleans up the API for getting `HeaderFileInfo` and makes sure
`ASTWriter` uses the version that doesn't ask loaded PCM files for more
information. This removes the excessive stat traffic coming from
`ASTWriter` hopefully without changing observable behavior.