Summary:
Before this change clang produced output with header unit names that may
conaint path separators, dots and other non-identifier characters. This
diff prints header unit name in quotes and -E output can be compiled
again. Also remove unnecessary space between header unit name and semi.
Test Plan: check-clang
This PR implement [P3034R1 Module Declarations Shouldn’t be
Macros](https://wg21.link/P3034R1), and refactor the convoluted state
machines in module name lexical analysis.
---------
Signed-off-by: yronglin <yronglin777@gmail.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
This commit implements the entirety of the now-accepted [N3017
-Preprocessor
Embed](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm) and
its sister C++ paper [p1967](https://wg21.link/p1967). It implements
everything in the specification, and includes an implementation that
drastically improves the time it takes to embed data in specific
scenarios (the initialization of character type arrays). The mechanisms
used to do this are used under the "as-if" rule, and in general when the
system cannot detect it is initializing an array object in a variable
declaration, will generate EmbedExpr AST node which will be expanded by
AST consumers (CodeGen or constant expression evaluators) or expand
embed directive as a comma expression.
This reverts commit
682d461d5a.
---------
Co-authored-by: The Phantom Derpstorm <phdofthehouse@gmail.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
Co-authored-by: H. Vetinari <h.vetinari@gmx.com>
This commit implements the entirety of the now-accepted [N3017 -
Preprocessor
Embed](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm) and
its sister C++ paper [p1967](https://wg21.link/p1967). It implements
everything in the specification, and includes an implementation that
drastically improves the time it takes to embed data in specific
scenarios (the initialization of character type arrays). The mechanisms
used to do this are used under the "as-if" rule, and in general when the
system cannot detect it is initializing an array object in a variable
declaration, will generate EmbedExpr AST node which will be expanded
by AST consumers (CodeGen or constant expression evaluators) or
expand embed directive as a comma expression.
---------
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
Co-authored-by: H. Vetinari <h.vetinari@gmx.com>
Co-authored-by: Podchishchaeva, Mariya <mariya.podchishchaeva@intel.com>
This patch provides more information to the
`PPCallbacks::InclusionDirective()` hook. We now always pass the
suggested module, regardless of whether it was actually imported or not.
The extra `bool ModuleImported` parameter then denotes whether the
header `#include` will be automatically translated into import the the
module.
The main change is in `clang/lib/Lex/PPDirectives.cpp`, where we take
care to not modify `SuggestedModule` after it's been populated by
`LookupHeaderIncludeOrImport()`. We now exclusively use the `SM`
(`ModuleToImport`) variable instead, which has been equivalent to
`SuggestedModule` until now. This allows us to use the original
non-modified `SuggestedModule` for the callback itself.
(This patch turns out to be necessary for
https://github.com/apple/llvm-project/pull/8011).
This option will cause -E to preserve the #include directives
for system headers, rather than expanding them into the output.
This can greatly reduce the volume of preprocessed source text
in a test case, making test case reduction simpler.
Note that -fkeep-system-includes is not always appropriate. For
example, if the problem you want to reproduce is induced by a
system header file, it's better to expand those headers fully.
If your source defines symbols that influence the content of a
system header (e.g., _POSIX_SOURCE) then -E will eliminate the
definition, potentially changing the meaning of the preprocessed
source. If you use -isystem to point to non-system headers, for
example to suppress warnings in third-party software, those will
not be expanded and might make the preprocessed source less useful
as a test case.
This patch is the first part of the below RFC:
https://discourse.llvm.org/t/rfc-handle-execution-results-in-clang-repl/68493
It adds an annotation token which will replace the original EOF token
when we are in the incremental C++ mode. In addition, when we're
parsing an ExprStmt and there's a missing semicolon after the
expression, we set a marker in the annotation token and continue
parsing.
Eventually, we propogate this info in ParseTopLevelStmtDecl and are able
to mark this Decl as something we want to do value printing. Below is a
example:
clang-repl> int x = 42;
clang-repl> x
// `x` is a TopLevelStmtDecl and without a semicolon, we should set
// it's IsSemiMissing bit so we can do something interesting in
// ASTConsumer::HandleTopLevelDecl.
The idea about annotation toke is proposed by Richard Smith, thanks!
Signed-off-by: Jun Zhang <jun@junz.org>
Differential Revision: https://reviews.llvm.org/D148997
The needed tweaks are mostly trivial, the one nasty bit is Clang's usage
of OptionalStorage. To keep this working old Optional stays around as
clang::CustomizableOptional, with the default Storage removed.
Optional<File/DirectoryEntryRef> is replaced with a typedef.
I tested this with GCC 7.5, the oldest supported GCC I had around.
Differential Revision: https://reviews.llvm.org/D140332
This reverts commit 8f0df9f3bbc6d7f3d5cbfd955c5ee4404c53a75d.
The Optional*RefDegradesTo*EntryPtr types want to keep the same size as
the underlying type, which std::optional doesn't guarantee. For use with
llvm::Optional, they define their own storage class, and there is no way
to do that in std::optional.
On top of that, that commit broke builds with older GCCs, where
std::optional was not trivially copyable (static_assert in the clang
sources was failing).
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.
Change call sites to use `std::size` instead. Leave the few call sites that
use a locally defined `array_lengthof` that are meant to test previous bugs
with NTTPs in clang analyzer and SemaTemplate.
Differential Revision: https://reviews.llvm.org/D133520
This patch changes type of the `File` parameter in `PPCallbacks::InclusionDirective()` from `const FileEntry *` to `Optional<FileEntryRef>`.
With the API change in place, this patch then removes some uses of the deprecated `FileEntry::getName()` (e.g. in `DependencyGraph.cpp` and `ModuleDependencyCollector.cpp`).
Reviewed By: dexonsmith, bnbarham
Differential Revision: https://reviews.llvm.org/D123574
When the -fdirectives-only option is used together with -E, the preprocessor
output reflects evaluation of if/then/else directives.
As such, it preserves defines and undefs of macros that are still live after
such processing. The intent is that this output could be consumed as input
to generate considered a C++20 header unit.
We strip out any (unused) defines that come from built-in, built-in-file or
command line; these are re-added when the preprocessed source is consumed.
Differential Revision: https://reviews.llvm.org/D121099
The lexer can attempt to lex a _Pragma and crash with an out of bounds string access when it's
lexing a _Pragma whose string token is an invalid buffer, e.g. when a module header file from which the macro
expansion for that token was deleted from the file system.
Differential Revision: https://reviews.llvm.org/D116052
This reverts commit ef8206320769ad31422a803a0d6de6077fd231d2.
- It conflicts with the existing llvm::size in STLExtras, which will now
never be called.
- Calling it without llvm:: breaks C++17 compat
The PragmaAssumeNonNullHandler (and maybe others) passes an invalid
SourceLocation to its callback, hence PrintPreprocessedOutput does not
know how many lines to insert between the previous token and the
pragma and does nothing.
With this patch we instead assume that the unknown token is on the same
line as the previous such that we can call the procedure that also emits
semantically significant whitespace.
Fixes bug reported here: https://reviews.llvm.org/D104601#3105044
This is reported by msvc as
warning C6287: redundant code: the left and right subexpressions are identical
EmittedDirectiveOnThisLine implies EmittedTokensOnThisLine
making this an NFC change. To be on the safe side and because both of
them are checked at other places as well, we continue to check both.
Compiler warning reported here:
https://reviews.llvm.org/D104601#2957333
In -P mode, PrintPPOutputPPCallbacks::MoveToLine started at least one
newline if current and target line number mismatched. The method is also
called when entering a new file, be it the main file or an include file.
In this situation line numbers always almost mismatch, resulting in a
newline for each occurance even if no tokens have been printed
in-between.
Empty lines at the beginning of the output must be trimmed because it
may be parsed by scripts expecting the result to appear on the first
output line, as done by LibreOffice's configure script.
Fix by only emitting a newline if tokens have been printed so far using
the EmittedTokensOnThisLine flag. Also adding a test case of FileChanged
callbacks occuring with empty include files.
This fixes llvm.org/PR51616
The -fms-extensions converts __pragma (and _Pragma) into a #pragma that
has to occur at the beginning of a line and end with a newline. This
patch ensures that the newline after the #pragma is added even if
Token::isAtStartOfLine() indicated that we should not start a newline.
Committing relying post-commit review since the change is small, some
downstream uses might be blocked without this fix, and to make clear the
decision of the new -fminimize-whitespace feature (fix on main, revert
on clang-13.x branch) suggested by @aaron.ballman in D104601.
Differential Revision: https://reviews.llvm.org/D107183
The implementation of -fminimize-whitespace (D104601) revised the logic
when to emit newlines. There was no case to handle when more than
8 lines were skippped in -P (DisableLineMarkers) mode and instead fell
through the case intended for -fminimize-whitespace, i.e. emit nothing.
This patch will emit one newline in this case.
The newline logic is slightly reorganized. The `-P -fminimize-whitespace`
case is handled explicitly and emitting at least one newline is the new
fallback case. The choice between emitting a line marker or up to
7 empty lines is now a choice only with enabled line markers. The up to
8 newlines likely are fewer characters than a line directive, but
in -P mode this had the paradoxic effect that it would print up to
7 empty lines, but none at all if more than 8 lines had to be skipped.
Now with DisableLineMarkers, we don't consider printing empty lines
(just start a new line) which matches gcc's behavior.
The line-directive-output-mincol.c test is replaced with a more
comprehensive test skip-empty-lines.c also testing the more than
8 skipped lines behaviour with all flag combinations.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D106924
This patch adds the -fminimize-whitespace with the following effects:
* If combined with -E, remove as much non-line-breaking whitespace as
possible.
* If combined with -E -P, removes as much whitespace as possible,
including line-breaks.
The motivation is to reduce the amount of insignificant changes in the
preprocessed output with source files where only whitespace has been
changed (add/remove comments, clang-format, etc.) which is in particular
useful with ccache.
A patch for ccache for using this flag has been proposed to ccache as well:
https://github.com/ccache/ccache/pull/815, which will use
-fnormalize-whitespace when clang-13 has been detected, and additionally
uses -P in "unify_mode". ccache already had a unify_mode in an older
version which was removed because of problems that using the
preprocessor itself does not have (such that the custom tokenizer did
not recognize C++11 raw strings).
This patch slightly reorganizes which part is responsible for adding
newlines that are required for semantics. It is now either
startNewLineIfNeeded() or MoveToLine() but never both; this avoids the
ShouldUpdateCurrentLine workaround and avoids redundant lines being
inserted in some cases. It also fixes a mandatory newline not inserted
after a _Pragma("...") that is expanded into a #pragma.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D104601
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
Differential revision: https://reviews.llvm.org/D66259
llvm-svn: 368942
Currently, a pragma AST node's recorded location starts at the
namespace token (such as `omp` in the case of OpenMP) after the
`#pragma` token, and the `#pragma` location isn't available. However,
the `#pragma` location can be useful when, for example, rewriting a
directive using Clang's Rewrite facility.
This patch makes `#pragma` locations available in any `PragmaHandler`
but it doesn't yet make use of them.
This patch also uses the new `struct PragmaIntroducer` to simplify
`Preprocessor::HandlePragmaDirective`. It doesn't do the same for
`PPCallbacks::PragmaDirective` because that changes the API documented
in `clang-tools-extra/docs/pp-trace.rst`, and I'm not sure about
backward compatibility guarantees there.
Reviewed By: ABataev, lebedev.ri, aaron.ballman
Differential Revision: https://reviews.llvm.org/D61643
llvm-svn: 361335
Summary:
By adding a hook to consume all tokens produced by the preprocessor.
The intention of this change is to make it possible to consume the
expanded tokens without re-runnig the preprocessor with minimal changes
to the preprocessor and minimal performance penalty when preprocessing
without recording the tokens.
The added hook is very low-level and reconstructing the expanded token
stream requires more work in the client code, the actual algorithm to
collect the tokens using this hook can be found in the follow-up change.
Reviewers: rsmith
Reviewed By: rsmith
Subscribers: eraman, nemanjai, kbarton, jsji, riccibruno, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D59885
llvm-svn: 361007
__pragma(execution_character_set(push, "UTF-8")) is used in
TraceLoggingProvider.h. This commit implements a no-op handler for
compatability, similar to how the flag -fexec_charset is handled.
Patch by Matt Gardner!
Differential Revision: https://reviews.llvm.org/D58530
llvm-svn: 356185
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Previously these would be transformed into annotation tokens and the
preprocessor would then assume they were real tokens with source
locations and assert/UB.
Other pragmas that produce annotation tokens aren't a problem because
they aren't handled if the parser isn't hooked up - ParsePragma.cpp
registers those handlers & isn't run for pure preprocessing. So they're
treated as unknown pragmas & printed verbatim by the preprocessor.
Perhaps these pragmas should be treated the same way? But they got mixed
in with other __debug pragmas that do need to be handled during
preprocessing.
The third __debug pragma that produces an annotation token is 'captured'
- which had its own fix for this issue - by not inserting the annotation
token in the first place if it detected that it was in preprocessing
mode. I've removed that fix (from Lex/Pragma.cpp) in favor of the more
general one in Frontend/PrintPreprocessedOutput.cpp.
llvm-svn: 346928
This commit relands r331904.
Adding a SrcMgr::CharacteristicKind parameter to the InclusionDirective
in PPCallbacks, and updating calls to that function. This will be useful
in https://reviews.llvm.org/D43778 to determine which includes are
system
headers.
Differential Revision: https://reviews.llvm.org/D46614
llvm-svn: 332021
Adding a SrcMgr::CharacteristicKind parameter to the InclusionDirective
in PPCallbacks, and updating calls to that function. This will be useful
in https://reviews.llvm.org/D43778 to determine which includes are system
headers.
Differential Revision: https://reviews.llvm.org/D46614
llvm-svn: 331904
This comes up when pre-processing standalone .s files containing
hash-prefixed comments. The pre-processor should skip the unknown
directive and not emit an extra newline as we were doing.
Fixes PR34950
llvm-svn: 315953
- Extracted the reading of the tokens out into a separate function.
- Replace 'Argument' with 'Parameter' when referring to the identifiers of the macro definition (as opposed to the supplied arguments - MacroArgs - during the macro invocation).
This is in preparation for submitting patches for review to implement __VA_OPT__ which will otherwise just keep lengthening the HandleDefineDirective function and making it less comprehensible.
I will also directly update some extra clang tooling that is broken by the change from Argument to Parameter.
Hopefully the bots will stay appeased.
Thanks!
llvm-svn: 308190
- Extracted the reading of the tokens out into a separate function.
- Replace 'Argument' with 'Parameter' when referring to the identifiers of the macro definition (as opposed to the supplied arguments - MacroArgs - during the macro invocation).
This is in preparation for submitting patches for review to implement __VA_OPT__ which will otherwise just keep lengthening the HandleDefineDirective function and making it less comprehensible.
Thanks!
llvm-svn: 308157
These pragmas are intended to simulate the effect of entering or leaving a file
with an associated module. This is not completely implemented yet: declarations
between the pragmas will not be attributed to the correct module, but macro
visibility is already functional.
Modules named by #pragma clang module begin must already be known to clang (in
some module map that's either loaded or on the search path).
llvm-svn: 302098
Many of our supported configurations support modules but do not have any
first-class syntax to perform a module import. This leaves us with a problem:
there is no way to represent the expansion of a #include that imports a module
in the -E output for such languages. (We don't want to just leave it as a
#include because that requires the consumer of the preprocessed source to have
the same file system layout and include paths as the creator.)
This patch adds a new pragma:
#pragma clang module import MODULE.NAME.HERE
that imports a module, and changes -E and -frewrite-includes to use it when
rewriting a #include that maps to a module import. We don't make any attempt
to use a native language syntax import if one exists, to get more consistent
output. (If in the future, @import and #include have different semantics in
some way, the pragma will track the #include semantics.)
llvm-svn: 301725