llvm-project/clang/lib/Analysis/MacroExpansionContext.cpp


[analyzer] Introduce MacroExpansionContext to libAnalysis

Introduce `MacroExpansionContext` to track what and how macros in a translation unit expand. This is the first element of the patch-stack in this direction.

The main goal is to substitute the current macro expansion generator in the `PlistsDiagnostics`, but all the other `DiagnosticsConsumer`s could benefit from this.

`getExpandedText` and `getOriginalText` are the primary functions of this class. The former provides the text that resulted from the macro expansion chain starting from a `SourceLocation`, while the latter tells you **what text** in the original source code was replaced by the macro expansion chain from that location.

Here is an example:

  void bar();
  #define retArg(x) x
  #define retArgUnclosed retArg(bar()
  #define BB CC
  #define applyInt BB(int)
  #define CC(x) retArgUnclosed

  void unbalancedMacros() {
    applyInt );
    //^~~~~~~~~~^ is the substituted range
    // Original text is "applyInt )"
    // Expanded text is "bar()"
  }

  #define expandArgUnclosedCommaExpr(x) (x, bar(), 1
  #define f expandArgUnclosedCommaExpr

  void unbalancedMacros2() {
    int x = f(f(1)) )); // Look at the parenthesis!
    // ^~~~~~^ is the substituted range
    // Original text is "f(f(1))"
    // Expanded text is "((1,bar(),1,bar(),1"
  }

It might be worth investigating how to provide a reusable component, which could be used, for example, by a standalone tool, e.g. one expanding all macros to their definitions.

I borrowed the main idea from the `PrintPreprocessedOutput.cpp` Frontend component, providing a `PPCallbacks` instance hooking the preprocessor events. I'm using that for calculating the source range where tokens will be expanded to. I'm also using the `Preprocessor`'s `OnToken` callback, via `Preprocessor::setTokenWatcher`, to reconstruct the expanded text.

Unfortunately, I concatenate the tokens' string representations without any whitespace, except that after an identifier token I emit an extra space to produce valid code for `int var` token sequences. This could be improved later if needed.

Patch-stack:
  1) D93222 (this one) Introduces the MacroExpansionContext class and unittests
  2) D93223 Create MacroExpansionContext member in AnalysisConsumer and pass down to the diagnostics consumers
  3) D93224 Use the MacroExpansionContext for macro expansions in plists
     It replaces the 'old' macro expansion mechanism.
  4) D94673 API for CTU macro expansions
     You should be able to get a `MacroExpansionContext` for each imported TU. Right now it will just return `llvm::None` as this is not implemented yet.
  5) FIXME: Implement macro expansion tracking for imported TUs as well.

It would also relieve us from bugs like:
  - [fixed] D86135
  - [confirmed] The `__VA_ARGS__` and other macro nitty-gritty, such as how to stringify macro parameters, where to put or swallow commas, etc. are not handled correctly.
  - [confirmed] Unbalanced parentheses are not well handled, resulting in incorrect expansions or even crashes.
  - [confirmed][crashing] https://bugs.llvm.org/show_bug.cgi?id=48358

Reviewed By: martong, Szelethus

Differential Revision: https://reviews.llvm.org/D93222
2021-02-22 11:11:57 +01:00
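The whitespace rule described in the commit message (no whitespace between tokens, except one space after every token that carries identifier information) can be illustrated with a small standalone sketch. This is not the code from this file: `ToyToken` and `joinExpandedTokens` are hypothetical names, and the `HasIdentifierInfo` flag is an assumed stand-in for the `Tok.getIdentifierInfo()` check in `dumpTokenInto`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for clang::Token: only what the joining rule needs.
struct ToyToken {
  std::string Spelling;
  bool HasIdentifierInfo; // assumed true for identifiers and for `int`
};

// Concatenate the spellings. One space follows every identifier-like token;
// no other whitespace is emitted.
std::string joinExpandedTokens(const std::vector<ToyToken> &Tokens) {
  std::string Out;
  for (const ToyToken &Tok : Tokens) {
    assert(!Tok.Spelling.empty() && "a token has a non-empty spelling");
    Out += Tok.Spelling;
    if (Tok.HasIdentifierInfo)
      Out += ' ';
  }
  return Out;
}
```

Under this rule the tokens `int`, `var`, `;` join to `int var ;`, and `bar`, `(`, `)` join to `bar ()`, so the result stays lexable even though it does not match the original spacing.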
//===- MacroExpansionContext.cpp - Macro expansion information --*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include "clang/Analysis/MacroExpansionContext.h"
#include "llvm/Support/Debug.h"

#define DEBUG_TYPE "macro-expansion-context"

static void dumpTokenInto(const clang::Preprocessor &PP, clang::raw_ostream &OS,
                          clang::Token Tok);

namespace clang {
namespace detail {
class MacroExpansionRangeRecorder : public PPCallbacks {
  const Preprocessor &PP;
  SourceManager &SM;
  MacroExpansionContext::ExpansionRangeMap &ExpansionRanges;

public:
  explicit MacroExpansionRangeRecorder(
      const Preprocessor &PP, SourceManager &SM,
      MacroExpansionContext::ExpansionRangeMap &ExpansionRanges)
      : PP(PP), SM(SM), ExpansionRanges(ExpansionRanges) {}

  void MacroExpands(const Token &MacroName, const MacroDefinition &MD,
                    SourceRange Range, const MacroArgs *Args) override {
    // Ignore annotation tokens like: _Pragma("pack(push, 1)")
    if (MacroName.getIdentifierInfo()->getName() == "_Pragma")
      return;

    SourceLocation MacroNameBegin = SM.getExpansionLoc(MacroName.getLocation());
    assert(MacroNameBegin == SM.getExpansionLoc(Range.getBegin()));

    const SourceLocation ExpansionEnd = [Range, &SM = SM, &MacroName] {
      // If the range is empty, use the length of the macro.
      if (Range.getBegin() == Range.getEnd())
        return SM.getExpansionLoc(
            MacroName.getLocation().getLocWithOffset(MacroName.getLength()));

      // Include the last character.
      return SM.getExpansionLoc(Range.getEnd()).getLocWithOffset(1);
    }();

    (void)PP;
    LLVM_DEBUG(llvm::dbgs() << "MacroExpands event: '";
               dumpTokenInto(PP, llvm::dbgs(), MacroName);
               llvm::dbgs()
               << "' with length " << MacroName.getLength() << " at ";
               MacroNameBegin.print(llvm::dbgs(), SM);
               llvm::dbgs() << ", expansion end at ";
               ExpansionEnd.print(llvm::dbgs(), SM); llvm::dbgs() << '\n';);

    // If the expansion range is empty, use the identifier of the macro as a
    // range.
    MacroExpansionContext::ExpansionRangeMap::iterator It;
    bool Inserted;
    std::tie(It, Inserted) =
        ExpansionRanges.try_emplace(MacroNameBegin, ExpansionEnd);
    if (Inserted) {
      LLVM_DEBUG(llvm::dbgs() << "maps ";
                 It->getFirst().print(llvm::dbgs(), SM); llvm::dbgs() << " to ";
                 It->getSecond().print(llvm::dbgs(), SM);
                 llvm::dbgs() << '\n';);
    } else {
      if (SM.isBeforeInTranslationUnit(It->getSecond(), ExpansionEnd)) {
        It->getSecond() = ExpansionEnd;
        LLVM_DEBUG(
            llvm::dbgs() << "remaps "; It->getFirst().print(llvm::dbgs(), SM);
            llvm::dbgs() << " to "; It->getSecond().print(llvm::dbgs(), SM);
            llvm::dbgs() << '\n';);
      }
    }
  }
};
} // namespace detail
} // namespace clang

using namespace clang;

MacroExpansionContext::MacroExpansionContext(const LangOptions &LangOpts)
    : LangOpts(LangOpts) {}

void MacroExpansionContext::registerForPreprocessor(Preprocessor &NewPP) {
  PP = &NewPP;
  SM = &NewPP.getSourceManager();

  // Make sure that the Preprocessor does not outlive the MacroExpansionContext.
  PP->addPPCallbacks(std::make_unique<detail::MacroExpansionRangeRecorder>(
      *PP, *SM, ExpansionRanges));
  // Same applies here.
  PP->setTokenWatcher([this](const Token &Tok) { onTokenLexed(Tok); });
}

Optional<StringRef>
MacroExpansionContext::getExpandedText(SourceLocation MacroExpansionLoc) const {
  if (MacroExpansionLoc.isMacroID())
    return llvm::None;

  // If there was no macro expansion at that location, return None.
  if (ExpansionRanges.find_as(MacroExpansionLoc) == ExpansionRanges.end())
    return llvm::None;

  // There was macro expansion, but resulted in no tokens, return empty string.
  const auto It = ExpandedTokens.find_as(MacroExpansionLoc);
  if (It == ExpandedTokens.end())
    return StringRef{""};

  // Otherwise we have the actual token sequence as string.
  return It->getSecond().str();
}

Optional<StringRef>
MacroExpansionContext::getOriginalText(SourceLocation MacroExpansionLoc) const {
  if (MacroExpansionLoc.isMacroID())
    return llvm::None;

  const auto It = ExpansionRanges.find_as(MacroExpansionLoc);
  if (It == ExpansionRanges.end())
    return llvm::None;

  assert(It->getFirst() != It->getSecond() &&
         "Every macro expansion must cover a non-empty range.");

  return Lexer::getSourceText(
      CharSourceRange::getCharRange(It->getFirst(), It->getSecond()), *SM,
      LangOpts);
}

void MacroExpansionContext::dumpExpansionRanges() const {
  dumpExpansionRangesToStream(llvm::dbgs());
}

void MacroExpansionContext::dumpExpandedTexts() const {
  dumpExpandedTextsToStream(llvm::dbgs());
}

void MacroExpansionContext::dumpExpansionRangesToStream(raw_ostream &OS) const {
  std::vector<std::pair<SourceLocation, SourceLocation>> LocalExpansionRanges;
  LocalExpansionRanges.reserve(ExpansionRanges.size());
  for (const auto &Record : ExpansionRanges)
    LocalExpansionRanges.emplace_back(
        std::make_pair(Record.getFirst(), Record.getSecond()));
  llvm::sort(LocalExpansionRanges);

  OS << "\n=============== ExpansionRanges ===============\n";
  for (const auto &Record : LocalExpansionRanges) {
    OS << "> ";
    Record.first.print(OS, *SM);
    OS << ", ";
    Record.second.print(OS, *SM);
    OS << '\n';
  }
}

void MacroExpansionContext::dumpExpandedTextsToStream(raw_ostream &OS) const {
  std::vector<std::pair<SourceLocation, MacroExpansionText>>
      LocalExpandedTokens;
  LocalExpandedTokens.reserve(ExpandedTokens.size());
  for (const auto &Record : ExpandedTokens)
    LocalExpandedTokens.emplace_back(
        std::make_pair(Record.getFirst(), Record.getSecond()));
  llvm::sort(LocalExpandedTokens);

  OS << "\n=============== ExpandedTokens ===============\n";
  for (const auto &Record : LocalExpandedTokens) {
    OS << "> ";
    Record.first.print(OS, *SM);
    OS << " -> '" << Record.second << "'\n";
  }
}

static void dumpTokenInto(const Preprocessor &PP, raw_ostream &OS, Token Tok) {
  assert(Tok.isNot(tok::raw_identifier));

  // Ignore annotation tokens like: _Pragma("pack(push, 1)")
  if (Tok.isAnnotation())
    return;

  if (IdentifierInfo *II = Tok.getIdentifierInfo()) {
    // FIXME: For now, we don't respect whitespaces between macro expanded
    // tokens. We just emit a space after every identifier to produce a valid
    // code for `int a ;` like expansions.
    //              ^-^-- Space after the 'int' and 'a' identifiers.
    OS << II->getName() << ' ';
  } else if (Tok.isLiteral() && !Tok.needsCleaning() && Tok.getLiteralData()) {
    OS << StringRef(Tok.getLiteralData(), Tok.getLength());
  } else {
    char Tmp[256];
    if (Tok.getLength() < sizeof(Tmp)) {
      const char *TokPtr = Tmp;
      // FIXME: Might use a different overload for cleaner callsite.
      unsigned Len = PP.getSpelling(Tok, TokPtr);
      OS.write(TokPtr, Len);
    } else {
      OS << "<too long token>";
    }
  }
}

void MacroExpansionContext::onTokenLexed(const Token &Tok) {
  SourceLocation SLoc = Tok.getLocation();
  // Only tokens produced by a macro expansion are of interest.
  if (SLoc.isFileID())
    return;

  LLVM_DEBUG(llvm::dbgs() << "lexed macro expansion token '";
             dumpTokenInto(*PP, llvm::dbgs(), Tok); llvm::dbgs() << "' at ";
             SLoc.print(llvm::dbgs(), *SM); llvm::dbgs() << '\n';);

  // Remove spelling location.
  SourceLocation CurrExpansionLoc = SM->getExpansionLoc(SLoc);

  MacroExpansionText TokenAsString;
  llvm::raw_svector_ostream OS(TokenAsString);

  // FIXME: Prepend newlines and space to produce the exact same output as the
  // preprocessor would for this token.

  dumpTokenInto(*PP, OS, Tok);

  ExpansionMap::iterator It;
  bool Inserted;
  std::tie(It, Inserted) =
      ExpandedTokens.try_emplace(CurrExpansionLoc, std::move(TokenAsString));
  // try_emplace only consumes TokenAsString when it inserts a new entry, so
  // the string is still intact here if the key was already present.
  if (!Inserted)
    It->getSecond().append(TokenAsString);
}
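
The bookkeeping in this file (one map from expansion begin to expansion end, one map from expansion begin to accumulated text, and a three-way lookup result) can be modeled without any Clang dependency. The sketch below is a simplified illustration under stated assumptions, not the real implementation: `ToyLoc`, `ToyExpansionContext`, and all of its member names are hypothetical, locations are plain offsets, and `std::unordered_map` with `std::optional` stands in for the LLVM containers and `llvm::Optional`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for SourceLocation: a plain file offset.
using ToyLoc = unsigned;

class ToyExpansionContext {
  std::unordered_map<ToyLoc, ToyLoc> Ranges;     // expansion begin -> end
  std::unordered_map<ToyLoc, std::string> Texts; // expansion begin -> text

public:
  // Modeled on MacroExpands: the first event inserts the range; a later event
  // at the same begin may only move the end further out.
  void recordExpansion(ToyLoc Begin, ToyLoc End) {
    assert(Begin < End && "an expansion covers a non-empty range");
    auto [It, Inserted] = Ranges.try_emplace(Begin, End);
    if (!Inserted && It->second < End)
      It->second = End;
  }

  // Modeled on onTokenLexed: create the entry on first use, append afterwards.
  void appendToken(ToyLoc Begin, std::string Piece) {
    auto [It, Inserted] = Texts.try_emplace(Begin, std::move(Piece));
    // std::unordered_map::try_emplace does not move from its arguments when
    // the key already exists, so Piece is still valid here.
    if (!Inserted)
      It->second.append(Piece);
  }

  // Modeled on getExpandedText: no value if nothing expanded at Begin, an
  // empty string if the expansion produced no tokens, the text otherwise.
  std::optional<std::string> expandedText(ToyLoc Begin) const {
    if (Ranges.find(Begin) == Ranges.end())
      return std::nullopt;
    auto It = Texts.find(Begin);
    if (It == Texts.end())
      return std::string();
    return It->second;
  }

  // Returns the recorded end of the expansion starting at Begin, if any.
  std::optional<ToyLoc> expansionEnd(ToyLoc Begin) const {
    auto It = Ranges.find(Begin);
    if (It == Ranges.end())
      return std::nullopt;
    return It->second;
  }
};
```

Distinguishing "no expansion here" from "expansion with empty result" is what lets a caller tell a macro that expands to nothing apart from a location that was never a macro at all.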