We have had numerous situations where the negatively stat cached paths
are invalidated during the build, and such invalidations lead to build
errors.
This PR adds an API to diagnose such cases.
`DependencyScanningFilesystemSharedCache::diagnoseNegativeStatCachedPaths`
allows users of the cache to ask the cache to examine all negatively
stat cached paths, and re-stat the paths using the passed-in file
system. If any re-stat succeeds, the API emits diagnostics.
rdar://149147920
This PR reland https://github.com/llvm/llvm-project/pull/135808, fixed
some missed changes in LLDB.
I found this issue when I working on
https://github.com/llvm/llvm-project/pull/107168.
Currently we have many similiar data structures like:
- std::pair<IdentifierInfo *, SourceLocation>.
- Element type of ModuleIdPath.
- IdentifierLocPair.
- IdentifierLoc.
This PR unify these data structures to IdentifierLoc, moved
IdentifierLoc definition to SourceLocation.h, and deleted other similer
data structures.
---------
Signed-off-by: yronglin <yronglin777@gmail.com>
Reverts llvm/llvm-project#135808
Example from the LLDB macOS CI:
https://green.lab.llvm.org/job/llvm.org/view/LLDB/job/as-lldb-cmake/24084/execution/node/54/log/?consoleFull
```
/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/source/Plugins/ExpressionParser/Clang/ClangModulesDeclVendor.cpp:360:49: error: no viable conversion from 'std::pair<clang::IdentifierInfo *, clang::SourceLocation>' to 'clang::ModuleIdPath' (aka 'ArrayRef<IdentifierLoc>')
clang::Module *top_level_module = DoGetModule(clang_path.front(), false);
^~~~~~~~~~~~~~~~~~
/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/llvm/include/llvm/ADT/ArrayRef.h:41:40: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'std::pair<clang::IdentifierInfo *, clang::SourceLocation>' to 'const llvm::ArrayRef<clang::IdentifierLoc> &' for 1st argument
class LLVM_GSL_POINTER [[nodiscard]] ArrayRef {
^
/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/llvm/include/llvm/ADT/ArrayRef.h:41:40: note: candidate constructor (the implicit move constructor) not viable: no known conversion from 'std::pair<clang::IdentifierInfo *, clang::SourceLocation>' to 'llvm::ArrayRef<clang::IdentifierLoc> &&' for 1st argument
/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/llvm/include/llvm/ADT/ArrayRef.h:70:18: note: candidate constructor not viable: no known conversion from 'std::pair<clang::IdentifierInfo *, clang::SourceLocation>' to 'std::nullopt_t' for 1st argument
/*implicit*/ ArrayRef(std::nullopt_t) {}
```
I found this issue when I working on
https://github.com/llvm/llvm-project/pull/107168.
Currently we have many similiar data structures like:
- `std::pair<IdentifierInfo *, SourceLocation>`.
- Element type of `ModuleIdPath`.
- `IdentifierLocPair`.
- `IdentifierLoc`.
This PR unify these data structures to `IdentifierLoc`, moved
`IdentifierLoc` definition to SourceLocation.h, and deleted other
similer data structures.
---------
Signed-off-by: yronglin <yronglin777@gmail.com>
Pass a reference to `StableDirs` when creating ModuleDepCollector. This
avoids needing to create one from the same ScanInstance for each call to
`handleTopLevelModule` & reduces the amount of potential downstream
changes needed for handling StableDirs.
When a module is being scanned, it can depend on modules that have
already been built from a pch dependency. When this happens, the pcm
files are reused for the module dependencies. When this is the case,
check if input files recorded from the PCMs come from the provided
stable directories transitively since the scanner will not have access
to the full set of file dependencies from prebuilt modules.
This patch removes the old range constructors of SmallSet and
StringSet that do not take the llvm::from_range tag. Since there are
so few uses, this patch directly removes them without going through
the deprecation process.
This type can be exposed from C APIs, where instantiations of this type
are not expected to mutate after creation. To support this, mark the
lazy computation of build arguments mutable, as that is not intended to
otherwise mutate the state of these objects.
This was reviewed separately by @jansvoboda11
This is a speculative fix for link failure found by the RHEL 8 bot: https://lab.llvm.org/buildbot/#/builders/204/builds/3862.
FAILED: lib/libclangDependencyScanning.so.21.0git
/InProcessModuleCache.cpp.o: In function `(anonymous namespace)::ReaderWriterLock::tryLock()':
InProcessModuleCache.cpp:(.text._ZN12_GLOBAL__N_116ReaderWriterLock7tryLockEv+0x1d): undefined reference to `pthread_rwlock_trywrlock'
This patch fixes:
clang/lib/Tooling/DependencyScanning/InProcessModuleCache.cpp:38:5:
error: 'shared_lock' may not intend to support class template
argument deduction [-Werror,-Wctad-maybe-unsupported]
clang/lib/Tooling/DependencyScanning/InProcessModuleCache.cpp:68:7:
error: 'lock_guard' may not intend to support class template
argument deduction [-Werror,-Wctad-maybe-unsupported]
The dependency scanner uses implicitly-built Clang modules under the
hood. This system was originally designed to handle multiple concurrent
processes working on the same module cache, and mutual exclusion was
implemented using file locks. The scanner, however, runs within single
process, making file locks unnecessary. This patch virtualizes the
interface for module cache locking and provides an implementation based
on `std::shared_mutex`. This reduces `clang-scan-deps` runtime by ~17%
on my benchmark.
Note that even when multiple processes run a scan on the same module
cache (and therefore don't coordinate efficiently), this should still be
correct due to the strict context hash, the write-through
`InMemoryModuleCache` and the logic for rebuilding out-of-date or
incompatible modules.
That patch tracks whether all the file & module dependencies of a module
resolve to a stable location. This information will later be queried by
build systems for determining where to store the accompanying pcms.
In d64eccf we split the object file reader from the writer and removed
the dependency on clangCodeGen from the dependency scanner. However, we
were still initializing all the llvm targets, which was needed for
writing object file pcms but not reading them. This caused us to link
more llvm target code than necessary. Shrinks clangd by nearly 50% for
me.
rdar://144790713
Shared state between dependency scanning workers is managed by the
dependency scanning service.
Right now, the members are individually threaded through the worker,
action, and collector. This makes any change to the service and its
members a very laborious process. Moreover, this situation causes
frequent merge conflicts in our downstream repo where the service does
have some extra members that need to be passed around.
To ease the maintenance burden, this PR starts passing a reference to
the entire service.
This PR explicitly sets `DebugCompilationDir` to the system's root
directory if it is safe to ignore the current working directory.
This fixes a problem where a PCM file's embedded debug information can
lead to compilation failure. The compiler may have decided it is indeed
safe to ignore the current working directory. In this case, the PCM
file's content is functionally correct regardless of the current working
directory because no inputs use relative paths (see
https://github.com/llvm/llvm-project/pull/124786). However, a PCM may
contain debug info. If debug info is requested, the compiler uses the
current working directory value to set `DW_AT_comp_dir`. This may lead
to the following situation:
1. Two different compilations need the same PCM file.
2. The PCM file is compiled assuming a working directory, which is
embedded in the debug info, but otherwise has no effect.
3. The second compilation assumes a different working directory, and
expects an identically-sized pcm file. However, it cannot find such a
PCM, because the existing PCM file has been compiled assuming a
different `DW_AT_comp_dir `, which is embedded in the debug info.
This PR resets the `DebugCompilationDir` if it is functionally safe to
ignore the working directory so the above situation is avoided, since
all debug information will share the same working directory.
rdar://145249881
When using the clang dependency scanner with an arbitrary
DiagnosticConsumer, it is important that we always call finish().
Previously, if there was an error preventing us from reaching the
scanning action, or if the command line contained no scannable actions
we would fail to finish(), which would break some consumers (e.g.
serialized diag consumer).
When computing the context hash, `clang` always includes the compiler's
working directory. This can lead to situations when the only difference
between two compilations is the working directory, different module
variants are generated. These variants are redundant. This PR implements
an optimization that ignores the working directory when computing the
context hash when safe.
Specifically, `clang` checks if it is safe to ignore the working
directory in `isSafeToIgnoreCWD`. The check involves going through
compile command options to see if any paths specified are relative. The
definition of relative path used here is that the input path is not
empty, and `llvm::sys::path::is_absolute` is false. If all the paths
examined are not relative, `clang` considers it safe to ignore the
current working directory and does not consider the working directory
when computing the context hash.
Update Dependency scanner so it can scan the dependency of a TU with
a provided buffer rather than relying on the on disk file system to
provide the input file.
In the discussion around #116792, @rjmccall mentioned that ARCMigrate
has been obsoleted and that we could go ahead and remove it from Clang,
so this patch does just that.
This adds new symbols to the generated mapping and removes special
mappings for missing symbols introduced in #113612, as these symbols are
now included in the generated mapping.
Starting with 41e3919ded78d8870f7c95e9181c7f7e29aa3cc4 DiagnosticsEngine
creation might perform IO. It was implicitly defaulting to
getRealFileSystem. This patch makes it explicit by pushing the decision
making to callers.
It uses ambient VFS if one is available, and keeps using
`getRealFileSystem` if there aren't any VFS.
This PR builds on top of #113984 and attempts to avoid allocating input
file paths eagerly. Instead, the `InputFileInfo` type used by
`ASTReader` now only holds `StringRef`s that point into the PCM file
buffer, and the full input file paths get resolved on demand.
The dependency scanner makes use of this in a bit of a roundabout way:
`ModuleDeps` now only holds (an owning copy of) the short unresolved
input file paths, which get resolved lazily. This can be a big win, I'm
seeing up to a 5% speedup.
The `FileManager` sharing between module-building `CompilerInstance`s
was disabled a while ago due to `FileEntry::getName()` being unreliable.
Now that we use `FileEntryRef::getNameAsRequested()` in places where it
matters, re-enabling `FileManager` is sound and improves performance of
`clang-scan-deps` by ~6.2%.
With inferred modules, the dependency scanner takes care to replace the
fake "__inferred_module.map" path with the file that allowed the module
to be inferred. However, this only worked when such a module was
imported directly in the TU. Whenever such module got loaded
transitively, the scanner would fail to perform the replacement. This is
caused by the fact that PCM files are lossy and drop this information.
This patch makes sure that PCMs include this file for each submodule (in
the `SUBMODULE_DEFINITION` record), fixes one existing test with an
incorrect assertion, and does a little drive-by refactoring of
`ModuleMap`.
Some `FileManager` APIs still return `{File,Directory}Entry` instead of
the preferred `{File,Directory}EntryRef`. These are documented to be
deprecated, but don't have the attribute that warns on their usage. This
PR marks them as such with `LLVM_DEPRECATED()` and replaces their usage
with the recommended counterparts. NFCI.
This patch adds an IsText parameter to the following functions
openFileForRead, getBufferForFile, getBufferForFileImpl and determines
whether a file is text by querying the file tag on z/OS. The default is
set to OF_Text instead of OF_None, this change in value does not affect
any other platforms other than z/OS.
This allows the clang driver to know which tool is meant to be executed,
which allows the clang driver to load the right clang config files, and
allows clang to find colocated sysroots.
This makes sure that doing `clang-scan-deps -- <tool> ...` looks up
things in the same way as if one just would execute `<tool> ...`, when
`<tool>` isn't an absolute or relative path.
Clang's `-cc1 -print-stats` shows lots of useful internal data including
basic `FileManager` stats. Since this layer caches some results, it is
unclear how that information translates to actual filesystem accesses.
This PR uses `llvm::vfs::TracingFileSystem` to provide that missing
information.
Similar mechanism is implemented for `clang-scan-deps`'s verbose mode
(`-v`). IO contention proved to be a real bottleneck a couple of times
already and this new feature should make those easier to detect in the
future. The tracing VFS is inserted below the caching FS and above the
real FS.