1101 Commits

Author SHA1 Message Date
Teresa Johnson
fdb050a502
Revert "[MemProf] Use radix tree for alloc contexts in bitcode summaries" (#117395)
Reverts llvm/llvm-project#117066

This is causing some build bot failures that need investigation.
2024-11-22 14:57:58 -08:00
Teresa Johnson
ccb4702038
[MemProf] Use radix tree for alloc contexts in bitcode summaries (#117066)
Leverage the support added to represent allocation contexts in a more
compact way via a radix tree in the indexed profile to similarly reduce
sizes of the bitcode summaries.

For a large target, this reduced the size of the per-module summaries by
about 18% and of the distributed combined index files by 28%.
2024-11-22 14:49:55 -08:00
Kazu Hirata
ad2bdd8fab
[memprof] Remove MemProf format Version 1 (#117357)
This patch removes MemProf format Version 1 now that Versions 2 and 3
are working well.
2024-11-22 11:53:31 -08:00
Teresa Johnson
e14827f082
[MemProf] Templatize CallStackRadixTreeBuilder (NFC) (#117014)
Prepare for usage in the bitcode reader/writer where we already have a
LinearFrameId:
- templatize input frame id type in CallStackRadixTreeBuilder
- templatize input frame id type in computeFrameHistogram
- make the map from FrameId to LinearFrameId optional

We plan to use the same radix format in the ThinLTO summary records,
where we already have a LinearFrameId.
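
An illustrative sketch (not from the patch) of the templatization pattern described above; the class, member, and parameter names below are hypothetical stand-ins, not the actual CallStackRadixTreeBuilder API:

```
// A minimal sketch, assuming simplified stand-in types.
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

using LinearFrameId = uint32_t; // index-based frame id used by the summaries

// The frame id type is a template parameter so the same builder can consume
// hash-based FrameIds (indexed profile) or LinearFrameIds (bitcode summary).
template <typename FrameIdTy> class RadixTreeBuilderSketch {
public:
  // The FrameIdTy -> LinearFrameId map is optional: callers that already
  // work in LinearFrameId space (e.g. the bitcode reader/writer) pass
  // std::nullopt and the frame ids are used as-is.
  void build(const std::vector<std::vector<FrameIdTy>> &CallStacks,
             const std::optional<std::unordered_map<FrameIdTy, LinearFrameId>>
                 &FrameIdMap) {
    for (const auto &CS : CallStacks)
      for (FrameIdTy F : CS)
        RadixArray.push_back(FrameIdMap ? FrameIdMap->at(F)
                                        : static_cast<LinearFrameId>(F));
    // The real builder also shares common call-stack prefixes; this sketch
    // only shows where the optional mapping is applied.
  }

  std::vector<LinearFrameId> RadixArray;
};
```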
2024-11-20 10:08:58 -08:00
Kazu Hirata
4f1b20f023
[ProfileData] Remove unused includes (NFC) (#116751)
Identified with misc-include-cleaner.
2024-11-19 19:42:20 -08:00
Kazu Hirata
f97c610d1f
[memprof] Add MemProfReader::takeMemProfData (#116769)
This patch adds MemProfReader::takeMemProfData, a function to return
the complete MemProf profile from the reader.  We can directly pass
its return value to InstrProfWriter::addMemProfData without having to
deal with the individual components of the MemProf profile.  The new
function is named "take", but it doesn't do std::move yet because of
type differences (DenseMap vs. MapVector).

The end state I'm trying to get to is roughly as follows:

- MemProfReader accepts IndexedMemProfData as a parameter as opposed
  to the three individual components (frames, call stacks, and
  records).

- MemProfReader keeps IndexedMemProfData as a class member without
  decomposing it into its individual components.

- MemProfReader returns IndexedMemProfData like:

  IndexedMemProfData takeMemProfData() {
    return std::move(MemProfData);
  }
2024-11-19 19:33:26 -08:00
Kazu Hirata
6bf8f08989
[memprof] Add InstrProfWriter::addMemProfData (#116528)
This patch adds InstrProfWriter::addMemProfData, which adds the
complete MemProf profile (frames, call stacks, and records) to the
writer context.

Without this function, functions like loadInput in llvm-profdata.cpp
and InstrProfWriter::mergeRecordsFromWriter must add one item (frame,
call stack, or record) at a time.  The new function std::moves the
entire MemProf profile to the writer context if the destination is
empty, which is the common use case.  Otherwise, we fall back to
adding one item at a time behind the scenes.

Here are a couple of reasons why we should add this function:

- We've had a bug where we forgot to add one of the three data
  structures (frames, call stacks, and records) to the writer context,
  resulting in a nearly empty indexed profile.  We should always
  package the three data structures together, especially on API
  boundaries.

- We expose a little too much of the MemProf detail to
  InstrProfWriter.  I'd like to gradually transform
  InstrProfReader/Writer to entities managing buffers (sequences of
  bytes), with actual serialization/deserialization left to external
  classes.  We already do some of this in InstrProfReader, where
  InstrProfReader "contracts out" to IndexedMemProfReader to handle
  MemProf details.

I am not changing loadInput or InstrProfWriter::mergeRecordsFromWriter
for now because MemProfReader uses DenseMap for frames and call
stacks, whereas MemProfData uses MapVector.  I'll resolve these
mismatches in subsequent patches.
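
An illustrative sketch (not from the patch) of the "move if the destination is empty, otherwise add one item at a time" behavior described above; the type and member names are simplified stand-ins, not the actual InstrProfWriter API:

```
#include "llvm/ADT/MapVector.h"
#include <cstdint>
#include <utility>

struct IndexedMemProfDataSketch {
  llvm::MapVector<uint64_t, int> Frames;     // FrameId -> frame payload
  llvm::MapVector<uint64_t, int> CallStacks; // CallStackId -> call stack payload
  llvm::MapVector<uint64_t, int> Records;    // function GUID -> record payload
};

void addMemProfDataSketch(IndexedMemProfDataSketch &Dst,
                          IndexedMemProfDataSketch Incoming) {
  if (Dst.Frames.empty() && Dst.CallStacks.empty() && Dst.Records.empty()) {
    // Common case: the writer context is empty, so take everything at once.
    Dst = std::move(Incoming);
    return;
  }
  // Fallback: add one item at a time behind the scenes.
  for (auto &KV : Incoming.Frames)
    Dst.Frames.insert(KV);
  for (auto &KV : Incoming.CallStacks)
    Dst.CallStacks.insert(KV);
  for (auto &KV : Incoming.Records)
    Dst.Records.insert(KV);
}
```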
2024-11-18 08:56:25 -08:00
Kazu Hirata
0d38f64e7d
[memprof] Remove MemProf format Version 0 (#116442)
This patch removes MemProf format Version 0 now that Versions 2 and 3
seem to be working well.

I'm not touching version 1 for now because some tests still rely on
version 1.

Note that Version 0 is identical to Version 1 except that the MemProf
section of the indexed format has a MemProf version field.
2024-11-15 15:37:00 -08:00
Kazu Hirata
57ed628fb3
[memprof] Speed up caller-callee pair extraction (Part 2) (#116441)
This patch further speeds up the extraction of caller-callee pairs
from the profile.

Recall that we reconstruct a call stack by traversing the radix tree
from one of its leaf nodes toward a root.  The implication is that
when we decode many different call stacks, we end up visiting nodes
near the root(s) repeatedly.  That in turn adds many duplicates to our
data structure:

  DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> Calls;

only to be deduplicated later with sort+unique for each vector.

This patch makes the extraction process more efficient by keeping
track of indices of the radix tree array we've visited so far and
terminating traversal as soon as we encounter an element previously
visited.

Note that even with this improvement, we still add at least one
caller-callee pair to the data structure above for each call stack
because we do need to add a caller-callee pair for the leaf node with
the callee GUID being 0.

Without this patch, it takes 4 seconds to extract caller-callee pairs
from a large MemProf profile.  This patch shortens that down to
900ms.
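
An illustrative sketch (not from the patch) of the early-termination idea: a bit per node records whether it was visited while decoding an earlier call stack. The node layout below uses explicit parent indices instead of the real radix-tree encoding, and all names are hypothetical:

```
#include "llvm/ADT/BitVector.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallVector.h"
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative stand-in for the real edge type: (callsite line, callee GUID).
using CallEdgeTy = std::pair<unsigned, uint64_t>;

struct NodeSketch {
  uint64_t CallerGUID; // function containing the call site
  int Parent;          // index of the caller's node, or -1 for a root
};

void extractPairs(
    const std::vector<NodeSketch> &Tree,
    const std::vector<unsigned> &LeafIndices,
    llvm::DenseMap<uint64_t, llvm::SmallVector<CallEdgeTy, 0>> &Calls) {
  llvm::BitVector Visited(Tree.size());
  for (unsigned Leaf : LeafIndices) {
    // Each call stack still contributes at least one pair: the leaf node,
    // recorded with callee GUID 0.
    Calls[Tree[Leaf].CallerGUID].push_back({0, 0});
    uint64_t CalleeGUID = Tree[Leaf].CallerGUID;
    for (int I = Tree[Leaf].Parent; I != -1; I = Tree[I].Parent) {
      // Stop as soon as we hit a node visited while decoding an earlier
      // call stack; everything from here toward the root was already added.
      if (Visited.test(I))
        break;
      Visited.set(I);
      Calls[Tree[I].CallerGUID].push_back({0, CalleeGUID});
      CalleeGUID = Tree[I].CallerGUID;
    }
  }
}
```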
2024-11-15 15:33:23 -08:00
Kazu Hirata
ec353b7418
[memprof] Use llvm::function_ref instead of std::function (#116306)
We've seen bugs where we lost track of error states stored in the
functor because we passed the functor by value (that is,
std::function) as opposed to reference (llvm::function_ref).

This patch fixes a couple of places we pass functors by value.

While we are at it, this patch adds curly braces around a "for" loop
spanning multiple lines.
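
An illustrative sketch (not from the patch) of the bug pattern being fixed: a stateful functor passed by value via std::function gets copied, so error state recorded in the callback is lost, whereas llvm::function_ref refers to the caller's object:

```
#include "llvm/ADT/STLExtras.h" // llvm::function_ref
#include <functional>

struct ErrorCollector {
  bool SawError = false;
  void operator()(bool Err) { SawError |= Err; }
};

// The copy held inside std::function is updated, not the caller's collector.
void visitByValue(std::function<void(bool)> Callback) { Callback(true); }

// function_ref is a non-owning reference, so the caller observes the update.
void visitByRef(llvm::function_ref<void(bool)> Callback) { Callback(true); }

void demo() {
  ErrorCollector C1, C2;
  visitByValue(C1); // C1.SawError is still false afterwards
  visitByRef(C2);   // C2.SawError is true afterwards
}
```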
2024-11-15 13:03:24 -08:00
Kazu Hirata
59da1afd2a
[memprof] Speed up caller-callee pair extraction (#116184)
We know that the MemProf profile has a lot of duplicate call stacks.
Extracting caller-callee pairs from a call stack we've seen before is
a wasteful effort.

This patch makes the extraction more efficient by first coming up with
a work list of linear call stack IDs -- the set of starting positions
in the radix tree array -- and then extracting caller-callee pairs from
each call stack in the work list.

We implement the work list as a bit vector because we expect the work
list to be dense in the range [0, RadixTreeSize).  Also, we want the
set insertion to be cheap.

Without this patch, it takes 25 seconds to extract caller-callee pairs
from a large MemProf profile.  This patch shortens that down to 4
seconds.
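
An illustrative sketch (not from the patch) of the bit-vector work list; names and types are simplified stand-ins:

```
#include "llvm/ADT/BitVector.h"
#include <cstdint>
#include <vector>

// Build the set of distinct starting positions (linear call stack IDs)
// first, then decode each call stack exactly once.
void processUniqueCallStacks(const std::vector<uint32_t> &LinearCallStackIds,
                             unsigned RadixTreeSize) {
  // IDs are dense in [0, RadixTreeSize), so a bit vector gives cheap,
  // duplicate-free insertion.
  llvm::BitVector Worklist(RadixTreeSize);
  for (uint32_t Id : LinearCallStackIds)
    Worklist.set(Id);

  for (int Pos = Worklist.find_first(); Pos != -1;
       Pos = Worklist.find_next(Pos)) {
    // Decode the call stack starting at index Pos and extract its
    // caller-callee pairs (omitted in this sketch).
  }
}
```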
2024-11-14 15:54:55 -08:00
Kazu Hirata
9a730d878e
[memprof] Add IndexedMemProfReader::getMemProfCallerCalleePairs (#115807)
Undrifting the MemProf profile requires two sets of information:

- caller-callee pairs from the profile
- caller-callee pairs from the IR

This patch adds a function to do the former.  The latter has been
addressed by extractCallsFromIR.

Unfortunately, the current MemProf format does not directly give us
the caller-callee pairs from the profile.  "struct Frame" just tells
us where the call site is -- Caller GUID and line/column numbers; it
doesn't tell us what function a given Frame is calling.  To extract
caller-callee pairs, we need to scan each call stack, look at two
adjacent Frames, and extract a caller-callee pair.

Conceptually, we would extract caller-callee pairs with:

  for each MemProfRecord in the profile:
    for each call stack in AllocSites:
      extract caller-callee pairs from adjacent pairs of Frames

However, this is highly inefficient.  Obtaining MemProfRecord involves
looking up the OnDiskHashTable, allocating several vectors on the
heap, and populating fields that are irrelevant to us, such as MIB and
CallSites.

This patch adds an efficient way of doing the above.  Specifically, we

- go through all IndexedMemProfRecords,
- look at each linear call stack ID, and
- extract caller-callee pairs from each call stack.

The extraction is done by a new class CallerCalleePairExtractor,
modified from LinearCallStackIdConverter, which reconstructs a call
stack from the radix tree array.  For our purposes, we skip the
reconstruction and immediately populate the data structure for
caller-callee pairs.

The resulting caller-callee pairs are of the type:

  DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> CallerCalleePairs;

which can be passed directly to longestCommonSequence just like the
result of extractCallsFromIR.

Further performance optimizations are possible for the new functions
in this patch.  I'll address those in follow-up patches.
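
An illustrative sketch (not from the patch) of the conceptual extraction loop above, rendered in C++ with simplified stand-in types (the real CallerCalleePairExtractor works directly on the radix tree array instead):

```
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallVector.h"
#include <cstdint>
#include <utility>
#include <vector>

struct FrameSketch {
  uint64_t CallerGUID; // function containing the call site
  unsigned Line;       // call site line (column omitted for brevity)
};

// Illustrative stand-in for the real edge type: (callsite line, callee GUID).
using CallEdgeTy = std::pair<unsigned, uint64_t>;

void extractCallerCalleePairs(
    const std::vector<std::vector<FrameSketch>> &AllocCallStacks,
    llvm::DenseMap<uint64_t, llvm::SmallVector<CallEdgeTy, 0>> &CallerCalleePairs) {
  for (const auto &CallStack : AllocCallStacks) {
    // Call stacks are leaf-to-root: Frame i+1 is the caller of Frame i.
    uint64_t CalleeGUID = 0; // the leaf has no callee
    for (const FrameSketch &F : CallStack) {
      CallerCalleePairs[F.CallerGUID].push_back({F.Line, CalleeGUID});
      CalleeGUID = F.CallerGUID;
    }
  }
}
```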
2024-11-13 23:40:12 -08:00
Teresa Johnson
594e11ce42
[MemProf] Avoid incorrect ICP symtab canonicalization (#115419)
ICP builds a symtab from the symbols in the module, allowing mapping from
the VP metadata GUIDs to the Function. MemProf uses this same symtab
handling for its ICP during cloning. When symbols are added to the
symtab, the handling adds both a GUID computed from the function name
(or from the attached PGOFuncName metadata for locals) and a GUID
computed from the "canonicalized" name, which strips all "." suffixes
other than ".__uniq". This was originally meant to remove the ".llvm.*"
suffix added to promoted locals (done earlier in the ThinLTO backend).
In theory, it should no longer be needed, as locals should have
PGOFuncName metadata.

However, this was causing a linker unsat in code that used coroutines.
For an original coroutine function, several additional functions were
created that had the same name but different "." suffixes. Therefore the
canonical names for these additional functions had the same GUID as that
of the original function, leading to extra entries in the symtab and to
selecting the wrong function for promotion. For regular ICP this can
happen, but it is just a performance issue. However, for memprof the
promoted direct call invokes a memprof clone, and because we selected
the wrong function, which didn't have a memprof clone, we got a linker
unsat.

We may be able to remove the canonical name handling for ICP in general,
but for now disable it for MemProf. At worst this could lead to not
finding a GUID in the symtab and not performing an ICP, so should be
conservatively correct.
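
An illustrative sketch (not from the patch) of why the canonical-name GUIDs collide for coroutine-split functions; the helper below is a simplification (it also drops ".__uniq" suffixes, which the real handling keeps):

```
#include "llvm/ADT/StringRef.h"
#include "llvm/IR/GlobalValue.h"
#include <cstdint>

// Simplified canonicalization: drop everything from the first "." onward.
llvm::StringRef canonicalNameSketch(llvm::StringRef Name) {
  return Name.take_front(Name.find('.'));
}

// For a coroutine, "foo", "foo.resume", and "foo.destroy" all canonicalize
// to "foo", so their canonical-name GUIDs are identical and the symtab gets
// extra entries mapping the same GUID to different functions.
uint64_t canonicalGUIDSketch(llvm::StringRef Name) {
  return llvm::GlobalValue::getGUID(canonicalNameSketch(Name));
}
```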
2024-11-07 21:00:42 -08:00
Teresa Johnson
475e736bb5
[MemProf] Include <ctime> to avoid MSVC failure (#114246)
My change in bb3915149a7c9b1660db9caebfc96343352e8454 added a call to
std::time, which generally worked because there must be some transitive
include of <ctime>. However, I saw one MSVC bot failure:

InstrProfWriter.cpp(202): error C2039: 'time': is not a member of 'std'

from https://lab.llvm.org/buildbot/#/builders/63/builds/2325.

Presumably explicitly including <ctime> should fix this.
2024-10-30 08:28:22 -07:00
Teresa Johnson
bb3915149a
[MemProf] Support for random hotness when writing profile (#113998)
Add support for generating random hotness in the memprof profile writer,
to be used for testing. The random seed is printed to stderr, and an
additional option enables providing a specific seed in order to
reproduce a particular random profile.
2024-10-29 22:10:33 -07:00
Yuta Saito
bf8f5cc9b5
Reland: [llvm-cov][WebAssembly] Read __llvm_prf_names from data segments (#112569)
On WebAssembly, most coverage metadata contents read by llvm-cov (like
`__llvm_covmap` and `__llvm_covfun`) are stored in custom sections
because they are not referenced at runtime. However, `__llvm_prf_names`
is referenced at runtime by the profile runtime library and is read by
llvm-cov post-processing tools, so it needs to be stored in a data
segment, which is allocatable at runtime and accessible by tools as long
as "name" section is present in the binary.

This patch changes the way llvm-cov reads `__llvm_prf_names` on
WebAssembly. Instead of looking for a section, it looks for a data
segment with the same name.

This reverts commit 157f10ddf2d851125a85a71e530dc9d50cb032a2 and fixes
PE/COFF `.lprfn$A` section handling.
2024-10-24 12:52:50 +09:00
NAKAMURA Takumi
4a011ac84f
[Coverage] Introduce "partial fold" on BranchRegion (#112694)
Previously, both the True and False counts were folded. This lost the
information of whether the branch was True or False before folding, and
it prevented recalling branch counts when merging template
instantiations.

In `llvm-cov`, a folded branch is shown as:

- `[True: n, Folded]`
- `[Folded, False n]`

If `n` is zero, the branch is reported as "uncovered". This is
distinguished from a "folded" branch. When folded branches are merged,
`Folded` may be dissolved.

In the coverage map, either `Counter` is `Zero`. Previously, both were
`Zero`.

Since "partial fold" has been introduced, either case in `switch` is
omitted as `Folded`.

Each `case:` in a `switch` is reported as `[True: n, Folded]`, since the
`False` count doesn't carry a meaningful value.

When a `switch` doesn't have a `default:`, `switch (Cond)` is reported as
`[Folded, False: n]`, since the `True` count is just the sum of the
`case`(s). A `switch` with `default:` can be considered as "the statement
that doesn't have any `False`(s)".
2024-10-20 12:30:35 +09:00
Yuta Saito
157f10ddf2
Revert "[llvm-cov][WebAssembly] Read __llvm_prf_names from data segments" (#112520)
This reverts commit efc9dd4118a7ada7d8c898582f16db64827f7ce0 in order to
fix Windows test failure:
https://github.com/llvm/llvm-project/pull/111332#issuecomment-2416462512
2024-10-16 21:42:54 +09:00
Yuta Saito
d4efc3e097
[Coverage][WebAssembly] Add initial support for WebAssembly/WASI (#111332)
Currently, the WebAssembly/WASI target does not provide direct support
for code coverage.
This patch set fixes several issues to unlock the feature. The main
changes are:

1. Port `compiler-rt/lib/profile` to WebAssembly/WASI.
2. Adjust profile metadata sections for the Wasm object file format.
    - [CodeGen] Emit `__llvm_covmap` and `__llvm_covfun` as custom
      sections instead of data segments.
    - [lld] Align the interval space of custom sections at link time.
    - [llvm-cov] Copy misaligned custom section data if the start
      address is not aligned.
    - [llvm-cov] Read `__llvm_prf_names` from data segments.
3. [clang] Link with profile runtime libraries if requested.

See each commit message for more details and rationale.
This is part of the effort to add code coverage support in the Wasm
target of the Swift toolchain.
2024-10-15 02:41:43 +09:00
Kazu Hirata
02b9c97b75
[memprof] Simplify code with MapVector::operator[] (NFC) (#111335)
Note that the following are all equivalent to each other:

  Map.insert({Key, Value()}).first->second
  Map.try_emplace(Key).first->second
  Map[Key]
2024-10-07 09:00:05 -07:00
Kazu Hirata
abaa8247e8
[memprof] Avoid repeated hash lookups (NFC) (#110789) 2024-10-02 06:53:11 -07:00
Kazu Hirata
0089f39e0f
[ProfileData] Avoid repeated hash lookups (NFC) (#110619) 2024-10-01 00:30:04 -07:00
Kazu Hirata
e4e3ff5adc
[llvm] Use std::optional::value_or (NFC) (#109568) 2024-09-22 01:00:24 -07:00
gulfemsavrun
536bdc99e6
[Coverage] Skip empty profile name section (#108480)
llvm-cov reads the __llvm_prf_names section in an object file to find
the profile names; in binary profile correlation mode, it instead reads
the __llvm_covnames section when the __llvm_prf_names section is
omitted. This patch ensures that it still reads the __llvm_covnames
section when there is an empty __llvm_prf_names section.
2024-09-13 16:41:07 -07:00
Mircea Trofin
885ac29910 [nfc][ctx_prof] Change some internal "set" types
- the set used for targets under a callsite is simpler to use if iterators
  are stable (it gets manipulated during updates)
- the set used to fetch the transitive closure of GUIDs under a node can
  be left as a choice to the user.
2024-09-12 10:34:53 -07:00
Zequan Wu
a7c26aaf2e
Revert "[Coverage] Ignore unused functions if the count is 0." (#107901)
Reverts llvm/llvm-project#107661

Breaks llvm-project/llvm/unittests/ProfileData/CoverageMappingTest.cpp
2024-09-09 14:34:13 -04:00
Zequan Wu
6850410562
[Coverage] Ignore unused functions if the count is 0. (#107661)
Relax the condition to ignore the case when the count is 0.

This fixes a bug in 381e9d2386.
This was reported at
https://discourse.llvm.org/t/coverage-from-multiple-test-executables/81024/.
2024-09-09 14:14:21 -04:00
gulfemsavrun
787cd8f0fe
[InstrProf] Add debuginfod correlation support (#106606)
This patch adds debuginfod support to llvm-profdata to find the
associated executable by a build ID in a raw profile, in order to
correlate the profile with a provided correlation kind (debug-info or
binary).
2024-09-06 13:28:23 -07:00
William Junda Huang
75e9d191f5
[llvm-profdata] Enabled functionality to write split-layout profile (#101795)
Using the flag `-split_layout` in llvm-profdata merge, the output
profile can place profiles with and without inlined functions into two
different extbinary sections (and their FuncOffsetTables too). The
section without inlined functions is marked with `SecFlagFlat` and is
skipped by ThinLTO because it provides no useful info.

The split layout feature was already implemented in SampleProfWriter,
but previously there was no way to use it from llvm-profdata.
2024-08-28 20:33:54 -04:00
Mircea Trofin
1022323c9b
[ctx_prof] Move the "from json" logic more centrally to reuse it from test. (#106129)
Making the synthesis of a contextual profile file from a JSON descriptor more reusable, for unittest authoring purposes.

The functionality round-trips through the binary format - no reason, currently, to support other ways of loading contextual profiles.
2024-08-27 15:43:05 -07:00
Lei Wang
23144e87d2
[SampleFDO][NFC] Refactoring sample reader to support on-demand read profiles for given functions (#104654)
Currently, in the extended binary format, the sample reader only reads
the profiles whose functions are in the current module at
initialization time. This extends the support to read arbitrary
profiles for given input functions at a later stage. It's used for
https://github.com/llvm/llvm-project/pull/101053.
2024-08-27 11:56:24 -07:00
Ethan Luis McDonough
fde2d23ee2
[PGO][OpenMP] Instrumentation for GPU devices (Revision of #76587) (#102691)
This pull request is a revised version of #76587. This pull request
fixes some build issues that were present in the previous version of
this change.

> This pull request is the first part of an ongoing effort to extend
> PGO instrumentation to GPU device code. This PR makes the following
> changes:
>
> - Adds blank registration functions to device RTL
> - Gives PGO globals protected visibility when targeting a supported
>   GPU
> - Handles any addrspace casts for PGO calls
> - Implements PGO global extraction in GPU plugins (currently only
>   dumps info)
>
> These changes can be tested by supplying `-fprofile-instrument=clang`
> while targeting a GPU.
2024-08-22 01:10:54 -05:00
Kazu Hirata
dca820951c
[llvm] Use llvm::any_of (NFC) (#104443) 2024-08-15 17:59:10 -07:00
Mircea Trofin
6b47772a4b
[nfc][ctx_prof] Rename PGOContextualProfile to PGOCtxProfContext (#102209) 2024-08-06 17:41:38 -04:00
Mircea Trofin
cc7308a156
[ctx_prof] Make the profile output analyzable by llvm-bcanalyzer (#99563)
This requires outputting a "Magic" 4-byte header. We also emit a block info block to describe our blocks and records. The output of `llvm-bcanalyzer` would look like:

```
<BLOCKINFO_BLOCK/>
<Metadata NumWords=17 BlockCodeSize=2>
  <Version op0=1/>
  <Context NumWords=13 BlockCodeSize=2>
    <GUID op0=2/>
    <Counters op0=1 op1=2 op2=3/>
```

Instead of having `Unknown` for block and record IDs.
2024-07-23 08:59:07 -04:00
Lei Wang
18cdfa72e0
[SampleFDO] Stale profile call-graph matching (#95135)
Profile staleness can be due to function renaming. Given that the sample
profile loader relies on exact string matching, a trivial change in the
function signature (such as `int foo()` --> `long foo()`) can make the
mangled name different, so the function profile (including all nested
children profiles) becomes unavailable.

This patch introduces stale-profile call-graph level matching, aiming to
identify trivial function renamings and reuse the old function profile.

Some noteworthy details:

1. Extend the LCS-based CFG-level matching to identify new functions.
- Extend the matching to allow the function and the profile to have
different names, instead of requiring an exact function name match. This
leverages LCS: during callsite anchor matching, when the two function
names are different, try matching the functions instead of returning.
- In the LCS, the equal-function check is replaced by
`functionMatchesProfile`.
- Only try matching functions that are new (appearing on neither side).
This reduces the matching scope, as we don't need to match the
originally matched functions.
2. Determine the matching by a call-site anchor similarity check (see
the sketch at the end of this message).
- A new function `functionMatchesProfile(IRFunc, ProfFunc)` is used to
check the renaming for a candidate <IRFunc, ProfFunc> pair. It uses the
LCS (diff) matching to compute the equal set, and we define `Similarity
= 2 * |equalSet| / (|A| + |B|)`. The profile name is marked as renamed
if the similarity is above a threshold
(`-func-profile-similarity-threshold`).

3. Process the matching in top-down function order.
- When a caller is done matching, the new function names are saved for
later use; using top-down order maximizes the reuse of results.
- `ProfileNameToFuncMap` is used to save or cache the matching results.
4. Update the original profile at the end using `ProfileNameToFuncMap`.

5. Add a new switch `--salvage-unused-profile` to control this; the
default is false.

Verified on one of Meta's big internal services; confirmed that 90%+ of
the found renaming pairs are good. (There could be incorrect renaming
pairs if the number of anchors is small, but we checked that those
functions are simple cold functions.)
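
An illustrative sketch (not from the patch) of the similarity check defined in item 2; the function and parameter names are hypothetical:

```
#include <cstddef>

// EqualCount: number of matched (equal) callsite anchors from the LCS/diff;
// NumIRAnchors / NumProfAnchors: |A| and |B|, the anchor counts on each side.
bool functionMatchesProfileSketch(size_t EqualCount, size_t NumIRAnchors,
                                  size_t NumProfAnchors,
                                  double SimilarityThreshold) {
  if (NumIRAnchors + NumProfAnchors == 0)
    return false;
  // Similarity = 2 * |equalSet| / (|A| + |B|)
  double Similarity = 2.0 * static_cast<double>(EqualCount) /
                      static_cast<double>(NumIRAnchors + NumProfAnchors);
  return Similarity > SimilarityThreshold;
}
```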
2024-07-17 10:33:00 -07:00
Kazu Hirata
6c8ff4cbb8
[ProfileData] Take ArrayRef<InstrProfValueData> in addValueData (NFC) (#97363)
This patch fixes another place in ProfileData where we have a pointer
to an array of InstrProfValueData and its length separately.

addValueData is a bit unique in that it remaps incoming values in
place before adding them to ValueSites.  AFAICT, no caller of
addValueData uses the updated incoming values.  With this patch, we add
value data to ValueSites first and then remap the values there.  This
way, we can take ArrayRef<InstrProfValueData> as a parameter.
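
An illustrative sketch (not from the patch) of the "add first, then remap in place" ordering that lets the parameter be an ArrayRef; the types below are simplified stand-ins rather than the real InstrProf ones:

```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
#include <cstdint>
#include <utility>
#include <vector>

struct ValueDataSketch {
  uint64_t Value;
  uint64_t Count;
};

struct ValueSiteSketch {
  llvm::SmallVector<ValueDataSketch, 4> ValueData;
};

void addValueDataSketch(std::vector<ValueSiteSketch> &ValueSites,
                        llvm::ArrayRef<ValueDataSketch> VData,
                        uint64_t (*Remap)(uint64_t)) {
  // Copy the incoming data into the site first...
  ValueSiteSketch Site;
  Site.ValueData.append(VData.begin(), VData.end());
  // ...then remap the stored copies, leaving the caller's array untouched,
  // which is what allows the parameter to be an ArrayRef.
  if (Remap)
    for (ValueDataSketch &V : Site.ValueData)
      V.Value = Remap(V.Value);
  ValueSites.push_back(std::move(Site));
}
```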
2024-07-11 16:38:44 -07:00
Mircea Trofin
afbd7d1e7c
[NFC] Coding style: drop k in kGlobalIdentifierDelimiter (#98230) 2024-07-09 15:44:55 -07:00
Mircea Trofin
ce03155a1b
[NFC] Coding style fixes: SampleProf (#98208)
Also some control flow simplifications.

Notably, this doesn't address `sampleprof_error`. I *think* the style
there tries to match `std::error_category`.

Also left `hash_value` as-is, because it matches what we do in Hashing.h
2024-07-09 14:35:49 -07:00
Mircea Trofin
e291f31f89
[NFC] Coding style fixes in InstrProf.cpp (#98211) 2024-07-09 13:28:35 -07:00
Kazu Hirata
0cfd03ac0d
[ProfileData] Use ArrayRef in PatchItem (NFC) (#97379)
Packaging an array and its size as ArrayRef in PatchItem allows us to
get rid of things like std::size(Header) and HeaderOffsets.size().
2024-07-02 22:58:26 -07:00
Kazu Hirata
b8eaa5bb10
[ProfileData] Remove the old version of getValueProfDataFromInst (#97374)
I've migrated uses of the old version of getValueProfDataFromInst to
the one that returns SmallVector<InstrProfValueData, 4>.  This patch
removes the old version.
2024-07-02 11:46:31 -07:00
Mingming Liu
1518b260ce
[TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (#81442)
Clang's `-fwhole-program-vtables` is required for this optimization to
take place. If `-fwhole-program-vtables` is not enabled, this change is
a no-op.
    
* Function-comparison (before):

```
%vtable = load ptr, ptr %obj
%vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
%func = load ptr, ptr %vfn
%cond = icmp eq ptr %func, @callee
br i1 %cond, label bb1, label bb2:

bb1:
   call @callee

bb2:
   call %func
```

* VTable-comparison (after):

```
%vtable = load ptr, ptr %obj
%cond = icmp eq ptr %vtable, @vtable-address-point
br i1 %cond, label bb1, label bb2:

bb1:
   call @callee

bb2:
  %vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
  %func = load ptr, ptr %vfn
  call %func
```
    
Key changes:
1. Find out virtual calls and the vtables they come from.
- The ICP relies on the type intrinsic `llvm.type.test` to find out
virtual calls and the compatible vtables, and relies on type metadata to
find the address point for comparison.
2. The ICP pass does a cost-benefit analysis and compares vtables only
when the number of vtables for a function candidate is within an
(option-specified) threshold.
3. Sink the function addressing and vtable load instructions to the
indirect fallback.
- The sink helper functions are simplified versions of
`InstCombinerImpl::tryToSinkInstruction`. Currently debug intrinsics are
not handled. Ideally `InstCombinerImpl::tryToSinkInstructionDbgValues`
and `InstCombinerImpl::tryToSinkInstructionDbgVariableRecords` could be
moved into Transforms/Utils/Local.cpp (or another util cpp file) to
handle debug intrinsics when moving instructions across basic blocks.
4. Keep value profiles updated:
     1) Update vtable value profiles after inlining.
     2) For either function-based comparison or vtable-based comparison,
          update both vtable and indirect call value profiles.
2024-06-29 23:21:33 -07:00
Ethan Luis McDonough
2c8b912f63
Revert "[PGO][OpenMP] Instrumentation for GPU devices (#76587)"
This reverts commit 5fd2af38e461445c583d7ffc2fe23858966eee76. It caused build issues and broke the buildbot.
2024-06-28 12:30:45 -05:00
Ethan Luis McDonough
5fd2af38e4
[PGO][OpenMP] Instrumentation for GPU devices (#76587)
This pull request is the first part of an ongoing effort to extend PGO
instrumentation to GPU device code. This PR makes the following changes:

- Adds blank registration functions to device RTL
- Gives PGO globals protected visibility when targeting a supported GPU
- Handles any addrspace casts for PGO calls
- Implements PGO global extraction in GPU plugins (currently only dumps
info)

These changes can be tested by supplying `-fprofile-instrument=clang`
while targeting a GPU.
2024-06-28 10:42:19 -05:00
Matthew Weingarten
ca4e5a8d6e
[Memprof] Fixes memory leak in MemInfoBlock histogram. (#96834)
MemInfoBlocks (MIB) with empty callstacks are erased prematurely from
the CallStackProfileData. This patch frees allocated histogram buffers
when the MIB is associated with an empty callstack.
2024-06-26 17:11:21 -07:00
Kazu Hirata
22b36bfa3f [Memprof] Fix a warning
This patch fixes:

  llvm/lib/ProfileData/MemProfReader.cpp:685:1: error: non-void
  function does not return a value in all control paths
  [-Werror,-Wreturn-type]

While I am at it, this patch removes an else-after-return.
2024-06-26 12:05:58 -07:00
Matthew Weingarten
30b93db547
[Memprof] Adds the option to collect AccessCountHistograms for memprof. (#94264)
Adds compile time flag -mllvm -memprof-histogram and runtime flag
histogram=true|false to turn Histogram collection on and off. The
-memprof-histogram flag relies on -memprof-use-callbacks=true to work.

Updates shadow mapping logic in histogram mode from having one 8 byte
counter for 64 bytes, to 1 byte for 8 bytes, capped at 255. Only
supports this granularity as of now.

Updates the RawMemprofReader and the serialization of MemoryInfoBlocks
to the binary format, including bumping the raw binary format from
version 3 to version 4.

Updates creating MemoryInfoBlocks with and without Histograms. When two
MemoryInfoBlocks are merged, AccessCounts are summed up and the shorter
Histogram is removed.

Adds a memprof_histogram test case.

Initial commit for adding AccessCountHistograms up until RawProfile for
memprof
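
An illustrative sketch (not from the patch) of the merge rule described above (access counts are summed and the shorter histogram is dropped); field names are simplified stand-ins:

```
#include <cstdint>
#include <vector>

struct MemInfoBlockSketch {
  uint64_t AccessCount = 0;
  std::vector<uint8_t> AccessHistogram; // one byte per 8 bytes, capped at 255
};

void mergeSketch(MemInfoBlockSketch &Dst, const MemInfoBlockSketch &Src) {
  Dst.AccessCount += Src.AccessCount;
  // Keep the longer histogram; the shorter one is dropped (freed).
  if (Src.AccessHistogram.size() > Dst.AccessHistogram.size())
    Dst.AccessHistogram = Src.AccessHistogram;
}
```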
2024-06-26 08:37:22 -07:00
Kazu Hirata
fef144cebb Revert "[llvm] Use llvm::sort (NFC) (#96434)"
This reverts commit 05d167fc201b4f2e96108be0d682f6800a70c23d.

Reverting the patch fixes the following under EXPENSIVE_CHECKS:

  LLVM :: CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir
  LLVM :: CodeGen/AMDGPU/sched-group-barrier-pre-RA.mir
  LLVM :: CodeGen/PowerPC/aix-xcoff-used-with-stringpool.ll
  LLVM :: CodeGen/PowerPC/merge-string-used-by-metadata.mir
  LLVM :: CodeGen/PowerPC/mergeable-string-pool-large.ll
  LLVM :: CodeGen/PowerPC/mergeable-string-pool-pass-only.mir
  LLVM :: CodeGen/PowerPC/mergeable-string-pool.ll
2024-06-25 11:18:40 -07:00
Kazu Hirata
05d167fc20
[llvm] Use llvm::sort (NFC) (#96434) 2024-06-23 10:38:51 -07:00