Leverage the support added to represent allocation contexts in a more
compact way via a radix tree in the indexed profile to similarly reduce
sizes of the bitcode summaries.
For a large target, this reduced the size of the per-module summaries by
about 18% and of the distributed combined index files by 28%.
Prepare for usage in the bitcode reader/writer where we already have a
LinearFrameId:
- templatize input frame id type in CallStackRadixTreeBuilder
- templatize input frame id type in computeFrameHistogram
- make the map from FrameId to LinearFrameId optional
We plan to use the same radix format in the ThinLTO summary records,
where we already have a LinearFrameId.
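As a rough illustration of the templatization above (not the actual
CallStackRadixTreeBuilder interface; the class and member names here are
made up), the frame id type becomes a template parameter and the
FrameId-to-LinearFrameId map becomes optional:
```
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Hypothetical, simplified builder; the real CallStackRadixTreeBuilder in
// llvm/ProfileData has a different interface. The point is that the input
// frame id type is a template parameter, so callers that already hold
// LinearFrameId values can skip the FrameId -> LinearFrameId mapping.
template <typename FrameIdTy> class RadixTreeBuilderSketch {
public:
  // The mapping is optional: pass std::nullopt when the call stacks are
  // already expressed in LinearFrameId.
  void build(const std::vector<std::vector<FrameIdTy>> &CallStacks,
             const std::optional<std::map<FrameIdTy, uint32_t>> &ToLinear) {
    for (const auto &CS : CallStacks)
      for (FrameIdTy F : CS)
        RadixArray.push_back(ToLinear ? ToLinear->at(F)
                                      : static_cast<uint32_t>(F));
  }
  std::vector<uint32_t> RadixArray; // flattened tree, illustrative only
};

int main() {
  // Indexed profile path: 64-bit FrameId plus a mapping to LinearFrameId.
  RadixTreeBuilderSketch<uint64_t> FromFrameIds;
  (void)FromFrameIds;
  // ThinLTO summary path: already LinearFrameId, no mapping needed.
  RadixTreeBuilderSketch<uint32_t> FromLinearIds;
  FromLinearIds.build({{0, 1, 2}}, std::nullopt);
  return 0;
}
```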
This patch adds MemProfReader::takeMemProfData, a function that returns
the complete MemProf profile from the reader. We can pass its return
value directly to InstrProfWriter::addMemProfData without having to
deal with the individual components of the MemProf profile. The new
function is named "take", but it doesn't std::move yet because of
type differences (DenseMap vs. MapVector).
The end state I'm trying to get to is roughly as follows:
- MemProfReader accepts IndexedMemProfData as a parameter as opposed
to the three individual components (frames, call stacks, and
records).
- MemProfReader keeps IndexedMemProfData as a class member without
decomposing it into its individual components.
- MemProfReader returns IndexedMemProfData like:
```
  IndexedMemProfData takeMemProfData() {
    return std::move(MemProfData);
  }
```
This patch adds InstrProfWriter::addMemProfData, which adds the
complete MemProf profile (frames, call stacks, and records) to the
writer context.
Without this function, functions like loadInput in llvm-profdata.cpp
and InstrProfWriter::mergeRecordsFromWriter must add one item (frame,
call stack, or record) at a time. The new function std::moves the
entire MemProf profile to the writer context if the destination is
empty, which is the common use case. Otherwise, we fall back to
adding one item at a time behind the scenes.
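A minimal sketch of the move-if-empty behavior described above, with
made-up container types (the real code uses MapVector and the MemProf
record types):
```
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct MemProfDataSketch {
  std::map<uint64_t, std::vector<uint64_t>> Frames, CallStacks, Records;
  bool empty() const {
    return Frames.empty() && CallStacks.empty() && Records.empty();
  }
};

// Hypothetical writer: std::move the whole profile when the destination is
// empty (the common case); otherwise fall back to item-by-item insertion so
// existing entries are merged rather than clobbered.
struct WriterSketch {
  MemProfDataSketch Dest;
  void addMemProfData(MemProfDataSketch Incoming) {
    if (Dest.empty()) {
      Dest = std::move(Incoming);
      return;
    }
    for (auto &KV : Incoming.Frames)
      Dest.Frames.insert(std::move(KV));
    for (auto &KV : Incoming.CallStacks)
      Dest.CallStacks.insert(std::move(KV));
    for (auto &KV : Incoming.Records)
      Dest.Records.insert(std::move(KV));
  }
};
```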
Here are a couple of reasons why we should add this function:
- We've had a bug where we forgot to add one of the three data
structures (frames, call stacks, and records) to the writer context,
resulting in a nearly empty indexed profile. We should always
package the three data structures together, especially on API
boundaries.
- We expose a little too much of the MemProf detail to
InstrProfWriter. I'd like to gradually transform
InstrProfReader/Writer to entities managing buffers (sequences of
bytes), with actual serialization/deserialization left to external
classes. We already do some of this in InstrProfReader, where
InstrProfReader "contracts out" to IndexedMemProfReader to handle
MemProf details.
I am not changing loadInput or InstrProfWriter::mergeRecordsFromWriter
for now because MemProfReader uses DenseMap for frames and call
stacks, whereas MemProfData uses MapVector. I'll resolve these
mismatches in subsequent patches.
This patch removes MemProf format Version 0 now that Versions 2 and 3
seem to be working well.
I'm not touching Version 1 for now because some tests still rely on it.
Note that Version 0 is identical to Version 1 except that, starting with
Version 1, the MemProf section of the indexed format has a MemProf
version field.
This patch further speeds up the extraction of caller-callee pairs
from the profile.
Recall that we reconstruct a call stack by traversing the radix tree
from one of its leaf nodes toward a root. The implication is that
when we decode many different call stacks, we end up visiting nodes
near the root(s) repeatedly. That in turn adds many duplicates to our
data structure:
DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> Calls;
only to be deduplicated later with sort+unique for each vector.
This patch makes the extraction process more efficient by keeping
track of indices of the radix tree array we've visited so far and
terminating traversal as soon as we encounter an element previously
visited.
Note that even with this improvement, we still add at least one
caller-callee pair to the data structure above for each call stack
because we do need to add a caller-callee pair for the leaf node with
the callee GUID being 0.
Without this patch, it takes 4 seconds to extract caller-callee pairs
from a large MemProf profile. This patch shortens that to 900ms.
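A sketch of the early-termination idea over a deliberately simplified
radix encoding (the real radix-array layout differs); NodeSketch,
extractFrom, and the parent-index encoding are illustrative:
```
#include <cstdint>
#include <utility>
#include <vector>

// Each node holds a frame's function GUID and the index of its caller,
// or -1 at a root.
struct NodeSketch {
  uint64_t GUID;
  int Parent;
};

using CallEdge = std::pair<uint64_t, uint64_t>; // (caller GUID, callee GUID)

// Visited must be sized to Radix.size() and is shared across all leaves.
void extractFrom(unsigned Leaf, const std::vector<NodeSketch> &Radix,
                 std::vector<bool> &Visited, std::vector<CallEdge> &Edges) {
  // Always record the leaf's pair with callee GUID 0, even on a revisit,
  // mirroring the note above.
  Edges.push_back({Radix[Leaf].GUID, 0});
  for (int I = Leaf; I != -1; I = Radix[I].Parent) {
    if (Visited[I])
      return; // the rest of the path to the root was already extracted
    Visited[I] = true;
    int P = Radix[I].Parent;
    if (P != -1)
      Edges.push_back({Radix[P].GUID, Radix[I].GUID});
  }
}
```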
We've seen bugs where we lost track of error states stored in the
functor because we passed the functor by value (that is,
std::function) as opposed to reference (llvm::function_ref).
This patch fixes a couple of places where we pass functors by value.
While we are at it, this patch adds curly braces around a "for" loop
spanning multiple lines.
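An illustrative reduction of the bug class (hypothetical names, not the
code touched by this patch): a stateful functor copied into a by-value
std::function parameter loses its error state, while llvm::function_ref
refers to the caller's object:
```
#include "llvm/ADT/STLExtras.h" // llvm::function_ref
#include <functional>

// A stateful functor whose error flag must survive the call.
struct ErrorTrackingCallback {
  bool SawError = false;
  void operator()(int V) { SawError |= (V < 0); }
};

// Bug pattern: taking std::function by value copies the functor, so the
// error flag is set on the copy and the caller's original never sees it.
static void walkByValue(std::function<void(int)> Fn) { Fn(-1); }

// Fix pattern: llvm::function_ref is a non-owning reference to the caller's
// callable, so state mutations land on the original object.
static void walkByRef(llvm::function_ref<void(int)> Fn) { Fn(-1); }

void demo() {
  ErrorTrackingCallback CB;
  walkByValue(CB); // CB.SawError is still false afterwards: state was lost
  walkByRef(CB);   // CB.SawError is now true
}
```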
We know that the MemProf profile has a lot of duplicate call stacks.
Extracting caller-callee pairs from a call stack we've seen before is
a wasteful effort.
This patch makes the extraction more efficient by first building a
work list of linear call stack IDs -- the set of starting positions
in the radix tree array -- and then extracting caller-callee pairs
from each call stack in the work list.
We implement the work list as a bit vector because we expect the work
list to be dense in the range [0, RadixTreeSize). Also, we want the
set insertion to be cheap.
Without this patch, it takes 25 seconds to extract caller-callee pairs
from a large MemProf profile. This patch shortens that to 4 seconds.
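A sketch of the dense work-list idea with hypothetical names; the point
is that duplicate call stacks map to the same starting index, so each
unique call stack is decoded only once:
```
#include "llvm/ADT/BitVector.h"
#include <cstdint>
#include <vector>

void extractAll(const std::vector<uint64_t> &RadixArray,
                const std::vector<uint32_t> &LinearCallStackIds) {
  // One bit per radix-array position; cheap insertion, dense in
  // [0, RadixTreeSize).
  llvm::BitVector Worklist(RadixArray.size());
  for (uint32_t Id : LinearCallStackIds)
    Worklist.set(Id);
  for (unsigned Pos : Worklist.set_bits()) {
    // Decode the call stack starting at Pos and extract caller-callee pairs.
    (void)Pos;
  }
}
```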
Undrifting the MemProf profile requires two sets of information:
- caller-callee pairs from the profile
- callee-callee pairs from the IR
This patch adds a function to do the former. The latter has been
addressed by extractCallsFromIR.
Unfortunately, the current MemProf format does not directly give us
the caller-callee pairs from the profile. "struct Frame" just tells
us where the call site is -- Caller GUID and line/column numbers; it
doesn't tell us what function a given Frame is calling. To extract
caller-callee pairs, we need to scan each call stack, look at two
adjacent Frames, and extract a caller-callee pair.
Conceptually, we would extract caller-callee pairs with:
```
for each MemProfRecord in the profile:
  for each call stack in AllocSites:
    extract caller-callee pairs from adjacent pairs of Frames
```
However, this is highly inefficient. Obtaining a MemProfRecord involves
looking up the OnDiskHashTable, allocating several vectors on the
heap, and populating fields that are irrelevant to us, such as MIB and
CallSites.
This patch adds an efficient way of doing the above. Specifically, we:
- go through all IndexedMemProfRecords,
- look at each linear call stack ID, and
- extract caller-callee pairs from each call stack.
The extraction is done by a new class, CallerCalleePairExtractor,
modified from LinearCallStackIdConverter, which reconstructs a call
stack from the radix tree array. For our purposes, we skip the
reconstruction and immediately populate the data structure for
caller-callee pairs.
The resulting caller-callee pairs are of the type:
DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> CallerCalleePairs;
which can be passed directly to longestCommonSequence just like the
result of extractCallsFromIR.
Further performance optimizations are possible for the new functions
in this patch. I'll address those in follow-up patches.
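A sketch of how CallerCalleePairExtractor-style pairing works on a
single call stack (simplified Frame and edge types; the real types carry
more fields): each frame contributes an edge at its own call site, with
callee GUID 0 for the leaf:
```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallVector.h"
#include <cstdint>
#include <utility>

// Simplified stand-ins for the profile types; the real Frame also carries
// IsInlineFrame and more.
struct FrameSketch {
  uint64_t CallerGUID;
  uint32_t Line;
  uint32_t Column;
};
using LineLocation = std::pair<uint32_t, uint32_t>;
using CallEdgeTy = std::pair<LineLocation, uint64_t>; // (call site, callee GUID)

// Walk one call stack (leaf first) and record, for each frame, the callee it
// calls: GUID 0 for the leaf, otherwise the GUID of the previous (deeper)
// frame.
void addCallStack(
    llvm::ArrayRef<FrameSketch> CallStack,
    llvm::DenseMap<uint64_t, llvm::SmallVector<CallEdgeTy, 0>> &Calls) {
  uint64_t CalleeGUID = 0;
  for (const FrameSketch &F : CallStack) {
    Calls[F.CallerGUID].push_back({{F.Line, F.Column}, CalleeGUID});
    CalleeGUID = F.CallerGUID;
  }
}
```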
ICP builds a symtab from the symbols in the module allowing mapping from
the VP metadata GUIDs to the Function. MemProf uses this same symtab
handling for its ICP during cloning. When symbols are added to the
symtab, the handling adds both a GUID computed from the function name
(or from the attached PGOFuncName metadata for locals) and a GUID
computed from the "canonicalized" name, which strips all "." suffixes
other than ".__uniq". This was originally meant to remove the ".llvm.*"
suffix added to promoted locals (done earlier in the ThinLTO backend).
In theory, it should no longer be needed as locals should have
PGOFuncName metadata.
However, this was causing a linker unsat, in code that used coroutines.
For an original coroutine function, there were several additional
functions created that had the same name, but different "." suffixes.
Therefore the canonical name for these additional functions had the same
GUID as that of the original function, leading to extra entries in the
symtab, and to selecting the wrong function for promotion. For regular
ICP this can happen, but it is only a performance issue. However, for
memprof the promoted direct call calls a memprof clone, and because we
selected the wrong function, which in this case did not have a memprof
clone, we got a linker unsat.
We may be able to remove the canonical name handling for ICP in general,
but for now disable it for MemProf. At worst this could lead to not
finding a GUID in the symtab and not performing an ICP, so should be
conservatively correct.
My change in bb3915149a7c9b1660db9caebfc96343352e8454 added a call to
std::time, which generally worked because there must be some transitive
include of <ctime>. However, I saw one MSVC bot failure:
InstrProfWriter.cpp(202): error C2039: 'time': is not a member of 'std'
from https://lab.llvm.org/buildbot/#/builders/63/builds/2325.
Presumably, explicitly including <ctime> should fix this.
Add support for generating random hotness in the memprof profile writer,
to be used for testing. The random seed is printed to stderr, and an
additional option enables providing a specific seed in order to
reproduce a particular random profile.
On WebAssembly, most coverage metadata contents read by llvm-cov (like
`__llvm_covmap` and `__llvm_covfun`) are stored in custom sections
because they are not referenced at runtime. However, `__llvm_prf_names`
is referenced at runtime by the profile runtime library and is read by
llvm-cov post-processing tools, so it needs to be stored in a data
segment, which is allocatable at runtime and accessible by tools as long
as the "name" section is present in the binary.
This patch changes the way llvm-cov reads `__llvm_prf_names` on
WebAssembly. Instead of looking for a section, it looks for a data
segment with the same name.
This reverts commit 157f10ddf2d851125a85a71e530dc9d50cb032a2 and fixes
PE/COFF `.lprfn$A` section handling.
Previously, both the True and False counts were folded. This lost the
information of whether the branch was True or False before folding, and
prevented recalling branch counts when merging template instantiations.
In `llvm-cov`, a folded branch is shown as:
- `[True: n, Folded]`
- `[Folded, False n]`
If `n` is zero, the branch is reported as "uncovered", which is
distinguished from a "folded" branch. When folded branches are merged,
`Folded` may be dissolved.
In the coverage map, one of the two `Counter`s is `Zero`. Previously,
both were `Zero`.
With "partial fold" introduced, one side of each branch in a `switch` is
omitted as `Folded`.
Each `case:` in a `switch` is reported as `[True: n, Folded]`, since the
`False` count doesn't carry a meaningful value. When a `switch` doesn't
have a `default:`, `switch (Cond)` is reported as `[Folded, False: n]`,
since the `True` count would just be the sum of the `case`(s). A `switch`
with `default` can be considered as "the statement that doesn't have any
`False`(s)".
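A small illustrative example of the reporting described above (counts
and exact rendering are approximate, not taken from an actual llvm-cov
run):
```
// Illustrative only; the exact llvm-cov rendering may differ from the
// comments below.
int classify(int X) {
  switch (X) { // no "default:": the switch condition would be reported as
               // [Folded, False: n]
  case 0:      // each case is reported as [True: n0, Folded]
    return 10;
  case 1:      // reported as [True: n1, Folded]
    return 20;
  }
  return -1;
}
```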
Currently, WebAssembly/WASI target does not provide direct support for
code coverage.
This patch set fixes several issues to unlock the feature. The main
changes are:
1. Port `compiler-rt/lib/profile` to WebAssembly/WASI.
2. Adjust profile metadata sections for Wasm object file format.
- [CodeGen] Emit `__llvm_covmap` and `__llvm_covfun` as custom sections
instead of data segments.
- [lld] Align the interval space of custom sections at link time.
- [llvm-cov] Copy misaligned custom section data if the start address is
not aligned.
- [llvm-cov] Read `__llvm_prf_names` from data segments
3. [clang] Link with profile runtime libraries if requested
See each commit message for more details and rationale.
This is part of the effort to add code coverage support in Wasm target
of Swift toolchain.
llvm-cov reads the __llvm_prf_names section in an object file to find
the profile names; in binary profile correlation mode, when the
__llvm_prf_names section is omitted, it instead reads the __llvm_covnames
section. This patch ensures that it still reads the __llvm_covnames
section when there is an empty __llvm_prf_names section.
- the set used for targets under a callsite is simpler to use if iterators
are stable (it gets manipulated during updates)
- the set used to fetch the transitive closure of GUIDs under a node can
be left as a choice to the user.
This patch adds debuginfod support to llvm-profdata to find the
associated executable by the build ID in a raw profile and correlate
the profile with the provided correlation kind (debug-info or binary).
Using the flag `-split_layout` in llvm-profdata merge, the output can
write profiles with and without inlined functions into two different
extbinary sections (and their FuncOffsetTables too). The section without
inlined functions is marked with `SecFlagFlat` and is skipped by ThinLTO
because it provides no useful info.
The split layout feature was already implemented in SampleProfWriter,
but previously there was no way to use it from llvm-profdata.
Making the synthesis of a contextual profile file from a JSON descriptor more reusable, for unittest authoring purposes.
The functionality round-trips through the binary format - no reason, currently, to support other ways of loading contextual profiles.
Currently, in the extended binary format, the sample reader only reads
the profiles of functions that are in the current module at
initialization time. This extends the support to read arbitrary profiles
for given input functions at a later stage. It's used for
https://github.com/llvm/llvm-project/pull/101053.
This pull request is a revised version of #76587. This pull request
fixes some build issues that were present in the previous version of
this change.
> This pull request is the first part of an ongoing effort to extend
PGO instrumentation to GPU device code. This PR makes the following
changes:
>
> - Adds blank registration functions to device RTL
> - Gives PGO globals protected visibility when targeting a supported
GPU
> - Handles any addrspace casts for PGO calls
> - Implements PGO global extraction in GPU plugins (currently only
dumps info)
>
> These changes can be tested by supplying `-fprofile-instrument=clang`
while targeting a GPU.
This requires outputting a 4-byte "Magic" header. We also emit a block info block to describe our blocks and records. The output of `llvm-bcanalyzer` would look like:
```
<BLOCKINFO_BLOCK/>
<Metadata NumWords=17 BlockCodeSize=2>
<Version op0=1/>
<Context NumWords=13 BlockCodeSize=2>
<GUID op0=2/>
<Counters op0=1 op1=2 op2=3/>
```
This is instead of showing `Unknown` for block and record IDs.
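A rough sketch of what emitting the magic and a block info block looks
like with llvm::BitstreamWriter (the magic bytes and the records inside
BLOCKINFO here are placeholders, not the actual values used by the
contextual-profile writer):
```
#include "llvm/ADT/SmallVector.h"
#include "llvm/Bitstream/BitstreamWriter.h"

void writeHeaderSketch(llvm::SmallVectorImpl<char> &Buffer) {
  llvm::BitstreamWriter Stream(Buffer);
  // 4-byte magic so readers and llvm-bcanalyzer can identify the container.
  Stream.Emit('C', 8);
  Stream.Emit('T', 8);
  Stream.Emit('X', 8);
  Stream.Emit('P', 8);
  // The BLOCKINFO block describes our blocks and records, so the bcanalyzer
  // dump shows <Metadata ...> and <Context ...> rather than unknown IDs.
  Stream.EnterBlockInfoBlock();
  // ... emit SETBID / BLOCKNAME / SETRECORDNAME records here ...
  Stream.ExitBlock();
}
```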
Profile staleness could be due to function renaming. Given that the
sample profile loader relies on exact string matching, a trivial change
in the function signature (such as `int foo()` --> `long foo()`) can
make the mangled name different, and the function profile (including all
nested children profiles) becomes unavailable.
This patch introduces stale-profile call-graph level matching, aimed at
identifying trivial function renaming and reusing the old function
profile.
Some noteworthy details:
1. Extend the LCS-based CFG-level matching to identify new functions.
- Extend matching to the case where the function and the profile have
different names, instead of requiring an exact function-name match. This
leverages the LCS: during callsite anchor matching, when two function
names differ, try matching the functions instead of returning.
- In the LCS, the function equality check is replaced by
`functionMatchesProfile`.
- Only try matching functions that are new (i.e., that don't appear on
both sides). This reduces the matching scope, as we don't need to match
the originally matched functions.
2. Determine the matching by a call-site anchor similarity check (see
the sketch after this list).
- A new function `functionMatchesProfile(IRFunc, ProfFunc)` checks the
possible renaming for the <IRFunc, ProfFunc> pair. It uses the LCS (diff)
matching to compute the equal set, and we define
`Similarity = 2 * |equalSet| / (|A| + |B|)`. The profile name is marked
as renamed if the similarity is above a threshold
(`-func-profile-similarity-threshold`).
3. Process the matching in top-down function order.
- When a caller is done matching, the new function names are saved for
later use; using top-down order maximizes the reuse of results.
- `ProfileNameToFuncMap` is used to save or cache the matching results.
4. Update the original profile at the end using `ProfileNameToFuncMap`.
5. Add a new switch `--salvage-unused-profile` to control this; the
default is false.
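A sketch of the similarity check from step 2, with hypothetical
parameter names; the anchor counts come from the LCS-based diff:
```
#include <cstddef>

// Returns true when the candidate <IRFunc, ProfFunc> pair is similar enough
// to be treated as a rename. Threshold corresponds to
// -func-profile-similarity-threshold.
bool functionMatchesProfileSketch(std::size_t NumEqualAnchors,
                                  std::size_t NumIRAnchors,
                                  std::size_t NumProfAnchors,
                                  double Threshold) {
  if (NumIRAnchors + NumProfAnchors == 0)
    return false;
  double Similarity = 2.0 * NumEqualAnchors / (NumIRAnchors + NumProfAnchors);
  return Similarity >= Threshold;
}
```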
Verified on one of Meta's large internal services; confirmed that 90%+
of the found renaming pairs are good. (There could be incorrect renaming
pairs when the number of anchors is small, but we checked that those
functions are simple cold functions.)
This patch fixes another place in ProfileData where we have a pointer
to an array of InstrProfValueData and its length separately.
addValueData is a bit unique in that it remaps incoming values in
place before adding them to ValueSites. AFAICT, no caller of
addValueData uses the updated incoming values. With this patch, we add
the value data to ValueSites first and then remap the values there. This
way, we can take ArrayRef<InstrProfValueData> as a parameter.
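A sketch of the resulting API shape with simplified stand-in types (the
real code uses InstrProfValueData from InstrProf.h and a different site
container): add first, then remap in place:
```
#include "llvm/ADT/ArrayRef.h"
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for InstrProfValueData.
struct ValueDataSketch {
  uint64_t Value;
  uint64_t Count;
};

// Take a span of value data instead of a pointer plus a count, append it,
// and remap the values in place afterwards (callers do not rely on the
// remapped incoming array).
struct ValueSiteSketch {
  std::vector<ValueDataSketch> Values;

  template <typename RemapFn>
  void addValueData(llvm::ArrayRef<ValueDataSketch> VData, RemapFn Remap) {
    std::size_t Start = Values.size();
    Values.insert(Values.end(), VData.begin(), VData.end());
    for (std::size_t I = Start; I < Values.size(); ++I)
      Values[I].Value = Remap(Values[I].Value);
  }
};
```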
Also some control flow simplifications.
Notably, this doesn't address `sampleprof_error`. I *think* the style
there tries to match `std::error_category`.
Also left `hash_value` as-is, because it matches what we do in Hashing.h
I've migrated uses of the old version of getValueProfDataFromInst to
the one that returns SmallVector<InstrProfValueData, 4>. This patch
removes the old version.
Clang's `-fwhole-program-vtables` is required for this optimization to
take place. If `-fwhole-program-vtables` is not enabled, this change is
no-op.
* Function-comparison (before):
```
%vtable = load ptr, ptr %obj
%vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
%func = load ptr, ptr %vfn
%cond = icmp eq ptr %func, @callee
br i1 %cond, label bb1, label bb2:
bb1:
call @callee
bb2:
call %func
```
* VTable-comparison (after):
```
%vtable = load ptr, ptr %obj
%cond = icmp eq ptr %vtable, @vtable-address-point
br i1 %cond, label bb1, label bb2:
bb1:
call @callee
bb2:
%vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
%func = load ptr, ptr %vfn
call %func
```
Key changes:
1. Find the virtual calls and the vtables they come from.
- The ICP relies on the type intrinsic `llvm.type.test` to find virtual
calls and the compatible vtables, and relies on type metadata to find
the address point for comparison.
2. The ICP pass does a cost-benefit analysis and compares vtables only
when the number of vtables for a function candidate is within an
(option-specified) threshold.
3. Sink the function addressing and vtable load instructions into the
indirect fallback.
- The sink helper functions are simplified versions of
`InstCombinerImpl::tryToSinkInstruction`. Currently debug intrinsics are
not handled. Ideally `InstCombinerImpl::tryToSinkInstructionDbgValues`
and `InstCombinerImpl::tryToSinkInstructionDbgVariableRecords` could be
moved into Transforms/Utils/Local.cpp (or another util cpp file) to
handle debug intrinsics when moving instructions across basic blocks.
4. Keep value profiles updated.
1) Update vtable value profiles after inlining.
2) For either function-based or vtable-based comparison, update both
vtable and indirect call value profiles.
This pull request is the first part of an ongoing effort to extend PGO
instrumentation to GPU device code. This PR makes the following changes:
- Adds blank registration functions to device RTL
- Gives PGO globals protected visibility when targeting a supported GPU
- Handles any addrspace casts for PGO calls
- Implements PGO global extraction in GPU plugins (currently only dumps
info)
These changes can be tested by supplying `-fprofile-instrument=clang`
while targeting a GPU.
MemInfoBlocks (MIBs) with empty call stacks are erased prematurely from
the CallStackProfileData. This patch frees the allocated histogram
buffers when an MIB is associated with an empty call stack.
This patch fixes:
llvm/lib/ProfileData/MemProfReader.cpp:685:1: error: non-void
function does not return a value in all control paths
[-Werror,-Wreturn-type]
While I am at it, this patch removes an else-after-return.
Adds compile time flag -mllvm -memprof-histogram and runtime flag
histogram=true|false to turn Histogram collection on and off. The
-memprof-histogram flag relies on -memprof-use-callbacks=true to work.
Updates the shadow mapping logic in histogram mode from one 8-byte
counter per 64 bytes to one 1-byte counter per 8 bytes, capped at 255.
Only this granularity is supported as of now.
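An illustrative model of the histogram shadow counter described above
(not the actual runtime code; only the 8-byte granule and 255 cap come
from the text):
```
#include <cstddef>
#include <cstdint>

// One uint8_t counter per 8-byte granule of the allocation, saturating at 255.
void recordAccessSketch(uint8_t *Histogram, uintptr_t AllocStart,
                        uintptr_t Addr) {
  std::size_t Bucket = (Addr - AllocStart) / 8; // 1 counter per 8 bytes
  if (Histogram[Bucket] < 255)                  // cap at 255
    ++Histogram[Bucket];
}
```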
Updates the RawMemprofReader and the serialization of MemoryInfoBlocks
to the binary format, including bumping the raw binary format from
version 3 to version 4.
Updates the creation of MemoryInfoBlocks with and without histograms.
When two MemoryInfoBlocks are merged, the AccessCounts are summed and
the shorter histogram is removed.
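A sketch of that merge rule with a simplified MIB (field names are
illustrative): sum the access counts and keep only the longer histogram:
```
#include <cstdint>
#include <utility>
#include <vector>

struct MIBSketch {
  uint64_t AccessCount = 0;
  std::vector<uint8_t> Histogram;
};

MIBSketch merge(MIBSketch A, MIBSketch B) {
  MIBSketch Out;
  Out.AccessCount = A.AccessCount + B.AccessCount; // sum access counts
  Out.Histogram = A.Histogram.size() >= B.Histogram.size()
                      ? std::move(A.Histogram)
                      : std::move(B.Histogram); // shorter histogram is dropped
  return Out;
}
```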
Adds a memprof_histogram test case.
Initial commit for adding AccessCountHistograms, up to the RawProfile
stage, for memprof.