llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-05-18 12:36:10 +00:00

Author	SHA1	Message	Date
Elena Demikhovsky	88e76cad16	Create masked gather and scatter intrinsics in Loop Vectorizer. Loop vectorizer now knows to vectorize GEP and create masked gather and scatter intrinsics for random memory access. The feature is enabled on AVX-512 target. Differential Revision: http://reviews.llvm.org/D15690 llvm-svn: 261140	2016-02-17 19:23:04 +00:00
Amaury Sechet	61a7d629ec	Fix load alignement when unpacking aggregates structs Summary: Store and loads unpacked by instcombine do not always have the right alignement. This explicitely compute the alignement and set it. Reviewers: dblaikie, majnemer, reames, hfinkel, joker.eph Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D17326 llvm-svn: 261139	2016-02-17 19:21:28 +00:00
David Majnemer	f48bcb2bd9	Revert "Reapply commit r258404 with fix." This reverts commit r259357, it caused PR26629. llvm-svn: 261137	2016-02-17 19:02:36 +00:00
Frederic Riss	009d60650d	[ObjCARC] Handle ARCInstKind::ClaimRV in OptimizeIndividualCalls. When support for objc_unsafeClaimAutoreleasedReturnValue has been added to the ARC optimizer in r258970, one case was missed which would lead the optimizer to execute an llvm_unreachable. In this case, just handle ClaimRV in the same way we handle RetainRV. llvm-svn: 261134	2016-02-17 18:51:27 +00:00
Mehdi Amini	1db10ac6ce	Define the ThinLTO Pipeline (experimental) Summary: On the contrary to Full LTO, ThinLTO can afford to shift compile time from the frontend to the linker: both phases are parallel (even if it is not totally "free": projects like clang are reusing product from the "compile phase" for multiple link, think about libLLVMSupport reused for opt, llc, etc.). This pipeline is based on the proposal in D13443 for full LTO. We didn't move forward on this proposal because the LTO link was far too long after that. We believe that we can afford it with ThinLTO. The ThinLTO pipeline integrates in the regular O2/O3 flow: - The compile phase perform the inliner with a somehow lighter function simplification. (TODO: tune the inliner thresholds here) This is intendend to simplify the IR and get rid of obvious things like linkonce_odr that will be inlined. - The link phase will run the pipeline from the start, extended with some specific passes that leverage the augmented knowledge we have during LTO. Especially after the inliner is done, a sequence of globalDCE/globalOpt is performed, followed by another run of the "function simplification" passes. It is not clear if this part of the pipeline will stay as is, as the split model of ThinLTO does not allow the same benefit as FullLTO without added tricks. The measurements on the public test suite as well as on our internal suite show an overall net improvement. The binary size for the clang executable is reduced by 5%. We're still tuning it with the bringup of ThinLTO and it will evolve, but this should provide a good starting point. Reviewers: tejohnson Differential Revision: http://reviews.llvm.org/D17115 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 261029	2016-02-16 23:02:29 +00:00
Mehdi Amini	ec8bee1879	Refactor the PassManagerBuilder: extract a "addFunctionSimplificationPasses()" (NFC) It is intended to contains the passes run over a function after the inliner is done with a function and before it moves to its callers. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 261028	2016-02-16 22:54:27 +00:00
Junmo Park	6ebdc14cf1	[SCEVExpander] Make findExistingExpansion smarter Summary: Extending findExistingExpansion can use existing value in ExprValueMap. This patch gives 0.3~0.5% performance improvements on benchmarks(test-suite, spec2000, spec2006, commercial benchmark) Reviewers: mzolotukhin, sanjoy, zzheng Differential Revision: http://reviews.llvm.org/D15559 llvm-svn: 260938	2016-02-16 06:46:58 +00:00
Silviu Baranga	ec7063ac77	[LV] Add support for insertelt/extractelt processing during type truncation Summary: While shrinking types according to the required bits, we can encounter insert/extract element instructions. This will cause us to reach an llvm_unreachable statement. This change adds support for truncating insert/extract element operations, and adds a regression test. Reviewers: jmolloy Subscribers: mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17078 llvm-svn: 260893	2016-02-15 15:38:17 +00:00
Roman Gareev	036c08874a	Tweak the LICM code to reuse the first sub-loop instead of creating a new one LICM starts with an empty AST, and then merges in each sub-loop. While the add code is appropriate for sub-loop 2 and up, it's utterly unnecessary for sub-loop 1. If the AST starts off empty, we can just clone/move the contents of the subloop into the containing AST. Reviewed-by: Philip Reames <listmail@philipreames.com> Differential Revision: http://reviews.llvm.org/D16753 llvm-svn: 260892	2016-02-15 14:48:50 +00:00
Benjamin Kramer	8a752e316d	Use ArrayRef to hide SmallVector details, kill a useless vector copy along the way. llvm-svn: 260824	2016-02-13 16:01:12 +00:00
Chandler Carruth	632d208c78	[attrs] Move the norecurse deduction to operate on the node set rather than the SCC object, and have it scan the instruction stream directly rather than relying on call records. This makes the behavior of this routine consistent between libc routines and LLVM intrinsics for libc routines. We can go and start teaching it about those being norecurse, but we should behave the same for the intrinsic and the libc routine rather than differently. I chatted with James Molloy and the inconsistency doesn't seem intentional and likely is due to intrinsic calls not being modelled in the call graph analyses. This also fixes a bug where we would deduce norecurse on optnone functions, when generally we try to handle optnone functions as-if they were replaceable and thus unanalyzable. llvm-svn: 260813	2016-02-13 08:47:51 +00:00
Keno Fischer	7c7c3e3591	[Cloning] Clone every Function's Debug Info Summary: Export the CloneDebugInfoMetadata utility, which clones all debug info associated with a function into the first module. Also use this function in CloneModule on each function we clone (the CloneFunction entrypoint already does this). Without this, cloning a module will lead to DI quality regressions, especially since r252219 reversed the Function <-> DISubprogram edge (before we could get lucky and have this edge preserved if the DISubprogram itself was, e.g. due to location metadata). This was verified to fix missing debug information in julia and a unittest to verify the new behavior is included. Patch by Yichao Yu! Thanks! Reviewers: loladiro, pcc Differential Revision: http://reviews.llvm.org/D17165 llvm-svn: 260791	2016-02-13 02:04:29 +00:00
Chad Rosier	81362a8599	[LIR] Allow merging of memsets in negatively strided loops. Last part of PR25166. llvm-svn: 260732	2016-02-12 21:03:23 +00:00
Justin Lebar	6086c6a387	Fix typo in comment. llvm-svn: 260731	2016-02-12 21:01:37 +00:00
Justin Lebar	db63949e8d	[SimplifyCFG] Don't fold conditional branches that contain calls to convergent functions. Summary: Performing this optimization duplicates the call to the convergent function and adds new control-flow dependencies, which is a no-no. Reviewers: jingyue Subscribers: broune, hfinkel, tra, resistor, joker.eph, arsenm, llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D17128 llvm-svn: 260730	2016-02-12 21:01:36 +00:00
Justin Lebar	df04d2a1f1	[LoopRotate] Don't perform loop rotation if the loop header calls a convergent function. Summary: Calls to convergent functions can be duplicated, but only if the duplicates are not control-flow dependent on any additional values. Loop rotation doesn't meet the bar. Reviewers: jingyue Subscribers: mzolotukhin, llvm-commits, arsenm, joker.eph, resistor, tra, hfinkel, broune Differential Revision: http://reviews.llvm.org/D17127 llvm-svn: 260729	2016-02-12 21:01:33 +00:00
David Majnemer	01674939b2	Remove unused variable llvm-svn: 260722	2016-02-12 20:33:51 +00:00
Philip Reames	96fccc2d09	[GVN] Common code for local and non-local load availability [NFCI] The attached patch removes all of the block local code for performing X-load forwarding by reusing the code used in the non-local case. The motivation here is to remove duplication and in the process increase our test coverage of some fairly tricky code. I have some upcoming changes I'll be proposing in this area and wanted to have the code cleaned up a bit first. Note: The review for this mostly happened in email which didn't make it to phabricator on the 258882 commit thread. Differential Revision: http://reviews.llvm.org/D16608 llvm-svn: 260711	2016-02-12 19:24:57 +00:00
Chad Rosier	4acff96646	[LIR] Partially revert r252926(NFC), which introduced a very subtle change. In short, before r252926 we were comparing an unsigned (StoreSize) against an a APInt (Stride), which is fine and well. After we were zero extending the Stride and then converting to an unsigned, which is not the same thing. Obviously, Stides can also be negative. This commit just restores the original behavior. AFAICT, it's not possible to write a test case to expose the issue because the code already has checks to make sure the StoreSize can't overflow an unsigned (which prevents the Stride from overflowing an unsigned as well). llvm-svn: 260706	2016-02-12 19:05:27 +00:00
David Majnemer	0f0abc7bc2	[InstCombine] Don't aggressively replace xor with icmp For some cases, InstCombine replaces the sequence of xor/sub instruction followed by cmp instruction into a single cmp instruction. However, this replacement may result suboptimal result especially when the xor/sub has more than one use, as discussed in bug 26465 (https://llvm.org/bugs/show_bug.cgi?id=26465). This patch make the replacement happen only when xor/sub has only one use. Differential Revision: http://reviews.llvm.org/D16915 Patch by Taewook Oh! llvm-svn: 260695	2016-02-12 18:12:38 +00:00
Chandler Carruth	3937bc70f9	[attrs] Simplify the convergent removal to directly use the pre-built node set rather than walking the SCC directly. This directly exposes the functions and has already had null entries filtered out. We also don't need need to handle optnone as it has already been handled in the caller -- we never try to remove convergent when there are optnone functions in the SCC. With this change, the code for removing convergent should work with the new pass manager and a different SCC analysis. llvm-svn: 260668	2016-02-12 09:47:49 +00:00
Chandler Carruth	057df3d423	[attrs] Consolidate the test for a non-SCC, non-convergent function call with the test for a non-convergent intrinsic call. While it is possible to use the call records to search for function calls, we're going to do an instruction scan anyways to find the intrinsics, we can handle both cases while scanning instructions. This will also make the logic more amenable to the new pass manager which doesn't use the same call graph structure. My next patch will remove use of CallGraphNode entirely and allow this code to work with both the old and new pass manager. Fortunately, it should also get strictly simpler without changing functionality. llvm-svn: 260666	2016-02-12 09:23:53 +00:00
Chandler Carruth	bbbbec0b54	[attrs] Run clang-format over a newly added routine in function-attrs before I update it to be friendly with the new pass manager. llvm-svn: 260653	2016-02-12 03:07:50 +00:00
Evgeniy Stepanov	ba6ca87ffb	[msan] Put msan constructor in a comdat. MSan adds a constructor to each translation unit that calls __msan_init, and does nothing else. The idea is to run __msan_init before any instrumented code. This results in multiple constructors and multiple .init_array entries in the final binary, one per translation unit. This is absolutely unnecessary; one would be enough. This change moves the constructors to a comdat group in order to drop the extra ones. llvm-svn: 260632	2016-02-12 00:37:52 +00:00
Matthew Simpson	a4e43c5b51	[SLP] Add debug output for extract cost (NFC) llvm-svn: 260614	2016-02-11 23:06:40 +00:00
Quentin Colombet	490cfbe2a2	Re-apply r238452, the bug was in clang and was fixed in r260567. Original commit message: [InstCombine] Fold IntToPtr and PtrToInt into preceding loads. Currently we only fold a BitCast into a Load when the BitCast is its only user. Do the same for any no-op cast. Patch by Philip Pfaffe! Differential Revision: http://reviews.llvm.org/D9152 llvm-svn: 260612	2016-02-11 22:30:41 +00:00
Mehdi Amini	9c1c3ac627	Revert "Refactor the PassManagerBuilder: extract a "addFunctionSimplificationPasses()"" This reverts commit r260603. I didn't intend to push it :( From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 260607	2016-02-11 22:09:11 +00:00
Mehdi Amini	c5bf5ecc1b	Revert "Define the ThinLTO Pipeline" This reverts commit r260604. I didn't intend to push this now. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 260606	2016-02-11 22:09:07 +00:00
Mehdi Amini	484470d605	Define the ThinLTO Pipeline Summary: On the contrary to Full LTO, ThinLTO can afford to shift compile time from the frontend to the linker: both phases are parallel. This pipeline is based on the proposal in D13443 for full LTO. We ] didn't move forward on this proposal because the link was far too long after that. This patch refactor the "function simplification" passes that are part of the inliner loop in a helper function (this part is NFC and can be commited separately to simplify the diff). The ThinLTO pipeline integrates in the regular O2/O3 flow: - The compile phase perform the inliner with a somehow lighter function simplification. (TODO: tune the inliner thresholds here) This is intendend to simplify the IR and get rid of obvious things like linkonce_odr that will be inlined. - The link phase will run the pipeline from the start, extended with some specific passes that leverage the augmented knowledge we have during LTO. Especially after the inliner is done, a sequence of globalDCE/globalOpt is performed, followed by another run of the "function simplification" passes. The measurements on the public test suite as well as on our internal suite show an overall net improvement. The binary size for the clang executable is reduced by 5%. We're still tuning it with the bringup of ThinLTO but this should provide a good starting point. Reviewers: tejohnson Subscribers: joker.eph, llvm-commits, dexonsmith Differential Revision: http://reviews.llvm.org/D17115 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 260604	2016-02-11 22:00:31 +00:00
Mehdi Amini	f9a3718c5a	Refactor the PassManagerBuilder: extract a "addFunctionSimplificationPasses()" It is intended to contains the passes run over a function after the inliner is done with a function and before it moves to its callers. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 260603	2016-02-11 22:00:25 +00:00
Pete Cooper	5562c333b8	Set load alignment on aggregate loads. When optimizing a extractvalue(load), we generate a load from the aggregate type. This load didn't have alignment set and so would get the alignment of the type. This breaks when the type is packed and so the alignment should be lower. For example, loading { int, int } would give us alignment of 4, but the original load from this type may have an alignment of 1 if packed. Reviewed by David Majnemer Differential revision: http://reviews.llvm.org/D17158 llvm-svn: 260587	2016-02-11 21:10:40 +00:00
Jun Bum Lim	10e58e867d	Fixed typo in r260530 llvm-svn: 260541	2016-02-11 16:46:13 +00:00
Jun Bum Lim	339e9723c1	[InstCombine] Simplify a known nonzero incoming value of PHI Summary: When a PHI is used only to be compared with zero, it is possible to replace an incoming value with any non-zero constant if the incoming value can be proved as a known nonzero value. For example, in below code, we can replace the incoming value %v with any non-zero constant based on the fact that the PHI is only used to be compared with zero and %v is a known non-zero value: %v = select %cond, 1, 2 %p = phi [%v, BB] ... %c = icmp eq, %p, 0 Reviewers: mcrosier, jmolloy, sanjoy Subscribers: hfinkel, mcrosier, majnemer, llvm-commits, haicheng, bmakam, mssimpso, gberry Differential Revision: http://reviews.llvm.org/D16240 llvm-svn: 260530	2016-02-11 15:50:07 +00:00
Tamas Berghammer	d12f31535a	Fix MSVC 2013 build after rL260504 llvm-svn: 260511	2016-02-11 11:27:51 +00:00
Artur Pilipenko	44e7c51b05	Don't propagate dereferenceable attribute through gc.relocate in InstCombine Reviewed By: reames Differential Revision: http://reviews.llvm.org/D16143 llvm-svn: 260509	2016-02-11 11:22:46 +00:00
Ashutosh Nema	2260a3a046	Fixed typo in comment & coding style for LoopVersioningLICM. llvm-svn: 260504	2016-02-11 09:23:53 +00:00
Teresa Johnson	41806854cc	Fix Windows bot failure in Transforms/FunctionImport/funcimport.ll Make sure we split ":" from the end of the global function id (which is <path>:<function> for local functions) instead of the beginning to avoid splitting at the wrong place for Windows file paths that contain a ":". llvm-svn: 260469	2016-02-10 23:47:38 +00:00
Mehdi Amini	4064174892	FunctionImport: add a progressive heuristic to limit importing too deep in the callgraph The current function importer will walk the callgraph, importing transitively any callee that is below the threshold. This can lead to import very deep which is costly in compile time and not necessarily beneficial as most of the inline would happen in imported function and not necessarilly in user code. The actual factor has been carefully chosen by flipping a coin ;) Some tuning need to be done (just at the existing limiting threshold). Reviewers: tejohnson Differential Revision: http://reviews.llvm.org/D17082 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 260466	2016-02-10 23:31:45 +00:00
Mehdi Amini	c87d7d02e1	Use a StringSet in Internalize, and allow to create the pass from an existing one (NFC) There is not reason to pass an array of "char *" to rebuild a set if the client already has one. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 260462	2016-02-10 23:24:31 +00:00
Philip Reames	d59638e14f	Follow up to 260439, Speculative fix to clang builders It looks like clang has a couple of test cases which caught the fact LVI was not slightly more precise after 260439. When looking at the failures, it struck me as wasteful to be querying nullness of a constant via LVI, so instead of tweaking the clang tests, let's just stop querying constants from this source. llvm-svn: 260451	2016-02-10 22:22:41 +00:00
Teresa Johnson	e1164de5d0	Restore "[ThinLTO] Use MD5 hash in function index." with fix This restores commit r260408, along with a fix for a bot failure. The bot failure was caused by dereferencing a unique_ptr in the same call instruction parameter list where it was passed via std::move. Apparently due to luck this was not exposed when I built the compiler with clang, only with gcc. llvm-svn: 260442	2016-02-10 21:55:02 +00:00
Teresa Johnson	89f38fb5cc	Revert "[ThinLTO] Use MD5 hash in function index." due to bot failure This reverts commit r260408. Bot failure that I need to investigate. llvm-svn: 260412	2016-02-10 19:11:15 +00:00
Teresa Johnson	0919a84071	[ThinLTO] Use MD5 hash in function index. Summary: This patch uses the lower 64-bits of the MD5 hash of a function name as a GUID in the function index, instead of storing function names. Any local functions are first given a global name by prepending the original source file name. This is the same naming scheme and GUID used by PGO in the indexed profile format. This change has a couple of benefits. The primary benefit is size reduction in the combined index file, for example 483.xalancbmk's combined index file was reduced by around 70%. It should also result in memory savings for the index file in memory, as the in-memory map is also indexed by the hash instead of the string. Second, this enables integration with indirect call promotion, since the indirect call profile targets are recorded using the same global naming convention and hash. This will enable the function importer to easily locate function summaries for indirect call profile targets to enable their import and subsequent promotion. The original source file name is recorded in the bitcode in a new module-level record for use in the ThinLTO backend pipeline. Reviewers: davidxl, joker.eph Subscribers: llvm-commits, joker.eph Differential Revision: http://reviews.llvm.org/D17028 llvm-svn: 260408	2016-02-10 18:57:54 +00:00
Rong Xu	13b01dc8d9	[PGO] Indirect-call profile annotation in IR level profiling This patch reads the indirect-call value records in the profile and makes the annotation in the indirect-call instruction. This is for IR level profile instrumentation. Differential Revision: http://reviews.llvm.org/D16935 llvm-svn: 260400	2016-02-10 18:24:45 +00:00
Teresa Johnson	488a800a4c	[ThinLTO] Move global processing from Linker to TransformUtils (NFC) Summary: As discussed on IRC, move the ThinLTOGlobalProcessing code out of the linker, and into TransformUtils. The name of the class is changed to FunctionImportGlobalProcessing. Reviewers: joker.eph, rafael Subscribers: joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D17081 llvm-svn: 260395	2016-02-10 18:11:31 +00:00
Daniel Berlin	f6c9ae9c6d	Rename a member variable to be more accurate with how it is used llvm-svn: 260389	2016-02-10 17:41:25 +00:00
Daniel Berlin	932b4cbf5d	Constify two functions, make them accessible to unit tests llvm-svn: 260387	2016-02-10 17:39:43 +00:00
Rong Xu	33c76c0cc2	[PGO] Differentiate Clang instrumentation and IR level instrumentation profiles This patch uses one bit in profile version to differentiate Clang instrumentation and IR level instrumentation profiles. PGOInstrumenation generates a COMDAT variable __llvm_profile_raw_version so that the compiler runtime can set the right profile kind. For Maco-O platform, we generate the variable as linkonce_odr linkage as COMDAT is not supported. PGOInstrumenation now checks this bit to make sure it's an IR level instrumentation profile. The patch was submitted as r260164 but reverted due to a Darwin test breakage. Original Differential Revision: http://reviews.llvm.org/D15540 Differential Revision: http://reviews.llvm.org/D17020 llvm-svn: 260385	2016-02-10 17:18:30 +00:00
Tom Stellard	6fa4e290d7	StructurizeCFG: Initialize SkipUniformRegions in the default constructor This should fix some random bot failures caused by r260336. llvm-svn: 260342	2016-02-10 01:10:09 +00:00
Tom Stellard	755a4e6b57	StructurizeCFG: Add an option for skipping regions with only uniform branches Summary: Tests for this will be added once the AMDGPU backend enables this option. Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D16602 llvm-svn: 260336	2016-02-10 00:39:37 +00:00

1 2 3 4 5 ...

14447 Commits