llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-28 05:06:07 +00:00

Author	SHA1	Message	Date
Arseniy Zaostrovnykh	648e256e54	Reapply "[clang][analyzer] Stable order for SymbolRef-keyed containers" (#121749 ) Generalize the SymbolIDs used for SymbolData to all SymExprs and use these IDs for comparison SymbolRef keys in various containers, such as ConstraintMap. These IDs are superior to raw pointer values because they are more controllable and are not randomized across executions (unlike [pointers](https://en.wikipedia.org/wiki/Address_space_layout_randomization)). These IDs order is stable across runs because SymExprs are allocated in the same order. Stability of the constraint order is important for the stability of the analyzer results. I evaluated this change on a set of 200+ open-source C and C++ projects with the total number of ~78 000 symbolic-execution issues passing Z3 refutation. This patch reduced the run-to-run churn (flakiness) in SE issues from 80-90 to 30-40 (out of 78K) in our CSA deployment (in our setting flaky issues are mostly due to Z3 refutation instability). Note, most of the issue churn (flakiness) is caused by the mentioned Z3 refutation. With Z3 refutation disabled, issue churn goes down to ~10 issues out of 83K and this patch has no effect on appearing/disappearing issues between runs. It however, seems to reduce the volatility of the execution flow: before we had 40-80 issues with changed execution flow, after - 10-30. Importantly, this change is necessary for the next step in stabilizing analysis results by caching Z3 query outcomes between analysis runs (work in progress). Across our admittedly noisy CI runs, I detected no significant effect on memory footprint or analysis time. This PR reapplies https://github.com/llvm/llvm-project/pull/121551 with a fix to a g++ compiler error reported on some build bots CPP-5919	2025-01-06 12:45:31 +01:00
Balazs Benics	a106ad0f1d	Revert "[clang][analyzer] Stable order for SymbolRef-keyed containers" (#121592 ) Reverts llvm/llvm-project#121551 We had a bunch of build errors caused by this PR. https://lab.llvm.org/buildbot/#/builders/144/builds/14875	2025-01-03 19:43:24 +01:00
Arseniy Zaostrovnykh	0844f83fea	[clang][analyzer] Stable order for SymbolRef-keyed containers (#121551 ) Generalize the `SymbolID`s used for `SymbolData` to all `SymExpr`s and use these IDs for comparison `SymbolRef` keys in various containers, such as `ConstraintMap`. These IDs are superior to raw pointer values because they are more controllable and are not randomized across executions (unlike [pointers](https://en.wikipedia.org/wiki/Address_space_layout_randomization)). These IDs order is stable across runs because SymExprs are allocated in the same order. Stability of the constraint order is important for the stability of the analyzer results. I evaluated this change on a set of 200+ open-source C and C++ projects with the total number of ~78 000 symbolic-execution issues passing Z3 refutation. This patch reduced the run-to-run churn (flakiness) in SE issues from 80-90 to 30-40 (out of 78K) in our CSA deployment (in our setting flaky issues are mostly due to Z3 refutation instability). Note, most of the issue churn (flakiness) is caused by the mentioned Z3 refutation. With Z3 refutation disabled, issue churn goes down to ~10 issues out of 83K and this patch has no effect on appearing/disappearing issues between runs. It however, seems to reduce the volatility of the execution flow: before we had 40-80 issues with changed execution flow, after - 10-30. Importantly, this change is necessary for the next step in stabilizing analysis results by caching Z3 query outcomes between analysis runs (work in progress). Across our admittedly noisy CI runs, I detected no significant effect on memory footprint or analysis time. CPP-5919	2025-01-03 19:36:24 +01:00
Balazs Benics	8d43c880a5	Revert "[analyzer][Solver] Early return if sym is concrete on assuming" (#116362 ) Reverts llvm/llvm-project#115579 This introduced a breakage: https://lab.llvm.org/buildbot/#/builders/46/builds/7928	2024-11-15 10:28:40 +01:00
Ding Fei	4163136e2e	[analyzer][Solver] Early return if sym is concrete on assuming (#115579 ) This could deduce some complex syms derived from simple ones whose values could be constrainted to be concrete during execution, thus reducing some overconstrainted states. This commit also fix `unix.StdCLibraryFunctions` crash due to these overconstrainted states being added to the graph, which is marked as sink node (PosteriorlyOverconstrained). The 'assume' API is used in non-dual style so the checker should protectively test whether these newly added nodes are actually impossible. 1. The crash: https://godbolt.org/z/8KKWeKb86 2. The solver needs to solve equivalent: https://godbolt.org/z/ed8WqsbTh	2024-11-15 16:43:32 +08:00
Gabor Marton	20f8733d4b	[Analyzer][solver] Simplification: Do a fixpoint iteration before the eq class merge This reverts commit f02c5f3478318075d1a469203900e452ba651421 and addresses the issue mentioned in D114619 differently. Repeating the issue here: Currently, during symbol simplification we remove the original member symbol from the equivalence class (`ClassMembers` trait). However, we keep the reverse link (`ClassMap` trait), in order to be able the query the related constraints even for the old member. This asymmetry can lead to a problem when we merge equivalence classes: ``` ClassA: [a, b] // ClassMembers trait, a->a, b->a // ClassMap trait, a is the representative symbol ``` Now let,s delete `a`: ``` ClassA: [b] a->a, b->a ``` Let's merge ClassA into the trivial class `c`: ``` ClassA: [c, b] c->c, b->c, a->a ``` Now, after the merge operation, `c` and `a` are actually in different equivalence classes, which is inconsistent. This issue manifests in a test case (added in D103317): ``` void recurring_symbol(int b) { if (b * b != b) if ((b * b) * b * b != (b * b) * b) if (b * b == 1) } ``` Before the simplification we have these equivalence classes: ``` trivial EQ1: [b * b != b] trivial EQ2: [(b * b) * b * b != (b * b) * b] ``` During the simplification with `b * b == 1`, EQ1 is merged with `1 != b` `EQ1: [b * b != b, 1 != b]` and we remove the complex symbol, so `EQ1: [1 != b]` Then we start to simplify the only symbol in EQ2: `(b * b) * b * b != (b * b) * b --> 1 * b * b != 1 * b --> b * b != b` But `b * b != b` is such a symbol that had been removed previously from EQ1, thus we reach the above mentioned inconsistency. This patch addresses the issue by making it impossible to synthesise a symbol that had been simplified before. We achieve this by simplifying the given symbol to the absolute simplest form. Differential Revision: https://reviews.llvm.org/D114887	2021-12-01 22:23:41 +01:00
Gabor Marton	f02c5f3478	[Analyzer][solver] Do not remove the simplified symbol from the eq class Currently, during symbol simplification we remove the original member symbol from the equivalence class (`ClassMembers` trait). However, we keep the reverse link (`ClassMap` trait), in order to be able the query the related constraints even for the old member. This asymmetry can lead to a problem when we merge equivalence classes: ``` ClassA: [a, b] // ClassMembers trait, a->a, b->a // ClassMap trait, a is the representative symbol ``` Now lets delete `a`: ``` ClassA: [b] a->a, b->a ``` Let's merge the trivial class `c` into ClassA: ``` ClassA: [c, b] c->c, b->c, a->a ``` Now after the merge operation, `c` and `a` are actually in different equivalence classes, which is inconsistent. One solution to this problem is to simply avoid removing the original member and this is what this patch does. Other options I have considered: 1) Always merge the trivial class into the non-trivial class. This might work most of the time, however, will fail if we have to merge two non-trivial classes (in that case we no longer can track equivalences precisely). 2) In `removeMember`, update the reverse link as well. This would cease the inconsistency, but we'd loose precision since we could not query the constraints for the removed member. Differential Revision: https://reviews.llvm.org/D114619	2021-11-30 11:13:13 +01:00
Gabor Marton	806329da07	[analyzer][solver] Iterate to a fixpoint during symbol simplification with constants D103314 introduced symbol simplification when a new constant constraint is added. Currently, we simplify existing equivalence classes by iterating over all existing members of them and trying to simplify each member symbol with simplifySVal. At the end of such a simplification round we may end up introducing a new constant constraint. Example: ``` if (a + b + c != d) return; if (c + b != 0) return; // Simplification starts here. if (b != 0) return; ``` The `c == 0` constraint is the result of the first simplification iteration. However, we could do another round of simplification to reach the conclusion that `a == d`. Generally, we could do as many new iterations until we reach a fixpoint. We can reach to a fixpoint by recursively calling `State->assume` on the newly simplified symbol. By calling `State->assume` we re-ignite the whole assume machinery (along e.g with adjustment handling). Why should we do this? By reaching a fixpoint in simplification we are capable of discovering infeasible states at the moment of the introduction of the first constant constraint. Let's modify the previous example just a bit, and consider what happens without the fixpoint iteration. ``` if (a + b + c != d) return; if (c + b != 0) return; // Adding a new constraint. if (a == d) return; // This brings in a contradiction. if (b != 0) return; clang_analyzer_warnIfReached(); // This produces a warning. // The path is already infeasible... if (c == 0) // ...but we realize that only when we evaluate `c == 0`. return; ``` What happens currently, without the fixpoint iteration? As the inline comments suggest, without the fixpoint iteration we are doomed to realize that we are on an infeasible path only after we are already walking on that. With fixpoint iteration we can detect that before stepping on that. With fixpoint iteration, the `clang_analyzer_warnIfReached` does not warn in the above example b/c during the evaluation of `b == 0` we realize the contradiction. The engine and the checkers do rely on that either `assume(Cond)` or `assume(!Cond)` should be feasible. This is in fact assured by the so called expensive checks (LLVM_ENABLE_EXPENSIVE_CHECKS). The StdLibraryFuncionsChecker is notably one of the checkers that has a very similar assertion. Before this patch, we simply added the simplified symbol to the equivalence class. In this patch, after we have added the simplified symbol, we remove the old (more complex) symbol from the members of the equivalence class (`ClassMembers`). Removing the old symbol is beneficial because during the next iteration of the simplification we don't have to consider again the old symbol. Contrary to how we handle `ClassMembers`, we don't remove the old Sym->Class relation from the `ClassMap`. This is important for two reasons: The constraints of the old symbol can still be found via it's equivalence class that it used to be the member of (1). We can spare one removal and thus one additional tree in the forest of `ClassMap` (2). Performance and complexity: Let us assume that in a State we have N non-trivial equivalence classes and that all constraints and disequality info is related to non-trivial classes. In the worst case, we can simplify only one symbol of one class in each iteration. The number of symbols in one class cannot grow b/c we replace the old symbol with the simplified one. Also, the number of the equivalence classes can decrease only, b/c the algorithm does a merge operation optionally. We need N iterations in this case to reach the fixpoint. Thus, the steps needed to be done in the worst case is proportional to `N*N`. Empirical results (attached) show that there is some hardly noticeable run-time and peak memory discrepancy compared to the baseline. In my opinion, these differences could be the result of measurement error. This worst case scenario can be extended to that cases when we have trivial classes in the constraints and in the disequality map are transforming to such a State where there are only non-trivial classes, b/c the algorithm does merge operations. A merge operation on two trivial classes results in one non-trivial class. Differential Revision: https://reviews.llvm.org/D106823	2021-11-12 11:44:49 +01:00

8 Commits