mirror of
https://github.com/llvm/llvm-project.git
synced 2025-04-27 03:36:06 +00:00
698 lines
30 KiB
ReStructuredText
698 lines
30 KiB
ReStructuredText
![]() |
==========================
|
||
|
Convergence And Uniformity
|
||
|
==========================
|
||
|
|
||
|
.. contents::
|
||
|
:local:
|
||
|
|
||
|
Introduction
|
||
|
============
|
||
|
|
||
|
Some parallel environments execute threads in groups that allow
|
||
|
communication within the group using special primitives called
|
||
|
*convergent* operations. The outcome of a convergent operation is
|
||
|
sensitive to the set of threads that executes it "together", i.e.,
|
||
|
convergently.
|
||
|
|
||
|
A value is said to be *uniform* across a set of threads if it is the
|
||
|
same across those threads, and *divergent* otherwise. Correspondingly,
|
||
|
a branch is said to be a uniform branch if its condition is uniform,
|
||
|
and it is a divergent branch otherwise.
|
||
|
|
||
|
Whether threads are *converged* or not depends on the paths they take
|
||
|
through the control flow graph. Threads take different outgoing edges
|
||
|
at a *divergent branch*. Divergent branches constrain
|
||
|
program transforms such as changing the CFG or moving a convergent
|
||
|
operation to a different point of the CFG. Performing these
|
||
|
transformations across a divergent branch can change the sets of
|
||
|
threads that execute convergent operations convergently. While these
|
||
|
constraints are out of scope for this document, the described
|
||
|
*uniformity analysis* allows these transformations to identify
|
||
|
uniform branches where these constraints do not hold.
|
||
|
|
||
|
Convergence and
|
||
|
uniformity are inter-dependent: When threads diverge at a divergent
|
||
|
branch, they may later *reconverge* at a common program point.
|
||
|
Subsequent operations are performed convergently, but the inputs may
|
||
|
be non-uniform, thus producing divergent outputs.
|
||
|
|
||
|
Uniformity is also useful by itself on targets that execute threads in
|
||
|
groups with shared execution resources (e.g. waves, warps, or
|
||
|
subgroups):
|
||
|
|
||
|
- Uniform outputs can potentially be computed or stored on shared
|
||
|
resources.
|
||
|
- These targets must "linearize" a divergent branch to ensure that
|
||
|
each side of the branch is followed by the corresponding threads in
|
||
|
the same group. But linearization is unnecessary at uniform
|
||
|
branches, since the whole group of threads follows either one side
|
||
|
of the branch or the other.
|
||
|
|
||
|
This document presents a definition of convergence that is reasonable
|
||
|
for real targets and is compatible with the currently implicit
|
||
|
semantics of convergent operations in LLVM IR. This is accompanied by
|
||
|
a *uniformity analysis* that extends the existing divergence analysis
|
||
|
[DivergenceSPMD]_ to cover irreducible control-flow.
|
||
|
|
||
|
.. [DivergenceSPMD] Julian Rosemann, Simon Moll, and Sebastian
|
||
|
Hack. 2021. An Abstract Interpretation for SPMD Divergence on
|
||
|
Reducible Control Flow Graphs. Proc. ACM Program. Lang. 5, POPL,
|
||
|
Article 31 (January 2021), 35 pages.
|
||
|
https://doi.org/10.1145/3434312
|
||
|
|
||
|
Terminology
|
||
|
===========
|
||
|
|
||
|
Cycles
|
||
|
Described in :ref:`cycle-terminology`.
|
||
|
|
||
|
Closed path
|
||
|
Described in :ref:`cycle-closed-path`.
|
||
|
|
||
|
Disjoint paths
|
||
|
Two paths in a CFG are said to be disjoint if the only nodes common
|
||
|
to both are the start node or the end node, or both.
|
||
|
|
||
|
Join node
|
||
|
A join node of a branch is a node reachable along disjoint paths
|
||
|
starting from that branch.
|
||
|
|
||
|
Diverged path
|
||
|
A diverged path is a path that starts from a divergent branch and
|
||
|
either reaches a join node of the branch or reaches the end of the
|
||
|
function without passing through any join node of the branch.
|
||
|
|
||
|
Threads and Dynamic Instances
|
||
|
=============================
|
||
|
|
||
|
Each occurrence of an instruction in the program source is called a
|
||
|
*static instance*. When a thread executes a program, each execution of
|
||
|
a static instance produces a distinct *dynamic instance* of that
|
||
|
instruction.
|
||
|
|
||
|
Each thread produces a unique sequence of dynamic instances:
|
||
|
|
||
|
- The sequence is generated along branch decisions and loop
|
||
|
traversals.
|
||
|
- Starts with a dynamic instance of a "first" instruction.
|
||
|
- Continues with dynamic instances of successive "next"
|
||
|
instructions.
|
||
|
|
||
|
Threads are independent; some targets may choose to execute them in
|
||
|
groups in order to share resources when possible.
|
||
|
|
||
|
.. figure:: convergence-natural-loop.png
|
||
|
:name: convergence-natural-loop
|
||
|
|
||
|
.. table::
|
||
|
:name: convergence-thread-example
|
||
|
:align: left
|
||
|
|
||
|
+----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
|
||
|
+----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| Thread 1 | Entry1 | H1 | B1 | L1 | H3 | | L3 | | | | Exit |
|
||
|
+----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| Thread 2 | Entry1 | H2 | | L2 | H4 | B2 | L4 | H5 | B3 | L5 | Exit |
|
||
|
+----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
|
||
|
In the above table, each row is a different thread, listing the
|
||
|
dynamic instances produced by that thread from left to right. Each
|
||
|
thread executes the same program that starts with an ``Entry`` node
|
||
|
and ends with an ``Exit`` node, but different threads may take
|
||
|
different paths through the control flow of the program. The columns
|
||
|
are numbered merely for convenience, and empty cells have no special
|
||
|
meaning. Dynamic instances listed in the same column are converged.
|
||
|
|
||
|
.. _convergence-definition:
|
||
|
|
||
|
Convergence
|
||
|
===========
|
||
|
|
||
|
*Converged-with* is a transitive symmetric relation over dynamic
|
||
|
instances produced by *different threads* for the *same static
|
||
|
instance*. Informally, two threads that produce converged dynamic
|
||
|
instances are said to be *converged*, and they are said to execute
|
||
|
that static instance *convergently*, at that point in the execution.
|
||
|
|
||
|
*Convergence order* is a strict partial order over dynamic instances
|
||
|
that is defined as the transitive closure of:
|
||
|
|
||
|
1. If dynamic instance ``P`` is executed strictly before ``Q`` in the
|
||
|
same thread, then ``P`` is *convergence-before* ``Q``.
|
||
|
2. If dynamic instance ``P`` is executed strictly before ``Q1`` in the
|
||
|
same thread, and ``Q1`` is *converged-with* ``Q2``, then ``P`` is
|
||
|
*convergence-before* ``Q2``.
|
||
|
3. If dynamic instance ``P1`` is *converged-with* ``P2``, and ``P2``
|
||
|
is executed strictly before ``Q`` in the same thread, then ``P1``
|
||
|
is *convergence-before* ``Q``.
|
||
|
|
||
|
.. table::
|
||
|
:name: convergence-order-example
|
||
|
:align: left
|
||
|
|
||
|
+----------+-------+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|
||
|
+----------+-------+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| Thread 1 | Entry | ... | | | | S2 | T | ... | Exit |
|
||
|
+----------+-------+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| Thread 2 | Entry | ... | | Q2 | R | S1 | | ... | Exit |
|
||
|
+----------+-------+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| Thread 3 | Entry | ... | P | Q1 | | | | ... | |
|
||
|
+----------+-------+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
|
||
|
The above table shows partial sequences of dynamic instances from
|
||
|
different threads. Dynamic instances in the same column are assumed
|
||
|
to be converged (i.e., related to each other in the converged-with
|
||
|
relation). The resulting convergence order includes the edges ``P ->
|
||
|
Q2``, ``Q1 -> R``, ``P -> R``, ``P -> T``, etc.
|
||
|
|
||
|
The fact that *convergence-before* is a strict partial order is a
|
||
|
constraint on the *converged-with* relation. It is trivially satisfied
|
||
|
if different dynamic instances are never converged. It is also
|
||
|
trivially satisfied for all known implementations for which
|
||
|
convergence plays some role. Aside from the strict partial convergence
|
||
|
order, there are currently no additional constraints on the
|
||
|
*converged-with* relation imposed in LLVM IR.
|
||
|
|
||
|
.. _convergence-note-convergence:
|
||
|
|
||
|
.. note::
|
||
|
|
||
|
1. The ``convergent`` attribute on convergent operations does
|
||
|
constrain changes to ``converged-with``, but it is expressed in
|
||
|
terms of control flow and does not explicitly deal with thread
|
||
|
convergence.
|
||
|
|
||
|
2. The convergence-before relation is not
|
||
|
directly observable. Program transforms are in general free to
|
||
|
change the order of instructions, even though that obviously
|
||
|
changes the convergence-before relation.
|
||
|
|
||
|
3. Converged dynamic instances need not be executed at the same
|
||
|
time or even on the same resource. Converged dynamic instances
|
||
|
of a convergent operation may appear to do so but that is an
|
||
|
implementation detail. The fact that ``P`` is convergence-before
|
||
|
``Q`` does not automatically imply that ``P`` happens-before
|
||
|
``Q`` in a memory model sense.
|
||
|
|
||
|
4. **Future work:** Providing convergence-related guarantees to
|
||
|
compiler frontends enables some powerful optimization techniques
|
||
|
that can be used by programmers or by high-level program
|
||
|
transforms. Constraints on the ``converged-with`` relation may
|
||
|
be added eventually as part of the definition of LLVM
|
||
|
IR, so that guarantees can be made that frontends can rely on.
|
||
|
For a proposal on how this might work, see `D85603
|
||
|
<https://reviews.llvm.org/D85603>`_.
|
||
|
|
||
|
.. _convergence-maximal:
|
||
|
|
||
|
Maximal Convergence
|
||
|
-------------------
|
||
|
|
||
|
This section defines a constraint that may be used to
|
||
|
produce a *maximal converged-with* relation without violating the
|
||
|
strict *convergence-before* order. This maximal converged-with
|
||
|
relation is reasonable for real targets and is compatible with
|
||
|
convergent operations.
|
||
|
|
||
|
The maximal converged-with relation is defined in terms of cycle
|
||
|
headers, which are not unique to a given CFG. Each cycle hierarchy for
|
||
|
the same CFG results in a different maximal converged-with relation.
|
||
|
|
||
|
**Maximal converged-with:**
|
||
|
|
||
|
Dynamic instances ``X1`` and ``X2`` produced by different threads
|
||
|
for the same static instance ``X`` are converged in the maximal
|
||
|
converged-with relation if and only if for every cycle ``C`` with
|
||
|
header ``H`` that contains ``X``:
|
||
|
|
||
|
- every dynamic instance ``H1`` of ``H`` that precedes ``X1`` in
|
||
|
the respective thread is convergence-before ``X2``, and,
|
||
|
- every dynamic instance ``H2`` of ``H`` that precedes ``X2`` in
|
||
|
the respective thread is convergence-before ``X1``,
|
||
|
- without assuming that ``X1`` is converged with ``X2``.
|
||
|
|
||
|
.. note::
|
||
|
|
||
|
For brevity, the rest of the document restricts the term
|
||
|
*converged* to mean "related under the maximal converged-with
|
||
|
relation for the given cycle hierarchy".
|
||
|
|
||
|
Maximal convergence can now be demonstrated in the earlier example as follows:
|
||
|
|
||
|
.. table::
|
||
|
:align: left
|
||
|
|
||
|
+----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
|
||
|
+----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| Thread 1 | Entry1 | H1 | B1 | L1 | H3 | | L3 | | | | Exit |
|
||
|
+----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| Thread 2 | Entry2 | H2 | | L2 | H4 | B2 | L4 | H5 | B3 | L5 | Exit |
|
||
|
+----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
|
||
|
- ``Entry1`` and ``Entry2`` are converged.
|
||
|
- ``H1`` and ``H2`` are converged.
|
||
|
- ``B1`` and ``B2`` are not converged due to ``H4`` which is not
|
||
|
convergence-before ``B1``.
|
||
|
- ``H3`` and ``H4`` are converged.
|
||
|
- ``H3`` is not converged with ``H5`` due to ``H4`` which is not
|
||
|
convergence-before ``H3``.
|
||
|
- ``L1`` and ``L2`` are converged.
|
||
|
- ``L3`` and ``L4`` are converged.
|
||
|
- ``L3`` is not converged with ``L5`` due to ``H5`` which is not
|
||
|
convergence-before ``L3``.
|
||
|
|
||
|
.. _convergence-cycle-headers:
|
||
|
|
||
|
Dependence on Cycles Headers
|
||
|
----------------------------
|
||
|
|
||
|
Contradictions in convergence order are possible only between two
|
||
|
nodes that are inside some cycle. The dynamic instances of such nodes
|
||
|
may be interleaved in the same thread, and this interleaving may be
|
||
|
different for different threads.
|
||
|
|
||
|
When a thread executes a node ``X`` once and then executes it again,
|
||
|
it must have followed a closed path in the CFG that includes ``X``.
|
||
|
Such a path must pass through the header of at least one cycle --- the
|
||
|
smallest cycle that includes the entire closed path. In a given
|
||
|
thread, two dynamic instances of ``X`` are either separated by the
|
||
|
execution of at least one cycle header, or ``X`` itself is a cycle
|
||
|
header.
|
||
|
|
||
|
In reducible cycles (natural loops), each execution of the header is
|
||
|
equivalent to the start of a new iteration of the cycle. But this
|
||
|
analogy breaks down in the presence of explicit constraints on the
|
||
|
converged-with relation, such as those described in :ref:`future
|
||
|
work<convergence-note-convergence>`. Instead, cycle headers should be
|
||
|
treated as implicit *points of convergence* in a maximal
|
||
|
converged-with relation.
|
||
|
|
||
|
Consider a sequence of nested cycles ``C1``, ``C2``, ..., ``Ck`` such
|
||
|
that ``C1`` is the outermost cycle and ``Ck`` is the innermost cycle,
|
||
|
with headers ``H1``, ``H2``, ..., ``Hk`` respectively. When a thread
|
||
|
enters the cycle ``Ck``, any of the following is possible:
|
||
|
|
||
|
1. The thread directly entered cycle ``Ck`` without having executed
|
||
|
any of the headers ``H1`` to ``Hk``.
|
||
|
|
||
|
2. The thread executed some or all of the nested headers one or more
|
||
|
times.
|
||
|
|
||
|
The maximal converged-with relation captures the following intuition
|
||
|
about cycles:
|
||
|
|
||
|
1. When two threads enter a top-level cycle ``C1``, they execute
|
||
|
converged dynamic instances of every node that is a :ref:`child
|
||
|
<cycle-parent-block>` of ``C1``.
|
||
|
|
||
|
2. When two threads enter a nested cycle ``Ck``, they execute
|
||
|
converged dynamic instances of every node that is a child of
|
||
|
``Ck``, until either thread exits ``Ck``, if and only if they
|
||
|
executed converged dynamic instances of the last nested header that
|
||
|
either thread encountered.
|
||
|
|
||
|
Note that when a thread exits a nested cycle ``Ck``, it must follow
|
||
|
a closed path outside ``Ck`` to reenter it. This requires executing
|
||
|
the header of some outer cycle, as described earlier.
|
||
|
|
||
|
Consider two dynamic instances ``X1`` and ``X2`` produced by threads ``T1``
|
||
|
and ``T2`` for a node ``X`` that is a child of nested cycle ``Ck``.
|
||
|
Maximal convergence relates ``X1`` and ``X2`` as follows:
|
||
|
|
||
|
1. If neither thread executed any header from ``H1`` to ``Hk``, then
|
||
|
``X1`` and ``X2`` are converged.
|
||
|
|
||
|
2. Otherwise, if there are no converged dynamic instances ``Q1`` and
|
||
|
``Q2`` of any header ``Q`` from ``H1`` to ``Hk`` (where ``Q`` is
|
||
|
possibly the same as ``X``), such that ``Q1`` precedes ``X1`` and
|
||
|
``Q2`` precedes ``X2`` in the respective threads, then ``X1`` and
|
||
|
``X2`` are not converged.
|
||
|
|
||
|
3. Otherwise, consider the pair ``Q1`` and ``Q2`` of converged dynamic
|
||
|
instances of a header ``Q`` from ``H1`` to ``Hk`` that occur most
|
||
|
recently before ``X1`` and ``X2`` in the respective threads. Then
|
||
|
``X1`` and ``X2`` are converged if and only if there is no dynamic
|
||
|
instance of any header from ``H1`` to ``Hk`` that occurs between
|
||
|
``Q1`` and ``X1`` in thread ``T1``, or between ``Q2`` and ``X2`` in
|
||
|
thread ``T2``. In other words, ``Q1`` and ``Q2`` represent the last
|
||
|
point of convergence, with no other header being executed before
|
||
|
executing ``X``.
|
||
|
|
||
|
**Example:**
|
||
|
|
||
|
.. figure:: convergence-both-diverged-nested.png
|
||
|
:name: convergence-both-diverged-nested
|
||
|
|
||
|
The above figure shows two nested irreducible cycles with headers
|
||
|
``R`` and ``S``. The nodes ``Entry`` and ``Q`` have divergent
|
||
|
branches. The table below shows the convergence between three threads
|
||
|
taking different paths through the CFG. Dynamic instances listed in
|
||
|
the same column are converged.
|
||
|
|
||
|
.. table::
|
||
|
:align: left
|
||
|
|
||
|
+---------+-------+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 10 |
|
||
|
+---------+-------+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| Thread1 | Entry | P1 | Q1 | S1 | P3 | Q3 | R1 | S2 | Exit |
|
||
|
+---------+-------+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| Thread2 | Entry | P2 | Q2 | | | | R2 | S3 | Exit |
|
||
|
+---------+-------+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| Thread3 | Entry | | | | | | R3 | S4 | Exit |
|
||
|
+---------+-------+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
|
||
|
- ``P2`` and ``P3`` are not converged due to ``S1``
|
||
|
- ``Q2`` and ``Q3`` are not converged due to ``S1``
|
||
|
- ``S1`` and ``S3`` are not converged due to ``R2``
|
||
|
- ``S1`` and ``S4`` are not converged due to ``R3``
|
||
|
|
||
|
Informally, ``T1`` and ``T2`` execute the inner cycle a different
|
||
|
number of times, without executing the header of the outer cycle. All
|
||
|
threads converge in the outer cycle when they first execute the header
|
||
|
of the outer cycle.
|
||
|
|
||
|
.. _convergence-uniformity:
|
||
|
|
||
|
Uniformity
|
||
|
==========
|
||
|
|
||
|
1. The output of two converged dynamic instances is uniform if and
|
||
|
only if it compares equal for those two dynamic instances.
|
||
|
2. The output of a static instance ``X`` is uniform *for a given set
|
||
|
of threads* if and only if it is uniform for every pair of
|
||
|
converged dynamic instances of ``X`` produced by those threads.
|
||
|
|
||
|
A non-uniform value is said to be *divergent*.
|
||
|
|
||
|
For a set ``S`` of threads, the uniformity of each output of a static
|
||
|
instance is determined as follows:
|
||
|
|
||
|
1. The semantics of the instruction may specify the output to be
|
||
|
uniform.
|
||
|
2. Otherwise, if it is a PHI node, its output is uniform if and only
|
||
|
if for every pair of converged dynamic instances produced by all
|
||
|
threads in ``S``:
|
||
|
|
||
|
a. Both instances choose the same output from converged
|
||
|
dynamic instances, and,
|
||
|
b. That output is uniform for all threads in ``S``.
|
||
|
3. Otherwise, the output is uniform if and only if the input
|
||
|
operands are uniform for all threads in ``S``.
|
||
|
|
||
|
Divergent Cycle Exits
|
||
|
---------------------
|
||
|
|
||
|
When a divergent branch occurs inside a cycle, it is possible that a
|
||
|
diverged path continues to an exit of the cycle. This is called a
|
||
|
divergent cycle exit. If the cycle is irreducible, the diverged path
|
||
|
may re-enter and eventually reach a join within the cycle. Such a join
|
||
|
should be examined for the :ref:`diverged entry
|
||
|
<convergence-diverged-entry>` criterion.
|
||
|
|
||
|
Nodes along the diverged path that lie outside the cycle experience
|
||
|
*temporal divergence*, when two threads executing convergently inside
|
||
|
the cycle produce uniform values, but exit the cycle along the same
|
||
|
divergent path after executing the header a different number of times
|
||
|
(informally, on different iterations of the cycle). For a node ``N``
|
||
|
inside the cycle the outputs may be uniform for the two threads, but
|
||
|
any use ``U`` outside the cycle receives a value from non-converged
|
||
|
dynamic instances of ``N``. An output of ``U`` may be divergent,
|
||
|
depending on the semantics of the instruction.
|
||
|
|
||
|
Static Uniformity Analysis
|
||
|
==========================
|
||
|
|
||
|
Irreducible control flow results in different cycle hierarchies
|
||
|
depending on the choice of headers during depth-first traversal. As a
|
||
|
result, a static analysis cannot always determine the convergence of
|
||
|
nodes in irreducible cycles, and any uniformity analysis is limited to
|
||
|
those static instances whose convergence is independent of the cycle
|
||
|
hierarchy:
|
||
|
|
||
|
**m-converged static instances:**
|
||
|
|
||
|
A static instance ``X`` is *m-converged* for a given CFG if and only
|
||
|
if the maximal converged-with relation for its dynamic instances is
|
||
|
the same in every cycle hierarchy that can be constructed for that CFG.
|
||
|
|
||
|
.. note::
|
||
|
|
||
|
In other words, two dynamic instances ``X1`` and ``X2`` of an
|
||
|
m-converged static instance ``X`` are converged in some cycle
|
||
|
hierarchy if and only if they are also converged in every other
|
||
|
cycle hierarchy for the same CFG.
|
||
|
|
||
|
As noted earlier, for brevity, we restrict the term *converged* to
|
||
|
mean "related under the maximal converged-with relation for a given
|
||
|
cycle hierarchy".
|
||
|
|
||
|
|
||
|
Each node ``X`` in a given CFG is reported to be m-converged if and
|
||
|
only if:
|
||
|
|
||
|
1. ``X`` is a :ref:`top-level<cycle-toplevel-block>` node, in which
|
||
|
case, there are no cycle headers to influence the convergence of
|
||
|
``X``.
|
||
|
|
||
|
2. Otherwise, if ``X`` is inside a cycle, then every cycle that
|
||
|
contains ``X`` satisfies the following necessary conditions:
|
||
|
|
||
|
a. Every divergent branch inside the cycle satisfies the
|
||
|
:ref:`diverged entry criterion<convergence-diverged-entry>`, and,
|
||
|
b. There are no :ref:`diverged paths reaching the
|
||
|
cycle<convergence-diverged-outside>` from a divergent branch
|
||
|
outside it.
|
||
|
|
||
|
.. note::
|
||
|
|
||
|
A reducible cycle :ref:`trivially satisfies
|
||
|
<convergence-reducible-cycle>` the above conditions. In particular,
|
||
|
if the whole CFG is reducible, then all nodes in the CFG are
|
||
|
m-converged.
|
||
|
|
||
|
If a static instance is not m-converged, then every output is assumed
|
||
|
to be divergent. Otherwise, for an m-converged static instance, the
|
||
|
uniformity of each output is determined using the criteria
|
||
|
:ref:`described earlier <convergence-uniformity>`. The discovery of
|
||
|
divergent outputs may cause their uses (including branches) to also
|
||
|
become divergent. The analysis propagates this divergence until a
|
||
|
fixed point is reached.
|
||
|
|
||
|
The convergence inferred using these criteria is a safe subset of the
|
||
|
maximal converged-with relation for any cycle hierarchy. In
|
||
|
particular, it is sufficient to determine if a static instance is
|
||
|
m-converged for a given cycle hierarchy ``T``, even if that fact is
|
||
|
not detected when examining some other cycle hierarchy ``T'``.
|
||
|
|
||
|
This property allows compiler transforms to use the uniformity
|
||
|
analysis without being affected by DFS choices made in the underlying
|
||
|
cycle analysis. When two transforms use different instances of the
|
||
|
uniformity analysis for the same CFG, a "divergent value" result in
|
||
|
one analysis instance cannot contradict a "uniform value" result in
|
||
|
the other.
|
||
|
|
||
|
Generic transforms such as SimplifyCFG, CSE, and loop transforms
|
||
|
commonly change the program in ways that change the maximal
|
||
|
converged-with relations. This also means that a value that was
|
||
|
previously uniform can become divergent after such a transform.
|
||
|
Uniformity has to be recomputed after such transforms.
|
||
|
|
||
|
Divergent Branch inside a Cycle
|
||
|
-------------------------------
|
||
|
|
||
|
.. figure:: convergence-divergent-inside.png
|
||
|
:name: convergence-divergent-inside
|
||
|
|
||
|
The above figure shows a divergent branch ``Q`` inside an irreducible
|
||
|
cyclic region. When two threads diverge at ``Q``, the convergence of
|
||
|
dynamic instances within the cyclic region depends on the cycle
|
||
|
hierarchy chosen:
|
||
|
|
||
|
1. In an implementation that detects a single cycle ``C`` with header
|
||
|
``P``, convergence inside the cycle is determined by ``P``.
|
||
|
|
||
|
2. In an implementation that detects two nested cycles with headers
|
||
|
``R`` and ``S``, convergence inside those cycles is determined by
|
||
|
their respective headers.
|
||
|
|
||
|
.. _convergence-diverged-entry:
|
||
|
|
||
|
A conservative approach would be to simply report all nodes inside
|
||
|
irreducible cycles as having divergent outputs. But it is desirable to
|
||
|
recognize m-converged nodes in the CFG in order to maximize
|
||
|
uniformity. This section describes one such pattern of nodes derived
|
||
|
from *closed paths*, which are a property of the CFG and do not depend
|
||
|
on the cycle hierarchy.
|
||
|
|
||
|
**Diverged Entry Criterion:**
|
||
|
|
||
|
The dynamic instances of all the nodes in a closed path ``P`` are
|
||
|
m-converged only if for every divergent branch ``B`` and its
|
||
|
join node ``J`` that lie on ``P``, there is no entry to ``P`` which
|
||
|
lies on a diverged path from ``B`` to ``J``.
|
||
|
|
||
|
.. figure:: convergence-closed-path.png
|
||
|
:name: convergence-closed-path
|
||
|
|
||
|
Consider the closed path ``P -> Q -> R -> S`` in the above figure.
|
||
|
``P`` and ``R`` are :ref:`entries to the closed
|
||
|
path<cycle-closed-path>`. ``Q`` is a divergent branch and ``S`` is a
|
||
|
join for that branch, with diverged paths ``Q -> R -> S`` and ``Q ->
|
||
|
S``.
|
||
|
|
||
|
- If a diverged entry ``R`` exists, then in some cycle hierarchy,
|
||
|
``R`` is the header of the smallest cycle ``C`` containing the
|
||
|
closed path and a :ref:`child cycle<cycle-definition>` ``C'``
|
||
|
exists in the set ``C - R``, containing both branch ``Q`` and join
|
||
|
``S``. When threads diverge at ``Q``, one subset ``M`` continues
|
||
|
inside cycle ``C'``, while the complement ``N`` exits ``C'`` and
|
||
|
reaches ``R``. Dynamic instances of ``S`` executed by threads in set
|
||
|
``M`` are not converged with those executed in set ``N`` due to the
|
||
|
presence of ``R``. Informally, threads that diverge at ``Q``
|
||
|
reconverge in the same iteration of the outer cycle ``C``, but they
|
||
|
may have executed the inner cycle ``C'`` differently.
|
||
|
|
||
|
.. table::
|
||
|
:align: left
|
||
|
|
||
|
+---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|
||
|
+---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| Thread1 | Entry | P1 | Q1 | | | | R1 | S1 | P3 | ... | Exit |
|
||
|
+---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| Thread2 | Entry | P2 | Q2 | S2 | P4 | Q4 | R2 | S4 | | | Exit |
|
||
|
+---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
|
||
|
In the table above, ``S2`` is not converged with ``S1`` due to ``R1``.
|
||
|
|
||
|
|
|
||
|
|
||
|
- If ``R`` does not exist, or if any node other than ``R`` is the
|
||
|
header of ``C``, then no such child cycle ``C'`` is detected.
|
||
|
Threads that diverge at ``Q`` execute converged dynamic instances of
|
||
|
``S`` since they do not encounter the cycle header on any path from
|
||
|
``Q`` to ``S``. Informally, threads that diverge at ``Q``
|
||
|
reconverge at ``S`` in the same iteration of ``C``.
|
||
|
|
||
|
.. table::
|
||
|
:align: left
|
||
|
|
||
|
+---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|
||
|
+---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| Thread1 | Entry | P1 | Q1 | R1 | S1 | P3 | Q3 | R3 | S3 | Exit |
|
||
|
+---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| Thread2 | Entry | P2 | Q2 | | S2 | P4 | Q4 | R2 | S4 | Exit |
|
||
|
+---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
|
||
|
|
|
||
|
|
||
|
.. note::
|
||
|
|
||
|
In general, the cycle ``C`` in the above statements is not
|
||
|
expected to be the same cycle for different headers. Cycles and
|
||
|
their headers are tightly coupled; for different headers in the
|
||
|
same outermost cycle, the child cycles detected may be different.
|
||
|
The property relevant to the above examples is that for every
|
||
|
closed path, there is a cycle ``C`` that contains the path and
|
||
|
whose header is on that path.
|
||
|
|
||
|
The diverged entry criterion must be checked for every closed path
|
||
|
passing through a divergent branch ``B`` and its join ``J``. Since
|
||
|
:ref:`every closed path passes through the header of some
|
||
|
cycle<cycle-closed-path-header>`, this amounts to checking every cycle
|
||
|
``C`` that contains ``B`` and ``J``. When the header of ``C``
|
||
|
dominates the join ``J``, there can be no entry to any path from the
|
||
|
header to ``J``, which includes any diverged path from ``B`` to ``J``.
|
||
|
This is also true for any closed paths passing through the header of
|
||
|
an outer cycle that contains ``C``.
|
||
|
|
||
|
Thus, the diverged entry criterion can be conservatively simplified
|
||
|
as follows:
|
||
|
|
||
|
For a divergent branch ``B`` and its join node ``J``, the nodes in a
|
||
|
cycle ``C`` that contains both ``B`` and ``J`` are m-converged only
|
||
|
if:
|
||
|
|
||
|
- ``B`` strictly dominates ``J``, or,
|
||
|
- The header ``H`` of ``C`` strictly dominates ``J``, or,
|
||
|
- Recursively, there is cycle ``C'`` inside ``C`` that satisfies the
|
||
|
same condition.
|
||
|
|
||
|
When ``J`` is the same as ``H`` or ``B``, the trivial dominance is
|
||
|
insufficient to make any statement about entries to diverged paths.
|
||
|
|
||
|
.. _convergence-diverged-outside:
|
||
|
|
||
|
Diverged Paths reaching a Cycle
|
||
|
-------------------------------
|
||
|
|
||
|
.. figure:: convergence-divergent-outside.png
|
||
|
:name: convergence-divergent-outside
|
||
|
|
||
|
The figure shows two cycle hierarchies with a divergent branch in
|
||
|
``Entry`` instead of ``Q``. For two threads that enter the closed path
|
||
|
``P -> Q -> R -> S`` at ``P`` and ``R`` respectively, the convergence
|
||
|
of dynamic instances generated along the path depends on whether ``P``
|
||
|
or ``R`` is the header.
|
||
|
|
||
|
- Convergence when ``P`` is the header.
|
||
|
|
||
|
.. table::
|
||
|
:align: left
|
||
|
|
||
|
+---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|
||
|
+---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| Thread1 | Entry | | | | P1 | Q1 | R1 | S1 | P3 | Q3 | | S3 | Exit |
|
||
|
+---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| Thread2 | Entry | | R2 | S2 | P2 | Q2 | | S2 | P4 | Q4 | R3 | S4 | Exit |
|
||
|
+---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
|
||
|
|
|
||
|
|
||
|
- Convergence when ``R`` is the header.
|
||
|
|
||
|
.. table::
|
||
|
:align: left
|
||
|
|
||
|
+---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|
||
|
+---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| Thread1 | Entry | | P1 | Q1 | R1 | S1 | P3 | Q3 | S3 | | | Exit |
|
||
|
+---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
| Thread2 | Entry | | | | R2 | S2 | P2 | Q2 | S2 | P4 | ... | Exit |
|
||
|
+---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|
||
|
|
||
|
|
|
||
|
|
||
|
Thus, when diverged paths reach different entries of an irreducible
|
||
|
cycle from outside the cycle, the static analysis conservatively
|
||
|
reports every node in the cycle as not m-converged.
|
||
|
|
||
|
.. _convergence-reducible-cycle:
|
||
|
|
||
|
Reducible Cycle
|
||
|
---------------
|
||
|
|
||
|
If ``C`` is a reducible cycle with header ``H``, then in any DFS,
|
||
|
``H`` :ref:`must be the header of some cycle<cycle-reducible-headers>`
|
||
|
``C'`` that contains ``C``. Independent of the DFS, there is no entry
|
||
|
to the subgraph ``C`` other than ``H`` itself. Thus, we have the
|
||
|
following:
|
||
|
|
||
|
1. The diverged entry criterion is trivially satisfied for a divergent
|
||
|
branch and its join, where both are inside subgraph ``C``.
|
||
|
2. When diverged paths reach the subgraph ``C`` from outside, their
|
||
|
convergence is always determined by the same header ``H``.
|
||
|
|
||
|
Clearly, this can be determined only in a cycle hierarchy ``T`` where
|
||
|
``C`` is detected as a reducible cycle. No such conclusion can be made
|
||
|
in a different cycle hierarchy ``T'`` where ``C`` is part of a larger
|
||
|
cycle ``C'`` with the same header, but this does not contradict the
|
||
|
conclusion in ``T``.
|