mirror of https://github.com/ROCm/jax.git synced 2025-04-25 00:06:06 +00:00

History

George Necula c368c69625 [shape_poly] Extend the handling of jnp.arange with shape polymorphism.

Previously, only `arange(stop, dtype=...)` was being handled in presence
of shape polymorphism. Here we extend to add support for `start` and `step`
to be also present. There are still plenty of restrictions:

   * no floating point constants are allowed among start, stop and step
   * we must resolve statically if step is positive or negative
   * we must resolve statically if the distance between start and stop
     is negative or positive.

2023-03-31 14:41:26 +02:00

examples

fix typo: "one of more" -> "one or more"

2023-03-22 22:06:50 -07:00

g3doc

Fix typo "compileable"->"compilable".

2023-03-17 21:51:01 -07:00

tests

[shape_poly] Extend the handling of jnp.arange with shape polymorphism.

2023-03-31 14:41:26 +02:00

__init__.py

Use jax2tf with static type checking

2023-03-05 23:11:38 +00:00

BUILD

Merge pull request #15303 from jakevdp:lax-asarray

2023-03-30 20:11:11 +00:00

call_tf.py

[1/n] store embedded tf.graph to stablehlo.custom_call

2023-03-24 11:27:24 -07:00

impl_no_xla.py

Add support for XLAGather with 2-D Batch Dimensions for enable_xla=False

2023-02-23 13:43:25 -08:00

jax2tf.py

[shape_poly] Improved computation of dimension variables for native serialization

2023-03-30 15:51:24 +02:00

JAX2TF_getting_started.ipynb

Rename master to main in embedded links.

2021-06-18 10:00:01 +03:00

jax_export.py

[shape_poly] Improved computation of dimension variables for native serialization

2023-03-30 15:51:24 +02:00

OWNERS

Internal change

2021-10-21 09:30:16 -07:00

README.md

[shape_poly] Fixed bug with dimension variables in unused args

2023-03-27 13:37:39 +02:00

shape_poly.py

[shape_poly] Improved computation of dimension variables for native serialization

2023-03-30 15:51:24 +02:00

README.md

JAX and TensorFlow interoperation (jax2tf/call_tf)

This package provides support for JAX native serialization and for interoperation between JAX and TensorFlow. There are two interoperation directions:

jax2tf.convert: for calling JAX functions in a TensorFlow context, e.g., for eager or graph TensorFlow execution, or for serializing as a TensorFlow SavedModel; and
jax2tf.call_tf: for calling TensorFlow functions in a JAX context, e.g., to call a TensorFlow library or to reload a TensorFlow SavedModel and call its functions in JAX.

These APIs can be combined, e.g., to reload in JAX a program that has been serialized from JAX to a TensorFlow SavedModel, or to save to TensorFlow SavedModel a JAX program that uses a TensorFlow library.

Tip: As of version 0.4.7 (March 2023), there is a new option native_serialization to use JAX's native lowering to StableHLO to obtain one StableHLO module for the entire JAX function instead of lowering each JAX primitive to a TensorFlow op.

The preferred mode of JAX-TensorFlow interoperation is by way of native serialization in which the target function is lowered to StableHLO using standard native JAX or TensorFlow APIs, and then the StableHLO module is invoked from the other framework. To enable this mode, set native_serialization=True (soon to be the default). This has several advantages:

supports virtually all operations supported by native execution, e.g., xmap, shard_map, pmap, parallel collective operations, and all primitives at all data types.
uses standard native code paths in each framework, and thus it is easier to trust that the semantics and performance stays faithful to the native semantics, across platforms. Has optional checking that the code runs on the platform for which it was serialized.
the metadata associated with the operations, e.g., source location, is identical to what native execution uses.

At the moment when using JAX native serialization the whole JAX compilation unit is wrapped with a single thin TensorFlow op, called XlaCallModule, that carries the serialized version of the StableHLO obtained from JAX. This op is supported only on TensorFlow platforms that include the XLA compiler, and it compiles and then invokes the embedded StableHLO. The reasons we wrap the StableHLO in a TensorFlow op are:

it allows saving the serialization in a tf.SavedModel, for use with multiple mature tools for TensorFlow,
it allows composing the JAX program with TensorFlow pre-processing, post-processing, and host callback functions,
the XlaCallModule contains the code that must be executed to deserialize, compile, and execute the JAX program, e.g., to handle properly backward compatibility and to do the just-in-time preprocessing needed for shape polymorphism.
the semantics of JAX program is still preserved faithfully because it is entirely captured by the StableHLO serialization.

For backwards compatibility purposes, and for special uses, the JAX-TensorFlow interoperation APIs can be used also in a graph serialization mode (the only mode available before version 0.4.7), without going through StableHLO.

For calling JAX functions from TensorFlow, it is possible to request that the JAX function be lowered with one TensorFlow op for each JAX primitive. This can be achieved by setting native_serialization=False. This enables the following:
- TensorFlow eager mode execution, e.g., for debugging,
- producing a tf.Graph for consumption by tooling that understands TensorFlow ops but does not yet work with StableHLO, e.g., TFLite and TensorFlow.js.
- using the more mature support for dynamic shapes in TensorFlow. StableHLO does have support for dynamic shapes, and in the near future we expect it will support shape polymorphism to the same extent as graph serialization.
Even in the graph serialization mode the resulting TensorFlow graph is pretty much 1:1 with the StableHLO module that would be obtained through native serialization.
For calling TensorFlow functions from JAX, if the resulting JAX program is executed in op-by-op mode (i.e., not under jax.jit or jax.pmap and not inside lax.cond or lax.scan) then the target TensorFlow function is executed in eager mode. This can be useful if the target TensorFlow function is not lowerable to HLO, e.g., is using strings.

We describe below some general concepts and capabilities, first for jax2tf.convert and later for jax2tf.call_tf. For more involved examples, please see examples involving:

SavedModel for archival (examples below), including saving batch-polymorphic functions,
TensorFlow Lite (examples),
TensorFlow.js (examples),
TFX (examples),
TensorFlow Hub and Keras (examples).

[TOC]

Usage: basic functions.

As a rule of thumb, if you can jax.jit your function then you should be able to use jax2tf.convert:

from jax.experimental import jax2tf
from jax import numpy as jnp

import numpy as np
import tensorflow as tf

def f_jax(x):
  return jnp.sin(jnp.cos(x))

# jax2tf.convert is a higher-order function that returns a wrapped function with
# the same signature as your input function but accepting TensorFlow tensors (or
# variables) as input.
f_tf = jax2tf.convert(f_jax)

# For example you execute f_tf eagerly with valid TensorFlow inputs:
f_tf(np.random.random(...))

# Additionally you can use tools like `tf.function` to improve the execution
# time of your function, or to stage it out to a SavedModel:
f_tf_graph = tf.function(f_tf, autograph=False)

Note that when using the default native serialization, the target JAX function must be jittable (see JAX - The Sharp Bits). In the native serialization mode, under TensorFlow eager the whole JAX function executes as one op.

The Autograph feature of tf.function cannot be expected to work on functions lowered from JAX as above, so it is recommended to set autograph=False in order to speed up the execution and to avoid warnings and outright errors.

Usage: saved model

You can serialize JAX program into a TensorFlow SavedModel, for use with tooling that understands SavedModel. Both in native and non-native serialization you can count on 6 months of backwards compatiblity (you can load a function serialized today with tooling that will be built up to 6 months in the future), and 3 weeks of limited forwards compatibility (you can load a function serialized today with tooling that was built up to 3 weeks in the past, provided the model that not use any new features).

Since jax2tf provides a regular TensorFlow function using it with SavedModel is trivial:

# You can save the model just like you would with any other TensorFlow function:
my_model = tf.Module()
# Save a function that can take scalar inputs.
my_model.f = tf.function(jax2tf.convert(f_jax), autograph=False,
                         input_signature=[tf.TensorSpec([], tf.float32)])
tf.saved_model.save(my_model, '/some/directory',
                    options=tf.saved_model.SaveOptions(experimental_custom_gradients=True))

# Restoring (note: the restored model does *not* require JAX to run, just XLA).
restored_model = tf.saved_model.load('/some/directory')

An important point is that in the above code snippet everything after the jax2tf invocation is standard TensorFlow code. In particular, the saving of the model is not directly part of the jax2tf API, and the user has full control over how to create the SavedModel.

For example, just like for regular TensorFlow functions, it is possible to include in the SavedModel multiple versions of a function for different input shapes, by "warming up" the function on different input shapes:

my_model.f = tf.function(jax2tf.convert(f_jax), autograph=False)
my_model.f(tf.ones([1, 28, 28]))  # a batch size of 1
my_model.f(tf.ones([16, 28, 28]))  # a batch size of 16
tf.saved_model.save(my_model, '/some/directory',
                    options=tf.saved_model.SaveOptions(experimental_custom_gradients=True))

Saved model with parameters

Some special care is needed to ensure that the model parameters are not embedded as constants in the graph and are instead saved separately as variables. This is useful for two reasons: the parameters could be very large and exceed the 2GB limits of the GraphDef part of the SavedModel, or you may want to fine-tune the model and change the value of the parameters.

For example, consider the following function:

def model_jax(inputs):
  return param0 + param1 * inputs

If you just lower and save the model directly, the values of param0 and param1 will be embedded in the computation graph. In fact, the value of param1 is needed for the gradient computation and will be embedded twice: once in the computation graph for the forward computation and once for the backward computation, unless you turn off the staging of gradients or their saving as discussed further below (e.g., with_gradient=False). Note also that if one views the above function as an ML model parameterized by param0 and param1 then the gradient function will be w.r.t. the inputs, while you probably want gradients w.r.t. the parameters.

A better way to deal with parameters (or any large constants) is to pass them as parameters to the function to be lowered:

def model_jax(params, inputs):
  return params[0] + params[1] * inputs

# Wrap the parameter constants as tf.Variables; this will signal to the model
# saving code to save those constants as variables, separate from the
# computation graph.
params_vars = tf.nest.map_structure(tf.Variable, params)

# Build the prediction function by closing over the `params_vars`. If you
# instead were to close over `params` your SavedModel would have no variables
# and the parameters will be included in the function graph.
prediction_tf = lambda inputs: jax2tf.convert(model_jax)(params_vars, inputs)

my_model = tf.Module()
# Tell the model saver what are the variables.
my_model._variables = tf.nest.flatten(params_vars)
my_model.f = tf.function(prediction_tf, jit_compile=True, autograph=False)
tf.saved_model.save(my_model)

This strategy will avoid any copies of the large parameters in the computation graph (they will be saved in a variables area of the model, which is not subject to the 2GB limitation).

For examples of how to save a Flax model as a SavedModel see the examples directory.

Saved model and differentiation

The code lowered from JAX supports differentiation from TensorFlow. In order to ensure that the result of TensorFlow differentiation is identical to the one that JAX differentiation would produce, we will annotate the lowered primal function with a tf.custom_gradient that, upon TensorFlow differentiation, will lazily call into JAX to compute the jax.vjp of the lowered primal function, followed by jax2tf lowering of the gradient function. This ensures that ultimately it is JAX that performs the differentiation, thus respecting any custom gradients that may be present in the original function.

The jax2tf.convert function has an option with_gradient=False to skip the custom gradients and wrap instead the lowered function with tf.raw_ops.PreventGradient to generate an error in case a gradient computation is attempted.

SavedModels enables saving custom derivative rules by using the experimental_custom_gradients option:

options = tf.saved_model.SaveOptions(experimental_custom_gradients=True)
tf.saved_model.save(model, path, options=options)

If you use with_gradient=True and forget to use the experimental_custom_gradients=True parameter to tf.saved_model.save when you later load the saved model you will see a warning:

WARNING:absl:Importing a function (__inference_converted_fun_25) with ops with unsaved custom gradients. Will likely fail if a gradient is requested.

and if you do attempt to take a gradient of the loaded model you may get an error:

TypeError: An op outside of the function building code is being passed
a "Graph" tensor. It is possible to have Graph tensors
leak out of the function building context by including a
tf.init_scope in your function building code.
For example, the following function will fail:
  @tf.function
  def has_init_scope():
    my_constant = tf.constant(1.)
    with tf.init_scope():
      added = my_constant * 2
The graph tensor has name: args_0:0

(We are working with the TF team to give a more explicit error in this case.)

Saved model for non-differentiable JAX functions

Note that if the JAX function is not reverse-mode differentiable, e.g., uses lax.while_loop then attempting to save its conversion to a SavedModel will fail with:

ValueError: Error when tracing gradients for SavedModel

You have two options, either pass with_gradient=False to jax2tf.convert, or set tf.saved_model.SaveOption(experimental_custom_gradients=False). In either case, you will not be able to compute the gradients of the function loaded from the SavedModel.

Support for partitioning

jax2tf supports JAX functions that use jax.pjit and jax.jit with sharded arguments and results, for single-host meshes. The lowering is actually similar as for a jax.jit, except that the arguments and results will be wrapped with tensorflow.python.compiler.xla.experimental.xla_sharding.XlaSharding TensorFlow ops.

In the default native serialization mode, if the target JAX function includes sharding operations, e.g., from nested jax.pjit, then there should be a top-level jax.pjit. E.g.,

# The following is correct
with mesh:
   jax2tf.convert(pjit.pjit(f_jax, in_shardings=...))(...)

# The following will lead to errors because pjit is not at top-level.
def wrapped_pjit(x):
   ...pjit.pjit(f_jax, in_shardings=...))...

with mesh:
  jax2tf.convert(wrapped_pjit)

A limitation of XlaSharding is that it cannot be used in TensorFlow eager mode. Therefore, jax2tf will give an error when lowering a function that requires sharded (not replicated) arguments or results and the lowered function is used outside a tf.function context (see b/255511660).

Another limitation is that today only TPUs have integrated with XLA SPMD support in serving, while CPUs and GPUs don't have e2e XLA SPMD support yet in TensorFlow. Executing a jax2tf converted tf.function with XlaSharding ops on CPUs and GPUs will simply ignore all the XlaSharding ops.

Note that when saving a model, the parameters to the model are wrapped with tf.Variable before calling the lowered function (see above), therefore outside of the XlaSharding wrapper.

Shape-polymorphic conversion

The shape polymorphism support is work in progress. Please report any bugs you encounter.

We described above how to include in the SavedModel several specializations of a lowered function for a few specific input shapes. jax2tf can also produce a shape-polymorphic TensorFlow graph that is usable with inputs of any shape matching certain constraints. This is useful, e.g., to allow a single SavedModel to be used for multiple batch sizes.

The standard TensorFlow technique for producing a shape-polymorphic graph is to warm the tf.function on partially-specified (shape-polymorphic) inputs, e.g., tf.TensorSpec([None, 28, 28], tf.float32) for a function that processes a batch (of unspecified batch size) of 28x28 images. For jax2tf it is additionally necessary to specify an additional polymorphic_shapes parameter for the jax2tf.convert function:

f_tf = tf.function(jax2tf.convert(f_jax,
                                  polymorphic_shapes=["(b, 28, 28)"]),
                                  autograph=False)
f_tf.get_concrete_function(tf.TensorSpec([None, 28, 28], tf.float32))

The polymorphic_shapes parameter, in the form of a pytree of strings corresponding to the pytree of positional arguments, introduces one or more dimension variables, e.g., b, to stand for shape dimensions that are assumed to be unknown at JAX tracing time. Dimension variables are assumed to range over all integers that are greater or equal to 1. In this particular example, we can also abbreviate polymorphic_shapes=["(b, _, _)"], because the _ placeholders take their value from the corresponding dimension of the tf.TensorSpec (which must be known). As a further shortcut for a series of _ at the end of a shape specification you can use ...: polymorphic_shapes=["(b, ...)"].

In the example above, the polymorphic_shapes specification does not convey more information than the partial tf.TensorSpec, except that it gives a name to the unknown dimension, which improves error messages. The real need for named shape variables arises when there are multiple unknown dimensions and there is a relationship between them. For example, if the function to be lowered is also polymorphic on the size of each image while requiring the images to be square, we would add a dimension variable d to stand for the unknown image size:

f_tf = tf.function(jax2tf.convert(f_jax, polymorphic_shapes=["(b, d, d)"]), autograph=False)
f_tf.get_concrete_function(tf.TensorSpec([None, None, None], tf.float32))

The JAX tracing mechanism performs shape checking using the same strict rules as when the shapes are fully known. For example, given the "(b, d, d)" specification for the argument x of a function, JAX will know that a conditional x.shape[-2] == x.shape[-1] is True, and will also know that x and jnp.sin(x) have the same shape of a batch of square matrices that can be passed to jnp.matmul.

Correctness of shape-polymorphic tracing

We want to trust that the lowered program produces the same results as the original JAX program. More precisely:

For any function f_jax and any input signature abs_sig containing partially known tf.TensorSpec, and any concrete input x whose shape matches abs_sig:

If the conversion to TensorFlow succeeds: f_tf = tf.function(jax2tf.convert(f_jax, polymorphic_shapes)).get_concrete_function(abs_sig)
and if the TensorFlow execution succeeds with result y: f_tf(x) = y
then the JAX execution would produce the same result: f_jax(x) = y,

It is crucial to understand that f_jax(x) has the freedom to re-invoke the JAX tracing machinery, and in fact it does so for each distinct concrete input shape, while the generation of f_tf uses JAX tracing only once, and invoking f_tf(x) does not use JAX tracing anymore. In fact, the latter invocation may happen after the f_tf has been serialized to a SavedModel and reloaded in an environment where f_jax and the JAX tracing machinery are not available anymore.

Coverage of shape-polymorphic tracing

Besides correctness, a secondary goal is to be able to lower many shape-polymorphic programs, but at the very least batch-size-polymorphic programs, so that one SavedModel can be used for any batch sizes. For example, we want to ensure that any function written using jax.vmap at the top level can be lowered with the batch dimension polymorphic and the remaining dimensions concrete.

It is reasonable to expect that there will be JAX programs for which there is a shape-polymorphic TensorFlow graph, but which will give an error when lowering with jax2tf. In general, you should expect that shape polymorphism can handle those programs for which all the intermediate shapes can be expressed as simple expressions in the dimension variables appearing in the input shapes. In particular, this does not apply to programs whose intermediate shapes depend on the data.

Details

In order to be able to use shape polymorphism effectively with jax2tf, it is worth considering what happens under the hood. When the lowered function is invoked with a TensorSpec, jax2tf will use the polymorphic_shapes parameter to obtain a shape abstraction for the inputs. The dimension sizes from the TensorSpec are used to fill in the _ and ... placeholders from polymorphic_shapes. Normally, the shape abstraction contains the dimension sizes, but in the presence of shape polymorphism, some dimensions may be dimension variables.

The polymorphic_shapes parameter must be either None, or a pytree of shape specifiers corresponding to the pytree of arguments. (A value None for polymorphic_shapes is equivalent to a list of None. See how optional parameters are matched to arguments.) A shape specifier is combined with a TensorSpec as follows:

A shape specifier of None means that the shape is given by the actual argument TensorSpec, which must be fully known.
Otherwise, the specifier must be a comma-separated string of dimension specifiers: (dim_1, ..., dim_n), denoting an n-dimensional array. The TensorSpec must also be of rank n. An ... at the end of the shape specifier is expanded to a list of _ or appropriate length. The corresponding dimensions from the shape specifier and the TensorSpec are matched:
- the dimension specifier of _ means that the size of the dimension is given by the actual TensorSpec, which must have a known size in the corresponding dimension.
- a dimension specifier can also be a lowercase identifier, denoting a dimension-size variable ranging over strictly positive integers. The abstract value of the dimension is going to be set to this variable. The corresponding dimension in TensorSpec can be None or can be a constant.
- All occurrences of a dimension variable in any dimension for any argument are assumed to be equal.

Note that polymorphic_shapes controls the shape abstraction used by JAX when tracing the function. The TensorSpec gives the shape abstraction that TensorFlow will associate with the produced graph, and can be more specific.

A few examples of shape specifications and uses:

polymorphic_shapes=["(b, _, _)", None] can be used for a function with two arguments, the first having a batch leading dimension that should be polymorphic. The other dimensions for the first argument and the shape of the second argument are specialized based on the actual TensorSpec, which must be known. The lowered function can be used, e.g., with TensorSpecs [None, 28, 28] and [28, 16] for the first and second argument respectively. An alternative TensorSpec pair can be [1, 28, 28] and [28, 16], in which case the JAX tracing is done for the same polymorphic shape given by polymorphic_shapes=["(b, 28, 28)", "(28, 16)"].
polymorphic_shapes=["(batch, _)", "(batch,)"]: the leading dimensions of the two arguments must match, and are assumed to be greater than 1. The second dimension of the first argument is taken from the actual TensorSpec. This can be used with a TensorSpec pair [None, 16] and [None]. It can also be used with a pair of shapes [8, 16] and [8].

Computing with dimension variables

JAX keeps track of the shape of all intermediate results. When those shapes depend on dimension variables JAX computes them as symbolic expressions involving dimension variables. The symbolic expressions can represent the result of applying arithmetic operators (add, sub, mul, floordiv, mod, including the NumPy variants np.sum, np.prod, etc.) on dimension variables and integers (int, np.int, or anything convertible by operator.index). These symbolic dimensions can then be used in shape-parameters of JAX primitives and APIs, e.g., in jnp.reshape, jnp.arange, slicing indices, etc.

For example, in the following code to flatten a 2D array, the computation x.shape[0] * x.shape[1] computes the symbolic dimension 4 * b as the new shape:

jax2tf.convert(lambda x: jnp.reshape(x, (x.shape[0] * x.shape[1],)),
                polymorphic_shapes=["(b, 4)"])(np.ones((3, 4)))

When a symbolic dimension is used in arithmetic operations with non-integers, e.g., float, np.float, np.ndarray, or JAX arrays, it is automatically converted to a JAX array using jnp.array. For example, in the function below all occurrences of x.shape[0] are converted implicitly to jnp.array(x.shape[0]) because they are involved in operations with non-integer scalars or with JAX arrays:

jax2tf.convert(lambda x: (x + x.shape[0] + jnp.sin(x.shape[0]),
                          5. + x.shape[0],
                          x.shape[0] - np.ones((5,), dtype=np.int32)),
               polymorphic_shapes=["b"])(np.ones(3))

Another typical example is when computing averages:

jax2tf.convert(lambda x: jnp.sum(x, axis=0) / x.shape[0],
               polymorphic_shapes=["(v, _)"])(np.ones((3, 4)))

It is also possible to convert dimension polynomials explicitly to JAX arrays, with jnp.array(x.shape[0]) or even jnp.array(x.shape). The result of these operations cannot be used anymore as dimension parameters and will raise a JAX error.

Errors in presence of shape polymorphism

If you write your program assuming that all shapes are tuples of integers, and then try to trace it with shape polymorphism you can run into a number of errors.

The program:

four_ones = np.ones((4,))
jax2tf.convert(lambda x, y: x + y,
               polymorphic_shapes=["(v,)", "(4,)"])(four_ones, four_ones)

with result in the error 'add got incompatible shapes for broadcasting: (v,), (4,)' because the shape abstraction that JAX tracing uses is given by the polymorphic_shapes, even though the actual arguments are more specific and would actually work.

Also,

jax2tf.convert(lambda x: jnp.matmul(x, x),
               polymorphic_shapes=["(v, 4)"])(np.ones((4, 4)))

will result in the error dot_general requires contracting dimensions to have the same shape, got [4] and [v]. What is happening here is that in the process of type checking the matmul operation, JAX will want to ensure the size of the two axes is the same (v == 4). Note that v can stand for any integer greater than 0, so the value of the equality expression can be true or false. Since it is not always true that v == 4, the shape checking rules fail with the above error. Since the lowered function works only for square matrices, the correct polymorphic_shapes is ["(v, v)"].

As explained above, if the dimension polynomials are used in operations with non-integers, the result will be a JAX array that cannot be used as a shape parameter. For example, if we modify the reshape example slightly, to use np.array([x.shape[1]]) instead of x.shape[1]:

jax2tf.convert(lambda x: jnp.reshape(x, (x.shape[0] * np.array([x.shape[1]]),)),
                polymorphic_shapes=["(b, 4)"])(np.ones((3, 4)))

we get an error Shapes must be 1D sequences of concrete values of integer type, got Traced<...>. If you get this error on JAX code that works for static shapes, it means that one operation that computes shape parameters is using non-integer arguments, e.g., np.ndarray, that get implicitly converted to JAX arrays. The solution is to avoid np.array, float, or JAX arrays in operations whose results are used as shapes, e.g., instead of np.arange(n) * x.shape[0] write [i * x.shape[0] for i in range(n)].

Comparison of symbolic dimensions is partially supported

Inside JAX there are a number of equality and inequality comparisons involving shapes, e.g., for doing shape checking or even for choosing the implementation for some primitives. Comparisons are supported as follows:

equality is supported with a caveat: if the two symbolic dimensions denote the same value under all valuations for dimension variables, then equality evaluates to True, e.g., for b + b == 2*b; otherwise the equality evaluates to False. See below for a discussion of important consequences of this behavior.
disequality is always the negation of equality.
inequality is partially supported, in a similar way as partial equality. However, in this case we take into consideration that dimension variables range over strictly positive integers. E.g., b >= 1, b >= 0, 2 * a + b >= 3 are True, while b >= 2, a >= b, a - b >= 0 are inconclusive and result in an exception.

For example, the following code raises the exception core.InconclusiveDimensionOperation with the message Dimension polynomial comparison 'a + 1' >= 'b' is inconclusive.

jax2tf.convert(lambda x: 0 if x.shape[0] + 1 >= x.shape[1] else 1,
                polymorphic_shapes=["(a, b)"])(np.ones((3, 4)))

The equality comparison returns False for b + 1 == b or b == 0 (in which case it is certain that the dimensions are different for all valuations), but also for b == 1 and for a == b. This is unsound, and we ought to raise core.InconclusiveDimensionOperation because under some valuations the result should be True and under other valuations it should be False. We choose to make equality total thus allowing unsoundness because otherwise we may get spurious errors in presence of hash collisions when hashing dimension expressions or objects that include them (shapes, core.AbstractValue, core.Jaxpr). Besides the hashing errors, a partial semantics of equality leads to errors for the following expressions b == a or b == b or b in [a, b] even though the error is avoided if we change the order of the comparisons.

We attempted to retain soundness and hashability by creating both hashable and unhashable kinds of symbolic dimensions PR #14200, but it turned out to be very hard to diagnose hashing failures in user programs because often hashing is implicit when using sets or memo tables.

Code of the form if x.shape[0] != 1: raise NiceErrorMessage is sound even with this treatment of equality, but code of the form if x.shape[0] != 1: return 1 is unsound.

Division of symbolic dimensions is partially supported

JAX will attempt to simplify division and modulo operations, e.g., (a * b + a) // (b + 1) == a and 6*a + 4 % 3 == 1. In particular, JAX will handle the cases when either (a) there is no remainder, or (b) the divisor is a constant in which case there may be a constant remainder. For example, the code below results in a division error when trying to compute the inferred dimension for a reshape operation:

jax2tf.convert(lambda x: jnp.reshape(x, (2, -1)),
               polymorphic_shapes=["(b, ...)"])(np.ones((4, 5, 7)))

In this case you will see the error Cannot divide evenly the sizes of shapes (b, 5, 7) and (2, -1), with a further Details: Cannot divide '35*b' by '-2'. The polynomial 35*b represents the total size of the input tensor.

Note that the following will succeed:

## The resulting symbolic shape is (2, 15 b).
jax2tf.convert(lambda x: jnp.reshape(x, (2, -1)),
               polymorphic_shapes=["(b, ...)"])(np.ones((4, 5, 6)))

## The resulting symbolic shape is (6 b2, b1).
jax2tf.convert(lambda x: jnp.reshape(x, (-1, x.shape[0])),
               polymorphic_shapes=["(b1, b2, ...)"])(np.ones((4, 5, 6)))

You may also encounter division errors when working with strides, such as when computing the padding in a strided convolution.

When JAX cannot simplify the result of symbolic dimension division it will construct symbolic expressions of the form floordiv(E, N) and mod(E, N) and it will use a number of heuristics to evaluate comparisons involving these. If you encounter InconclusiveDimensionOperation exceptions you can specify that a dimension variable is a multiple of the divisor, e.g., b in the above example of dividing 35*b by -2 may be known to be a multiple of 2. You can specify that by replacing b with 2*b in the polymorphic shape specification:

jax2tf.convert(lambda x: jnp.reshape(x, (2, -1)),
               polymorphic_shapes=["(2*b, ...)"])(np.ones((4, 5, 7)))

Dimension variables must be solvable from the input shapes

jax2tf will generate code to derive the values of the dimension variables from the input shapes. This works only if the symbolic dimensions in the input shapes are linear. For example, the following polymorphic_shapes will result in errors:

polymorphic_shapes = ["a * a"]  # Not a linear polynomial
polymorphic_shapes = ["a + b"]  # Too few equations to derive both `a` and `b`

If you are using native serialization, the restrictions are stronger: every dimension variable must occur as the value of some dimension of some input, e.g., the following will work:

polymorphic_shapes = ["a, 2*a, b"]
polymorphic_shapes = ["a * a, a"]

Known issues

jax2tf has been in use since 2020 and the vast majority of users encounter no problems. However, there are a few rare corner cases in which the different conventions of JAX and TensorFlow result in a breakage. We try to give an exhaustive list below, specifying whether the limitations apply to the native serialization or non-native.

Different 64-bit precision in JAX and TensorFlow

Applies to both native and non-native serialization.

JAX behaves somewhat differently than TensorFlow in the handling of 32-bit vs. 64-bit values. However, the jax2tf lowered function always behaves like the JAX function.

JAX interprets the type of Python scalars differently based on JAX_ENABLE_X64 flag. (See JAX - The Sharp Bits: Double (64bit) precision.) In the default configuration, the flag is unset, and JAX interprets Python constants as 32-bit, e.g., the type of 3.14 is float32. This is also what TensorFlow always does. JAX goes further, it forces all explicitly-specified 64-bit values to be interpreted as 32-bit:

# with JAX_ENABLE_X64=0
jnp.sin(3.14)  # Has type float32
tf.math.sin(3.14)  # Has type float32

jnp.sin(np.float64(3.14))  # Also has type float32
tf.math.sin(np.float64(3.14))  # Has type float64

# The jax2tf.convert function behaves like the JAX function.
jax2tf.convert(jnp.sin)(3.14)  # Has type float32
jax2tf.convert(jnp.sin)(np.float64(3.14))  # Has type float32

# The following will still compute `sin` in float32 (with a tf.cast on the argument).
tf.function(jax2tf.convert(jnp.sin), autograph=False)(tf.Variable(3.14, dtype=tf.float64))

When the JAX_ENABLE_X64 flas is set, JAX uses 64-bit types for Python scalars and respects the explicit 64-bit types:

# with JAX_ENABLE_X64=1
jnp.sin(3.14)  # Has type float64
tf.math.sin(3.14)  # Has type float32

# The jax2tf.convert function behaves like the JAX function.
jax2tf.convert(jnp.sin)(3.14)  # Has type float64

# The following will compute `sin` in float64.
tf.function(jax2tf.convert(jnp.sin), autograph=False)(tf.Variable(3.14, dtype=tf.float64))

# The following will compute `sin` in float32.
tf.function(jax2tf.convert(jnp.sin), autograph=False)(tf.Variable(3.14))

This is achieved by inserting tf.cast operations on the input arguments inside the lowered function, if necessary.

If you want to create a tf.Variable or tf.TensorSpec with the same dtype, you should use jax2tf.dtype_of_val:

# The following two calls will lower jax_fun at the same dtypes
# independently of the value of JAX_ENABLE_X64.
jax2tf.convert(jax_fun)(3.14)
jax2tf.convert(jax_fun)(tf.Variable(3.14, dtype=jax2tf.dtype_of_val(3.14)))

Functions whose arguments and results are nested Python data structures

Applies to both native and non-native serialization.

jax2tf can lower functions with arguments and results that are nested collections (tuples, lists, dictionaries) of numeric values or JAX arrays (pytrees). The resulting TensorFlow function will take the same kind of arguments except the leaves can be numeric values or TensorFlow tensors (tf.Tensor, tf.TensorSpec, tf.Variable).

As long as the arguments use only standard Python containers (tuple, list, dictionaries), both JAX and TensorFlow can flatten and unflatten them and you can use the lowered function in TensorFlow without limitations.

However, if your JAX function takes a custom container, you can register it with the JAX tree_util module so that JAX will know how to operate with it, and you can still lower the function to use it in TensorFlow eager and with tf.function, but you won't be able to save it to a SavedModel, nor will you be able to compute gradients with TensorFlow (code from jax2tf_test.test_custom_pytree_readme):

class CustomPair:
  def __init__(self, a, b):
    self.a = a
    self.b = b

# Register it with the JAX tree_util module
jax.tree_util.register_pytree_node(CustomPair,
                                   lambda x: ((x.a, x.b), None),
                                   lambda _, ab: CustomPair(*ab))
def f_jax(pair: CustomPair):
  return 2. * pair.a + 3. * pair.b

x = CustomPair(4., 5.)
res_jax = f_jax(x)
# TF execution works as long as JAX can flatten the arguments
res_tf = jax2tf.convert(f_jax)(x)
self.assertAllClose(res_jax, res_tf.numpy())
res_tf_2 = tf.function(jax2tf.convert(f_jax), autograph=False, jit_compile=True)(x)

If you want to save the function in a SavedModel or compute gradients, you should construct a wrapper:

 # wrapped TF function to use only standard containers
def f_tf_wrapped(a, b):
  return f_tf(CustomPair(a, b))

# Try to put into SavedModel
my_model = tf.Module()
# Save a function that can take scalar inputs.
my_model.f = tf.function(f_tf_wrapped, autograph=False,
                         input_signature=[tf.TensorSpec([], tf.float32),
                                          tf.TensorSpec([], tf.float32)])
model_dir = os.path.join(absltest.get_default_test_tmpdir(), str(id(my_model)))
tf.saved_model.save(my_model, model_dir,
                    options=tf.saved_model.SaveOptions(experimental_custom_gradients=True))

# Restoring (note: the restored model does *not* require JAX to run, just XLA).
restored_model = tf.saved_model.load(model_dir)
def restored_f(pair: CustomPair):
  return restored_model.f(pair.a, pair.b)

res_tf_3 = restored_f(x)
self.assertAllClose(res_jax, res_tf_3)
grad_jax = jax.grad(f_jax)(x)

x_v = [tf.Variable(x.a), tf.Variable(x.b)]
with tf.GradientTape() as tape:
  res = f_tf_wrapped(*x_v)
  grad_tf = tape.gradient(res, x_v)

self.assertAllClose(grad_jax.a, grad_tf[0])
self.assertAllClose(grad_jax.b, grad_tf[1])

Lowering gradients for functions with integer arguments or unused arguments

Applies to both native and non-native serialization.

When JAX differentiates functions with integer or boolean arguments, the gradients will be zero-vectors with a special float0 type (see PR 4039](https://github.com/google/jax/pull/4039)). This type is translated to int32 when lowering to TF. For example,

x = np.int16(2)
def f_jax(x):  # x: int16
  return x * 2.

jax.grad(f_jax, allow_int=True)(x)
# returns a special `float0`: array((b'',), dtype=[('float0', 'V')])

jax2tf.convert(jax.grad(f_jax, allow_int=True))(x)
# returns a tf.Tensor(0, shape=(), dtype=int32)

Note that this is different from how TensorFlow handles gradients for integer or boolean arguments: sometimes the gradient is None, sometimes it is a zero with the same dtype as the argument, and sometimes it is a one with the same dtype as the argument (e.g., for the identity function).

def f_tf(x):  # x: int16
  return tf.cast(x, tf.float32) * 2.

xv = tf.Variable(x)
with tf.GradientTape(persistent=True) as tape:
  print(tape.gradient(f_tf(xv), xv))
  # returns None
  print(tape.gradient(f_tf(xv), xv,
                      unconnected_gradients=tf.UnconnectedGradients.ZERO))
  # returns 0 with the same shape and dtype as x

When differentiating functions with unused arguments, TF by default returns the value None for the corresponding gradients. The tape.gradient function takes the option tf.UnconnectedGradients.ZERO to ask that gradients for unused arguments be zero.

Functions lowered with jax2tf.convert behave the same way under tf.UnconnectedGradients.ZERO, but by default, they will return None only for gradients corresponding to integer arguments.

# x1 and x3 are not used. x3 has integer type.
def fn(x0, x1, x2, x3):
  return x0 * 0. + x2 * 2.

xs = [tf.Variable(x) for x in [10., 11., 12., 13]]
with tf.GradientTape(persistent=True) as tape:
 res = fn(*xs)

g_tf_native = tape.gradient(res, xs)
# Returns: 0., None, 2., None

g_tf_native_0 = tape.gradient(res, xs,
                              unconnected_gradients=tf.UnconnectedGradients.ZERO)
# Returns: 0., 0., 2., 0

# Now with jax2tf.convert
with tf.GradientTape() as tape:
  res = jax2tf.convert(fn, with_gradient=True)(*xs)

g_jax2tf = tape.gradient(res, xs)
# Returns: 0., 0., 2., None
# Note that the gradient for x1 is 0.

g_jax2tf_0 = tape.gradient(res, xs,
                            unconnected_gradients=tf.UnconnectedGradients.ZERO)
# Returns: 0., 0., 2., 0
# In this case we get the same result as for TF native.

Errors due to tf.Module magic conversion during attribute assignment

Applies to both native and non-native serialization.

tf.Module will automatically wrap the standard Python container data types into trackable classes during attribute assignment. Python Dict/List/Tuple are changed to _DictWrapper/_ListWrapper/_TupleWrapper classes. In most situation, these Wrapper classes work exactly as the standard Python data types. However, the low-level pytree data structures are different and this can lead to errors.

In such cases, the user can use this workaround:

import tensorflow as tf
input_data = #Any data object

m = tf.Module()
flat, tree_def = jax.tree_util.tree_flatten(input_data)
m.input_data = {"flat": flat, "tree_def": tree_def}

Later the user can use tree_unflatten for the reverse process:

input_data = jax.tree_util.tree_unflatten(m.input_data['tree_def'], m.input_data['flat'])

Large saved_model.pb due too many PRNG operations

Applies to both native and non-native serialization.

The default threefry2x32 PRNG is implemented in JAX with dozens of additions and bitwise operations. This means that a single PRNG operation in JAX will result in dozens of TF ops after jax2tf. If the number of RPNG operations is large, the generated TF graph will be very large.

To reduce the TF graph size and the compilation time one can use the unsafe_rbg PRNG implementation by setting jax.config.update('jax_default_prng_impl', 'unsafe_rbg'). The unsafe_rbg implementation will be lowered to a TF op and several casts and reshapes, thus significantly reducing the number of TF ops per PRNG operation. The "unsafe" part is that it doesn't guarantee determinism across JAX/XLA versions, and the quality of random streams it generates from different keys is less well understood. Nevertheless, this should be fine for most inference/serving cases. See more details in the JAX PRNG documentation.

SavedModel supports only first-order gradients

Applies to both native and non-native serialization.

The jax2tf-lowered function supports higher-order gradients, but when the function is saved in a SavedModel, only the first-order gradient is saved. This is primarily a limitation of the SavedModel support for custom gradients.

Native serialization supports only select custom calls

Applies to native serialization only.

JAX natively uses custom calls for lowering of certain primitives. The most common example are for the implementation of PRNG on GPUs where we get better performance with a custom call (cu_threefry32) than if we use native StableHLO. Another class of examples are for FFT and some linear algebra primitives (e.g., QR decomposition).

Unlike regular StableHLO ops, the compatibility guarantees for custom calls are the burden of the teams maintaining the C++ code that backs the custom call. For this reason, we maintain a list of allowed custom call targets. If you try to serialize code that invokes other targets you will get an error.

If you do not care about the compatibility guarantees of the serialized artifact, you can set native_serialization_strict_checks to False to disable the check.

XlaCallModule not supported by some TensorFlow tools

Applies to native serialization only.

JAX native serialization uses the XlaCallModule TensorFlow op to host the StableHLO program obtained from JAX. This is a relatively new TensorFlow op and may not be supported by some tools. In fact, certain tools that need to do tf.Graph inspection and transformation cannot work when the whole JAX program is a single TensorFlow op.

This is the case, for example, for the TFLite and TensorFlow.js converters. There is work underway to enable more tools to consume StableHLO.

Natively serialized JAX modules are platform specific

Applies to native serialization only.

When you use native serialization, JAX will record the plaform for which the module was serialized, and you will get an error if you try to execute the XlaCallModule TensorFlow op on another platform.

Note that this error will only arise in native serialization; with non-native serialization the lowering to TensorFlow ops is platform independent, although it is only guaranteed to match the JAX semantics and performance behavior for TPUs.

The error has the form:

The current platform CPU is not among the platforms required by the module [CUDA]

where CPU is the TensorFlow platform where the op is being executed and CUDA is the plaform for which the module was serialized by JAX. This probably means that JAX and TensorFlow may see different devices as the default device (JAX defaults to GPU and TensorFlow to CPU in the example error above). You can check what devices TensorFlow uses:

logging.info("All TF devices: %s", tf.config.list_logical_devices())
tf_device = (tf.config.list_logical_devices("TPU") + 
             tf.config.list_logical_devices("GPU") + 
             tf.config.list_logical_devices())[0]
assert jax.default_backend().upper() == tf_device.device_type 
with tf.device(tf_device):
   ...