- test fails during dlpack tensor to buffer conversion
- XLA layout does not support arbitrary dlpack strides
- users should explicit materialze such tensors by making a copy
👋 hello there! I'm a fellow Googler who works on projects that leverage GitHub Actions for CI/CD. Recently I noticed a large increase in our queue time, and I've tracked it down to the [limit of 180 concurrent jobs](https://docs.github.com/en/actions/reference/usage-limits-billing-and-administration) for an organization. To help be better citizens, I'm proposing changes across a few repositories that will reduce GitHub Actions hours and consumption. I hope these changes are reasonable and I'm happy to talk through them in more detail.
- **(you were already doing this, thank you!**) Only run GitHub Actions for pushes and PRs against the main branch of the repository. If your team uses a forking model, this change will not affect you. If your team pushes branches to the repository directly, this changes actions to only run against the primary branches or if you open a Pull Request against a primary branch.
- For long-running jobs (especially tests), I added the "Cancel previous" workflow. This is very helpful to prevent a large queue backlog when you are doing rapid development and pushing multiple commits. Without this, GitHub Actions' default behavior is to run all actions on all commits.
There are other changes you could make, depending on your project (but I'm not an expert):
- If you have tests that should only run when a subset of code changes, consider gating your workflow to particular file paths. For example, we have some jobs that do Terraform linting, but [they only run when Terraform files are changed](c4f59fee71/.github/workflows/terraform.yml (L3-L11)).
Hopefully these changes are not too controversial and also hopefully you can see how this would reduce actions consumption to be good citizens to fellow Googlers. If you have any questions, feel free to respond here or ping me on chat. Thank you!
Back in the mists of time, before omnistaging landed in JAX, we used lazy
expressions to avoid materializing large constants inside `jit` computations.
Omnistaging, which means that computations that are in the dynamic scope of a
`jit` are staged into the `jit` computation, has subsumed most of the reasons
for laziness to exist, and this PR removes the laziness support for simplicity.
At the time of this PR, laziness is used only for broadcasts and transposes in
eager mode (i.e., outside a `jit`). This allows us to:
a) fuse together multiple broadcasts and transposes, and
b) if a lazy expression is lexically captured by a `jit` computation, we can
avoid materializing it in its expanded form.
It is not clear that laziness has sufficient power to weight ratio to continue
to exist, and it is making other work on improving JAX dispatch times more
difficult. As a result, this PR removes laziness to unblock that work; if we
want laziness again we would want to reimplement it in C++ anyway.