Memory usage in vectorised Jacobian computation #454
Comments
Thanks for this detailed explanation! Those are interesting findings with your vectorized Jacobian implementation. Indeed, vectorization seems really important here. This isn't a full response to your issue, but I wanted to offer a quick note in case it's helpful: in JAX we think we cracked the "tensor-Jacobian products" problem in a more general way. JAX doesn't have as much NumPy coverage as Autograd yet, and so it might not cover your use case, but we're working hard on it and expect it to help with use cases like this one. There are some other tricks we might be able to deploy to reduce memory usage, depending on your specific computation, including a recent trick developed by some Julia folks: https://arxiv.org/pdf/1810.08297.pdf.
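(A minimal sketch, not code from the comment above: the specific JAX mechanism referred to was lost from the quoted text, but composing `jax.vjp` with `jax.vmap` gives the kind of vectorised "tensor-Jacobian product" being described. The function `f` here is made up for illustration.)

```python
import jax
import jax.numpy as jnp

def f(u):
    # Arbitrary example function mapping shape (5,) -> shape (3,).
    return jnp.tanh(u[:3]) * jnp.sum(u ** 2)

u = jnp.linspace(-1.0, 1.0, 5)
y, vjp_fun = jax.vjp(f, u)                   # one traced forward pass
basis = jnp.eye(y.size).reshape((y.size,) + y.shape)
jac = jax.vmap(vjp_fun)(basis)[0]            # all Jacobian rows in one vectorised backward pass
assert jnp.allclose(jac, jax.jacrev(f)(u))   # matches JAX's built-in reverse-mode Jacobian
```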
Thanks @mattjj for your comments and suggestions! JAX looks really interesting: being able to write the model functions without explicitly worrying about broadcasting / vectorisation would be ideal, as would being able to compile functions to avoid interpreter overhead. I had a go at playing around with it, and I will close the issue here as it seems like, going forward, JAX will probably be able to resolve the issues I have; for now my hacky workaround will suffice!
I'm interested in using Autograd to compute Jacobians. As the inbuilt `jacobian` function can be slow in some cases, due to (I think) the interpreter overhead in iteratively computing a vector-Jacobian product for each output dimension, I've been using a slightly hacky vectorised alternative.

Specifically, I make sure all the functions for which I need to compute Jacobians broadcast over the leading dimensions of their input(s). So, for example, for a function `func` which in the base case takes an array `u` with shape `shape_u` as argument and returns an array `y = func(u)` with shape `shape_y`, if we instead input an array `u` with shape `(n,) + shape_u`, we expect the returned `y = func(u)` to have shape `(n,) + shape_y` and that `all([np.allclose(y[i], func(u[i])) for i in range(n)]) == True`.
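(For concreteness, a made-up example of a function satisfying this broadcasting convention; the actual model functions in question aren't shown in the issue.)

```python
import autograd.numpy as np

W = np.arange(6.0).reshape(3, 2)

def func(u):
    # Maps arrays of shape (..., 3) to arrays of shape (..., 2), so stacked
    # inputs of shape (n, 3) give stacked outputs of shape (n, 2).
    return np.tanh(u @ W)

n, shape_u, shape_y = 4, (3,), (2,)
u = np.random.randn(n, *shape_u)
y = func(u)
assert y.shape == (n,) + shape_y
assert all(np.allclose(y[i], func(u[i])) for i in range(n))
```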
For a function of this type, the below `jacobian_and_value` function (adapted from the functions in the `autograd.differential_operators` module) will compute the Jacobian by tiling `size_output` copies of the input along the leading dimension, where `size_output` is the size of the output array, and then computing a single 'vector'-Jacobian product using the identity matrix (or its generalisation for output arrays with more than one dimension in the base case) as the 'vector'.
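(The code block from the original post did not survive in this copy of the thread, so the following is only a reconstruction from the description above rather than the poster's actual implementation; the shape handling is one plausible way to do it.)

```python
from autograd import make_vjp
import autograd.numpy as np

def jacobian_and_value(fun):
    """Jacobian (and value) of `fun`, assuming `fun` broadcasts over a
    leading axis of its input as described above."""
    def jac_and_val(x):
        # One un-replicated evaluation just to get the output shape; this
        # could be avoided by passing the output shape in as an argument.
        shape_y = np.shape(fun(x))
        size_output = int(np.prod(shape_y))
        # Tile `size_output` copies of the input along a new leading axis
        # and trace a single vectorised forward pass.
        x_rep = np.tile(x, (size_output,) + (1,) * np.ndim(x))
        vjp, y_rep = make_vjp(fun)(x_rep)
        # Single 'vector'-Jacobian product with the (reshaped) identity
        # matrix as the 'vector': slice i of the result is the gradient of
        # the i-th output element.
        basis = np.eye(size_output).reshape((size_output,) + shape_y)
        return vjp(basis).reshape(shape_y + np.shape(x)), y_rep[0]
    return jac_and_val
```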
In terms of computation time, this can give quite significant savings over the inbuilt `jacobian` function in some cases, due to pushing more work into NumPy array functions. However, it has a couple of downsides, the main one being much higher memory usage than the inbuilt `jacobian` function, as all the intermediate values computed in the forward pass for each of the replicated inputs now need to be stored for the backwards pass.

I have found the increased memory usage to be particularly problematic when the function being differentiated involves a loop with lots of iterations (e.g. the timestepping loop in the numerical integration of an ODE / PDE system), as even the memory usage associated with tracing a single forward pass can be quite significant in this case.
I originally played around with trying to see if I could use the VJP function returned by `make_vjp`, when evaluating the function on the original non-replicated input, to do something similar; however (unsurprisingly) this didn't work, as even if the primitive calls in the original forward function broadcast as required along the leading dimensions, the corresponding primitive VJP calls in the backward pass do not necessarily do so. I also tried adding an extra leading dimension of length one to the input passed to `make_vjp`, however this didn't help (again not surprising). The main culprits I've encountered so far for operations in the function causing issues when trying to broadcast in the backwards pass are slicing arrays (and the corresponding `untake` calls in the backwards pass) and calls to `numpy.reshape`.
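(To make the attempted shortcut concrete, a rough illustration of my own rather than the poster's code: take the VJP at the un-replicated input and feed it a stacked 'vector'. It happens to work for the purely elementwise function below because every primitive VJP involved preserves the extra leading axis, but, as described above, it breaks down when the function contains operations such as array slicing or `np.reshape`, whose backward-pass counterparts do not broadcast.)

```python
from autograd import jacobian, make_vjp
import autograd.numpy as np

def elementwise_fun(u):
    # Composition of unary elementwise primitives only, so the backward
    # pass happens to broadcast over the stacked cotangents.
    return np.sin(np.tanh(u))

u = np.linspace(-1.0, 1.0, 4)
vjp, y = make_vjp(elementwise_fun)(u)  # VJP at the original, un-replicated input
basis = np.eye(y.size)                 # stacked cotangents, shape (4, 4)
jac = vjp(basis)                       # relies on the primitive VJPs broadcasting
assert np.allclose(jac, jacobian(elementwise_fun)(u))
```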
It seems what I want to do is exactly what would be achieved by the tensor-Jacobian products proposed in PR #280 by @mattjj, however I am not sure if that is something still being actively worked on or not?

Otherwise, I was wondering if anyone has any other ideas for possible ways to achieve something like this without the memory overhead?