This is a simplification of my question. I have a numpy array:
x = np.array([0,1,2,3])
and I have a function:
def f(y): return y**2
I can compute f(x).
Now suppose I really want to compute f(x) for a repeated x:
x = np.array([0,1,2,3,0,1,2,3,0,1,2,3])
Is there a way to do this without creating a repeated version of x and in a way that is transparent to f?
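For reference, the repeated array in the question is what np.tile builds. Here is a minimal sketch showing that this route allocates a full copy, which stops scaling once the repeat count gets large. np.tile and the name x_rep are illustrative choices and are not part of the question.

```python
import numpy as np

def f(y):
    return y**2

x = np.array([0, 1, 2, 3])

# Build the repeated array explicitly. np.tile allocates a brand-new
# buffer holding all 12 values, i.e. 3x the memory of x.
x_rep = np.tile(x, 3)

print(x_rep)                       # [0 1 2 3 0 1 2 3 0 1 2 3]
print(x_rep.nbytes // x.nbytes)    # 3
print(np.shares_memory(x_rep, x))  # False: a real copy, not a view
print(f(x_rep))                    # [0 1 4 9 0 1 4 9 0 1 4 9]
```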
In my particular case, f is an involved function and one of the arguments is x. I would like to be able to calculate f when x is repeated without actually repeating it, as it won't fit into memory.
Rewriting f to handle a repeated x would be a lot of work, and I was hoping for a clever way to do this, possibly by subclassing a numpy array.
Any tips appreciated.
You can (almost) do this by using a few tricks with strides.
However, there are some major caveats…
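The idea is to make y a view of x with an extra leading axis whose stride is 0 bytes. Stepping along that axis never moves through memory, so every row is x itself.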
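Here is a minimal sketch of that construction. It assumes numpy.lib.stride_tricks.as_strided as the tool, and numrepeats is a hypothetical name used for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.array([0, 1, 2, 3])
numrepeats = 3  # hypothetical repeat count, for illustration

# Prepend an axis of length numrepeats with a stride of 0 bytes: moving to
# the next "row" of y does not advance in memory, so every row is x itself.
y = as_strided(x, shape=(numrepeats,) + x.shape, strides=(0,) + x.strides)

print(y)
# [[0 1 2 3]
#  [0 1 2 3]
#  [0 1 2 3]]
print(y.strides[0])            # 0
print(np.shares_memory(x, y))  # True: no data was copied
```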
y is now a view into x where each row is x. No new memory is used, and we can make y as large as we like. For example, I can do this:
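The following sketch builds a very large view, again assuming as_strided. The repeat count numrepeats is hypothetical. It is chosen only so that the nominal size works out to 250e12 rows × 32 bytes = 8e15 bytes, i.e. about 8 petabytes.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.arange(4, dtype=np.int64)  # 4 elements * 8 bytes = 32 bytes of real data
numrepeats = 250 * 10**12         # hypothetical, absurdly large repeat count

# Zero stride on the new leading axis: all rows alias the same 32 bytes.
y = as_strided(x, shape=(numrepeats,) + x.shape, strides=(0,) + x.strides)

print(y.shape)       # (250000000000000, 4)
print(y.nbytes)      # 8000000000000000: nominal size if it were materialized
print(x.nbytes)      # 32: the only memory actually backing y
print(y[123456789])  # [0 1 2 3]
```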
…and not use any more memory than the 32 bytes required for x. (y would use ~8 petabytes of RAM otherwise.)

However, if we reshape y so that it only has one dimension, we’ll get a copy which uses the full amount of memory. There’s no way to describe a “horizontally” tiled view of x using strides and shape, so any shape with fewer than 2 dimensions will return a copy.

Additionally, if we operate on y in a way that returns a new array (e.g. the y**2 in your example), we’ll get a full-size copy.

For that reason, it makes more sense to operate in-place, and to do so on x rather than on y: x **= 2 updates every row of y at once. Writing through y itself (e.g. y **= 2) is not a safe substitute. Each element of x appears once per row of y, and the NumPy documentation for numpy.lib.stride_tricks.as_strided warns that vectorized writes on such self-overlapping arrays are unpredictable.

Even for a generic function, you can pass in x and place the result back in x. E.g.
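Here is a minimal sketch of that pattern. It assumes y is a zero-stride view of x built with as_strided, rebuilt here so the snippet runs on its own, and it uses the question's f as a stand-in for the real function.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def f(v):
    # Stand-in for the "involved" function from the question.
    return v**2

x = np.array([0, 1, 2, 3])
y = as_strided(x, shape=(3,) + x.shape, strides=(0,) + x.strides)

# Evaluate f on the small base array only, then copy the result into x's
# existing buffer. "x[:] = ..." writes in place; "x = ..." would instead
# rebind the name and leave y pointing at the old data.
x[:] = f(x)

print(x)  # [0 1 4 9]
print(y)
# [[0 1 4 9]
#  [0 1 4 9]
#  [0 1 4 9]]
```

Because x[:] = ... casts into the existing buffer, the result has to fit x's dtype. For example, a float result would be truncated into an integer x. When f is a single ufunc, np.square(x, out=x) does the same write-back without the temporary.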
y will be updated as well, since it’s just a view into x.
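The two copy-producing caveats can be checked with np.shares_memory. Here is a short sketch. The np.broadcast_to call at the end is a swapped-in alternative that the answer does not use; it builds the same kind of zero-stride view but returns it read-only.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.array([0, 1, 2, 3])
y = as_strided(x, shape=(3,) + x.shape, strides=(0,) + x.strides)

# Caveat 1: a 1-D "horizontal" tiling cannot be described by strides,
# so flattening y has to copy all 12 values.
flat = y.reshape(-1)
print(np.shares_memory(flat, x))  # False

# Caveat 2: an out-of-place operation allocates the full-size result.
sq = y**2
print(sq.shape, np.shares_memory(sq, x))  # (3, 4) False

# Swapped-in alternative: np.broadcast_to gives a zero-stride view of x
# that is marked read-only, which rules out accidental writes through it.
b = np.broadcast_to(x, (3,) + x.shape)
print(np.shares_memory(b, x), b.flags.writeable)  # True False
```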