NumPy newbie here. I’m trying to normalize (aka feature scaling, standardization) my inputs to a neural network. I’m just doing linear scaling, and the formula I’m using is:
I = Imin + (Imax-Imin)*(D-Dmin)/(Dmax-Dmin)
where I is the scaled input value, Imin and Imax are the desired min and max range of the scaled values, D is the original data value, and Dmin and Dmax are the min and max range of the original data values. I want a python method that takes a numpy array and returns an array with all the values normalized. This is what I’m thinking so far.
def get_normalized_values(array):
    """I = Imin + (Imax-Imin)*(D-Dmin)/(Dmax-Dmin)"""
    imin = -1
    imax = 1
    dmin = array.amin()
    dmax = array.amax()
    normalized = imin + (imax - imin)*(array - dmin)/(dmax - dmin)
    return normalized
My question is: will this work? Or do I have to loop through each element in the array and perform the math? Can you just do math like this with arrays and scalars? That is, will array - dmin create a new temporary array where each value has dmin subtracted? I'm not sure this is the right terminology, but I think this is a “vectorized” approach?
Update
Is there a way to have this modify the array in place? That is, rather than returning a copy, can the function take the array and modify the original?
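On the in-place question, here is one possible sketch. It assumes the array already has a floating-point dtype, because the augmented operators (-=, /=, *=, +=) write back into the existing buffer and an integer array cannot hold the fractional results. The name normalize_in_place, the default range, and the two error checks are illustrative choices, not part of the original code.

```python
import numpy as np

def normalize_in_place(array, imin=-1.0, imax=1.0):
    """Hypothetical helper: rescale a float array to [imin, imax] in place."""
    # Assumption: the caller passes a floating-point array.
    if not np.issubdtype(array.dtype, np.floating):
        raise TypeError("in-place scaling needs a floating-point array")
    dmin = array.min()
    dmax = array.max()
    if dmax == dmin:
        raise ValueError("all values are equal; the range is zero")
    # Each augmented assignment modifies the caller's array directly,
    # so no new array is allocated for the result.
    array -= dmin
    array /= (dmax - dmin)
    array *= (imax - imin)
    array += imin
    return array  # returned only for convenience; it is the same object

data = np.array([0.0, 5.0, 10.0])
normalize_in_place(data)
print(data)  # [-1.  0.  1.]
```

Because the function mutates its argument, any other variable that refers to the same array (or a view of it) sees the scaled values as well.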
I believe you need to change the calls amin() and amax() to just be calls to min() and max(), as in my_array.max(). (amin and amax exist as module-level functions, e.g. numpy.amin(my_array), but not as array methods.) Otherwise, this should work fine. You can do things in NumPy much like Octave/Matlab, such as adding a scalar to an array, and it automatically knows to map the operation to all elements. Sometimes you might need slightly different syntax (like knowing the difference between numpy.dot() and just multiplying two arrays), but in general things like this are as straightforward as you have indicated.
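To make the points above concrete, here is a minimal sketch of the function with min() and max() swapped in, followed by a short comparison of elementwise multiplication and numpy.dot(). The keyword defaults imin and imax and the sample arrays are assumptions added for illustration.

```python
import numpy as np

def get_normalized_values(array, imin=-1.0, imax=1.0):
    """I = Imin + (Imax-Imin)*(D-Dmin)/(Dmax-Dmin)"""
    dmin = array.min()
    dmax = array.max()
    # array - dmin subtracts the scalar from every element and yields a
    # new array; the remaining operations are applied elementwise too.
    return imin + (imax - imin) * (array - dmin) / (dmax - dmin)

data = np.array([10, 20, 30, 40, 50])
scaled = get_normalized_values(data)
# scaled holds -1, -0.5, 0, 0.5, 1; data itself is unchanged

# Multiplying two arrays with * is elementwise, not a matrix product.
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
elementwise = a * b             # [[ 5, 12], [21, 32]]
matrix_product = np.dot(a, b)   # [[19, 22], [43, 50]]
```

No explicit loop is needed: the whole expression is evaluated on arrays, which is what "vectorized" usually means in this context. Note that this version divides by zero when every input value is identical, so a guard may be worth adding for real data.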