I am quite confused by dtype when creating a numpy array. I am creating the arrays from a list of floats. First, let me note that this is not a printing issue, because I already did: np.set_printoptions(precision=18).
This is a part of my list:
In [37]: boundary
Out[37]:
[['3366307.654296875', '5814192.595703125'],
['3366372.2244873046875', '5814350.752685546875'],
['3366593.37969970703125', '5814844.73492431640625'],
['3367585.4779052734375', '5814429.293701171875'],
['3367680.55389404296875', '5814346.618896484375'],
....
[ 3366307.654296875 , 5814192.595703125 ]]
Then I convert it to a numpy array:
In [43]: boundary2=np.asarray(boundary, dtype=float)
In [44]: boundary2
Out[44]:
array([[ 3366307.654296875 , 5814192.595703125 ],
[ 3366372.2244873046875 , 5814350.752685546875 ],
[ 3366593.37969970703125, 5814844.73492431640625],
....
[ 3366307.654296875 , 5814192.595703125 ]])
# the full number of significant digits is preserved.
# this also works with:
In [45]: boundary2=np.array(boundary, dtype=float)
In [46]: boundary2
Out[46]:
array([[ 3366307.654296875 , 5814192.595703125 ],
[ 3366372.2244873046875 , 5814350.752685546875 ],
[ 3366593.37969970703125, 5814844.73492431640625],
...
[ 3366307.654296875 , 5814192.595703125 ]])
# This also works with dtype=np.float
In [56]: boundary3=np.array(boundary, dtype=np.float)
In [57]: boundary3
Out[57]:
array([[ 3366307.654296875 , 5814192.595703125 ],
[ 3366372.2244873046875 , 5814350.752685546875 ],
[ 3366593.37969970703125, 5814844.73492431640625],
....
[ 3366307.654296875 , 5814192.595703125 ]])
Here is why I am confused: if I use dtype=np.float32, it seems like I am losing significant digits:
In [58]: boundary4=np.array(boundary, dtype=np.float32)
In [59]: boundary4
Out[59]:
array([[ 3366307.75, 5814192.5 ],
[ 3366372.25, 5814351. ],
[ 3366593.5 , 5814844.5 ],
[ 3367585.5 , 5814429.5 ],
...
[ 3366307.75, 5814192.5 ]], dtype=float32)
The reason I say "it seems" is that the arrays are apparently the same. I can’t see the data directly, but checking with np.allclose returns True:
In [65]: np.allclose(boundary2, boundary4)
Out[65]: True
So, if you have read this far, I hope you see why I am confused, and maybe there is someone who can answer the following 2 questions:
- Why is dtype=float32 “hiding” my data?
- Should I be concerned about it, or can I safely continue using dtype=float?
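All floating point types have limited precision. The number of significant digits they can store depends on the number of bits in the floating point type. If you provide float, numpy.float or numpy.float64 as dtype, 64 bits are used (“double precision”), resulting in about 16 significant decimal digits. (numpy.float was only an alias for the builtin float; it was deprecated in NumPy 1.20 and removed in 1.24, so use float or numpy.float64 in current versions.) For numpy.float32, 32 bits are used (“single precision”), resulting in about 7 significant decimal digits. Your values already have 7 digits before the decimal point, so float32 has almost nothing left for the fractional part. So nothing is “hidden”; you simply see the effects of limited floating point precision. numpy.allclose() returns True because it compares with a tolerance (by default rtol=1e-05 and atol=1e-08), and the rounding error introduced by float32, a relative error of roughly 6e-08 at most, is far smaller than that. If you need the digits after the decimal point for values of this size, keep using dtype=float.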
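To make this concrete, here is a minimal sketch using the first pair of coordinates from the question. The variable names are only illustrative, and the stricter tolerance in the last comparison is an arbitrary choice to show that allclose depends on its tolerance arguments, not on the dtype:

```python
import numpy as np

# First coordinate pair from the question, stored in double precision.
values64 = np.array([3366307.654296875, 5814192.595703125], dtype=np.float64)
# The same values rounded to single precision.
values32 = values64.astype(np.float32)

# Significand width and approximate decimal precision of each type.
for dtype in (np.float32, np.float64):
    info = np.finfo(dtype)
    print(dtype.__name__, "significand bits:", info.nmant + 1,
          "decimal digits:", info.precision)

# Gap between adjacent representable numbers near 3.3 million.
gap32 = np.spacing(np.float32(3366307.654296875))
gap64 = np.spacing(np.float64(3366307.654296875))
print(gap32, gap64)

# Largest relative error caused by rounding to float32.
rel_err = np.abs(values32.astype(np.float64) - values64) / np.abs(values64)
print(rel_err.max())

# Default tolerances (rtol=1e-05, atol=1e-08) are much looser than rel_err.
close_default = np.allclose(values64, values32)
# A stricter, arbitrarily chosen tolerance exposes the difference.
close_strict = np.allclose(values64, values32, rtol=1e-9, atol=0.0)
# Exact element-wise comparison.
exactly_equal = np.array_equal(values64, values32)
print(close_default, close_strict, exactly_equal)
```

Near 3.3 million, neighbouring float32 values are 0.25 apart, which is why 3366307.654296875 is stored as 3366307.75. The default allclose check still passes because that difference is tiny relative to the size of the numbers, while the strict tolerance and the exact comparison both report a mismatch.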