I have a 3000×300 matrix file (float). When I read it and convert the values to float, I get float64, which is the default in Python. I tried numpy and map() to convert the values to float32, but both seem very inefficient.
My code:
x = open(readFrom, 'r').readlines()
y = [[float(i) for i in s.split()] for s in x]
time taken: 0:00:00.996000
numpy implementation:
x = open(readFrom, 'r').readlines()
y = [[np.float32(i) for i in s.split()] for s in x]
time taken: 0:00:06.093000
map():
x = open(readFrom, 'r').readlines()
y = [map(np.float32, s.split()) for s in x]
time taken: 0:00:05.474000
How can I convert to float32 very efficiently?
Thank you.
Update:
numpy.loadtxt() and numpy.genfromtxt() do not work for a huge file (both give a memory error). I have posted a separate question about that, and the approach I presented here works well for a huge matrix file (50,000×5000). Here is the question.
If memory is a problem, and if you know the size of the matrix ahead of time, you probably don’t want to read the entire file into memory in the first place. Something like this is probably more appropriate:
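A minimal sketch of that idea, assuming the matrix shape is known in advance: preallocate a float32 array and fill it one line at a time, so only a single line of text is held in memory alongside the array. The function name `load_float32_matrix` and its parameters are illustrative and not from the original post, and the sketch assumes the file has no blank lines.

```python
import numpy as np


def load_float32_matrix(path, n_rows, n_cols):
    """Read a whitespace-separated text matrix into a preallocated float32 array."""
    # Allocate the full result once; np.empty would also work here.
    a = np.zeros((n_rows, n_cols), dtype=np.float32)
    with open(path) as f:
        for i, line in enumerate(f):
            # Convert one row at a time instead of building a list of lists.
            # list() is needed on Python 3, where map() returns an iterator.
            a[i, :] = list(map(np.float32, line.split()))
    return a
```

For the file in the question this would be called as `load_float32_matrix(readFrom, 3000, 300)`.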
From a couple of quick (and surprising) tests on my machine, it appears that the map may not even be necessary.
This might not be the fastest, but it will certainly be the most memory-efficient way to do it.
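A sketch of the variant without map, under the same assumptions (known shape, no blank lines): the list of string tokens is assigned straight into the float32 row, and NumPy parses the strings while casting them to the array's dtype. The name `load_float32_matrix_no_map` is hypothetical.

```python
import numpy as np


def load_float32_matrix_no_map(path, n_rows, n_cols):
    """Fill a preallocated float32 array directly from split string tokens."""
    a = np.zeros((n_rows, n_cols), dtype=np.float32)
    with open(path) as f:
        for i, line in enumerate(f):
            # No explicit per-element conversion: assigning the list of
            # strings lets NumPy convert them to float32 on assignment.
            a[i, :] = line.split()
    return a
```

Whether this is faster than the explicit conversion is worth timing on your own data, since the result may depend on the NumPy version.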
Some tests: