I recently decided to give matplotlib.pyplot a try, after having used gnuplot for scientific data plotting for years. I started out by simply reading a data file and plotting two columns, as gnuplot would do with plot 'datafile' u 1:2.
The requirements for my comfort are:
- Skip lines beginning with # and skip empty lines
- Allow arbitrary numbers of spaces between and before the actual numbers
- Allow arbitrary numbers of columns
- Be fast
Now, the following code is my solution to the problem. However, compared to gnuplot, it really is not as fast. This is a bit odd, since I read that one big advantage of py(plot/thon) over gnuplot is its speed.
import numpy as np
import matplotlib.pyplot as plt
import sys

datafile = sys.argv[1]
data = []  # one list of floats per column
with open(datafile, 'r') as f:
    for line in f:
        line = line.strip()
        # skip empty lines and comment lines
        if line and line[0] != '#':
            # drop the empty strings produced by runs of spaces
            cols = filter(lambda x: x != '', line.split(' '))
            for index, col in enumerate(cols):
                if len(data) <= index:
                    data.append([])
                data[index].append(float(col))
plt.plot(data[0], data[1])
plt.show()
What could I do to make the data reading faster? I had a quick look at the csv module, but it didn't seem to be very flexible with comments in files, and one still needs to iterate over all lines in the file.
Since you have matplotlib installed, you must also have numpy installed. numpy.genfromtxt meets all your requirements and should be much faster than parsing the file yourself in a Python loop:
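A minimal sketch of that approach, assuming a whitespace-delimited file whose first two columns are x and y. The `load_columns` helper name and the in-memory sample data are made up for illustration; in practice you would pass `sys.argv[1]` instead of the `StringIO` object.

```python
import io

import numpy as np


def load_columns(source):
    """Load whitespace-separated numeric columns, one array per column.

    `source` may be a filename or an open file-like object.
    (Hypothetical helper, not part of numpy.)
    """
    # comments="#" discards everything from '#' to the end of the line.
    # With the default delimiter (None), any run of whitespace separates
    # fields, and lines that end up empty are skipped.
    # unpack=True transposes the result so that data[0] is the first column.
    return np.genfromtxt(source, comments="#", unpack=True)


# Illustrative sample standing in for a real data file.
sample = io.StringIO(
    "# x  y  z\n"
    "\n"
    "  1.0   2.0  3.0\n"
    "2.0 4.0    6.0\n"
    "# a comment in the middle\n"
    "3.0  6.0 9.0\n"
)

data = load_columns(sample)
x, y = data[0], data[1]
```

From there, `plt.plot(x, y)` followed by `plt.show()` should give the same plot as before. How much faster this is will depend on the file size and numpy version, so it is worth timing on your own data.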