I am trying to parse a “pseudo-CSV” file with the Python csv reader, and I have some doubts about how to add some extra logic. I call it a “pseudo-CSV” file because some of the lines in the input file have text (30-40 chars) before the actual CSV data starts. I am trying to figure out the best way to remove this text.
Currently, I have found 3 options for removing said text:
1. From Python, call grep and sed and pipe the output to a temp file which can then be fed to the csv reader. (Ugh, I would like to avoid this option.)
2. Create a CSV dialect to remove the unwanted text. (This option just feels wrong.)
3. Extend the File object, implementing the next() function to remove the unwanted text as necessary.
I have no control over how the input file is generated, so modifying the generation is not an option.
Here is the related code I had when I realized the problem with the input file.
import csv

with open('myFile', 'r') as csvfile:
    theReader = csv.reader(csvfile)
    for row in theReader:
        # my logic here
        pass
If I go with option 3 above, the solution is quite straightforward, but
then I won't be able to incorporate the with open() syntax.
So, here is my question (2 actually): Is option 3 the best way to solve this
problem? If so, how can I incorporate it with the with open() syntax?
Edit: Forgot to mention that I'm using Python 2.7 on Linux.
csv.reader accepts an arbitrary iterable besides files:
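One way this could look is a generator that cleans each line before csv.reader sees it. This is only a sketch: the question does not say what the leading text looks like, so `PREFIX_RE` below is a made-up pattern (a bracketed tag at the start of the line), and `strip_prefix` and `read_rows` are hypothetical helper names.

```python
import csv
import re

# ASSUMPTION: the unwanted text is a bracketed tag such as "[2013-01-01 host] "
# at the start of a line. Replace this pattern with one matching the real input.
PREFIX_RE = re.compile(r'^\[[^\]]*\]\s*')


def strip_prefix(lines, prefix_re=PREFIX_RE):
    """Yield each line with the leading non-CSV text removed, if present."""
    for line in lines:
        yield prefix_re.sub('', line, count=1)


def read_rows(lines):
    """Parse an iterable of lines as CSV after stripping the prefix."""
    return list(csv.reader(strip_prefix(lines)))


# A list of strings stands in for the file here; an open file object
# is iterated line by line in the same way.
sample = ['[2013-01-01 host] a,b,c\n', '1,2,3\n', '[note] "x,y",z\n']
rows = read_rows(sample)
```

Because `strip_prefix` takes any iterable of lines, the with open() syntax is kept as-is: inside `with open('myFile', 'r') as csvfile:`, pass `strip_prefix(csvfile)` to `csv.reader` instead of `csvfile`. No File subclass or temp file is needed.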