I’ve been at this a while now, and I think it in my best interest to ask advice of the experts. I know I’m not writing this the best way possible, and I’ve gone down a rabbit hole and confused myself.
I have a csv. A bunch, actually. That part is not the problem.
The lines at the top of the CSV are not really CSV data, but they do contain an important piece of info: the date for which the data is valid. For certain kinds of report it is on one line, and for others on another.
My data starts on some line down from the top, usually 10 or 11, but I can’t always be certain. I do know that the first column always has the same info (the header of the table of data).
I want to pull the report date from the preceding lines, and for file type A, do stuffA, and for file type B, do stuffB, then write out that row to a new file. I’m having a problem incrementing the row and I have no idea what I’m doing wrong.
Sample data:
"Attribute ""OPSURVEYLEVEL2_O"" [Category = ""Retail v1""]"
Date exported: 2/16/13
Exported by user: William
Project:
Classification: Online Retail v1
Report type: Attributes
Date range: from 12/14/12 to 12/14/12
"Filter OpSurvey Level 2(mine): [ LEVEL:SENTENCE TYPE:KEYWORD {OPSURVEYLEVEL2_O:""gift certificate redemption"", OPSURVEYLEVEL2_O:""combine accounts"", OPSURVEYLEVEL2_O:""cancel account"", OPSURVEYLEVEL2_O:""saved project moved to purchased project"", OPSURVEYLEVEL2_O:""unlock account"", OPSURVEYLEVEL2_O:""affiliate promotions"", OPSURVEYLEVEL2_O:""print to store coupons"", OPSURVEYLEVEL2_O:""disclaimer not clear"", OPSURVEYLEVEL2_O:""prepaid issue"", OPSURVEYLEVEL2_O:""customer wants to use coupons for print to store"", OPSURVEYLEVEL2_O:""customer received someone else's order"", OPSURVEYLEVEL2_O:""hi-res images unavailable"", OPSURVEYLEVEL2_O:""how to re-order"", OPSURVEYLEVEL2_O:""missing items"", OPSURVEYLEVEL2_O:""missing envelopes: print to store"", OPSURVEYLEVEL2_O:""missing envelopes: mail order"", OPSURVEYLEVEL2_O:""group rooms"", OPSURVEYLEVEL2_O:""print to store"", OPSURVEYLEVEL2_O:""print to store coupons"", OPSURVEYLEVEL2_O:""publisher: card not available for print to store"", OPSURVEYLEVEL2_O:publisher}]"
Total: 905
OPSURVEYLEVEL2_O,Distinct Document,% of Document,Sentiment Score
PRINT TO STORE,297,32.82,-0.1
...
Sample Code
#!/usr/bin/python
import csv, os, glob, sys, errno
path = '/path/to/Downloads'
for infile in glob.glob(os.path.join(path,'report_ATTRIBUTE_OP*.csv')):
    if 'OPSURVEYLEVEL2' in infile:
        prime_column = 'ops2'
    elif 'OPSURVEYLEVEL3' in infile:
        prime_column = 'ops3'
    else:
        sys.exit(errno.ENOENT)
    with open(infile, "r") as csvfile:
        reader = csv.reader(csvfile)
        report_date = 'DATE NOT FOUND'
        # import pdb; pdb.set_trace()
        for row in reader:
            foo = 0
            while foo < 1:
                if row[0][0:].find('OPSURVEYLEVEL') == 0:
                    foo = 1
                if "Date range" in row:
                    report_date = row[0][-8:]
                    break
            if foo >= 1:
                if row[0][0:].find('OPSURVEYLEVEL') == 0:
                    break
                if 'ops2' in prime_column:
                    dup_col = row[0]
                    row.insert(0,dup_col)
                    row.append(report_date)
                elif 'ops3' in prime_column:
                    row.append(report_date)
                with open('report_merge.csv', 'a') as outfile:
                    outfile.write(row)
            reader.next()
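There are two problems that I can see in this code.
The first is that the code won’t find the date range in row, because row is a list of cells and the in operator on a list only matches a cell that equals "Date range" exactly. The line:
if "Date range" in row:
should be:
if "Date range" in row[0]: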
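To see why the original test never matches, here is a quick sketch using the "Date range" line from the sample data. It assumes Python 3, and the variable names are only illustrative. On a list, in asks whether some element equals the whole string; on a string, it asks whether the string contains the substring.

```python
import csv
import io

# One preamble line from the sample report, parsed the way csv.reader would.
line = 'Date range: from 12/14/12 to 12/14/12\n'
row = next(csv.reader(io.StringIO(line)))  # a list holding a single cell

# List membership: is any cell exactly equal to "Date range"? It is not.
found_in_list = "Date range" in row

# Substring test on the first cell: does it contain "Date range"? It does.
found_in_cell = "Date range" in row[0]

# The last eight characters of that cell hold the end date.
report_date = row[0][-8:]

print(found_in_list, found_in_cell, report_date)
```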
The second is that the code:
if row[0][0:].find('OPSURVEYLEVEL') == 0:
    break
is breaking out of the for loop after the header line of the data table, because that is the closest enclosing loop. I suspect that there was another while in there somewhere in a previous version of this code.
The code is simpler (and bug-free) with an if statement instead of the while and if.
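A minimal sketch of that restructuring, not necessarily the exact code this answer had in mind. It assumes Python 3, and the helper name extract_rows, the in_table flag and the HEADER_PREFIX constant are hypothetical names introduced for illustration. The flag is set once outside the loop body, the preamble and the data rows are handled by the two branches of a single if/else, and there is no break.

```python
import csv
import io

HEADER_PREFIX = 'OPSURVEYLEVEL'  # first cell of the data table's header row


def extract_rows(csvfile, prime_column):
    """Return the data rows of one report, each tagged with the report date.

    Hypothetical helper: csvfile is any iterable of text lines and
    prime_column is 'ops2' or 'ops3', as in the question.
    """
    report_date = 'DATE NOT FOUND'
    in_table = False  # plays the role of foo, but is not reset per row
    out_rows = []
    for row in csv.reader(csvfile):
        if not row:
            continue  # skip blank lines
        if not in_table:
            # Still in the preamble: watch for the table header and the date.
            if row[0].startswith(HEADER_PREFIX):
                in_table = True
            elif 'Date range' in row[0]:
                report_date = row[0][-8:]
        else:
            # Past the header: every remaining row is data.
            if prime_column == 'ops2':
                row.insert(0, row[0])  # duplicate the first column
            row.append(report_date)
            out_rows.append(row)
    return out_rows


# Usage with a shortened copy of the sample data from the question.
sample = (
    '"Attribute ""OPSURVEYLEVEL2_O"" [Category = ""Retail v1""]"\n'
    'Date range: from 12/14/12 to 12/14/12\n'
    'Total: 905\n'
    'OPSURVEYLEVEL2_O,Distinct Document,% of Document,Sentiment Score\n'
    'PRINT TO STORE,297,32.82,-0.1\n'
)
rows = extract_rows(io.StringIO(sample), 'ops2')
print(rows)
```

The for loop already advances the reader, so no explicit next() call is needed. To write the result, a csv.writer on an output file opened once is the usual choice, since a file object's write() expects a string rather than a list.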