The code (reproduced below) reads in a file, picks out the rows of interest (consecutive rows whose names start with the same word but whose ids differ), and writes that subset of the original file to a new file. How can I tweak it so that it instead writes everything from the input file to the output file, adding a “flag” column with a value of “1” for each row that currently goes to the output (the subset of rows we are most interested in)? The other rows (currently the ones only in the input file) would get either a blank or a “0” in the new “flag” column.
This problem comes up often enough for me that a general way of doing this would save me many hours.
Would greatly appreciate any help!
import csv

inname = "aliases.csv"
outname = "output.csv"


def first_word(value):
    # Return the text before the first space (the whole string if there is none).
    return value.split(" ", 1)[0]


# newline="" is what the csv module docs recommend for files handed to csv.reader/csv.writer;
# without it, extra blank lines can appear in the output on Windows.
with open(inname, "r", encoding="utf-8", newline="") as infile:
    with open(outname, "w", encoding="utf-8", newline="") as outfile:
        in_csv = csv.reader(infile)
        out_csv = csv.writer(outfile)
        column_names = next(in_csv)
        out_csv.writerow(column_names)
        id_index = column_names.index("id")
        name_index = column_names.index("name")
        try:
            row_1 = next(in_csv)
            written_row = False
            for row_2 in in_csv:
                # Adjacent rows whose names share a first word but whose ids differ are the rows of interest.
                if first_word(row_1[name_index]) == first_word(row_2[name_index]) and row_1[id_index] != row_2[id_index]:
                    if not written_row:
                        out_csv.writerow(row_1)
                    out_csv.writerow(row_2)
                    written_row = True
                else:
                    written_row = False
                row_1 = row_2
        except StopIteration:
            # No data rows!
            pass
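A minimal sketch of the flagged version, assuming the same "id" and "name" columns as the code above and "1"/"0" as the flag values. The helper names `flag_rows` and `write_with_flags` are made up for illustration. The key observation is that the loop above copies a row exactly when it matches the row before it or the row after it, so a row's flag can depend on the next row. This sketch therefore reads all rows first, computes the flags, and then writes every row with its flag appended.

```python
import csv


def first_word(value):
    """Return the text before the first space (the whole string if there is none)."""
    return value.split(" ", 1)[0]


def flag_rows(rows, matches):
    """Return a "1"/"0" flag for every row.

    Both members of each adjacent pair for which matches(a, b) is true get "1",
    which is the same set of rows the original loop copies to its output.
    """
    flags = ["0"] * len(rows)
    for i in range(len(rows) - 1):
        if matches(rows[i], rows[i + 1]):
            flags[i] = "1"
            flags[i + 1] = "1"
    return flags


def write_with_flags(inname, outname, flag_column="flag"):
    """Copy every row of inname to outname, appending a flag column (hypothetical helper)."""
    with open(inname, "r", encoding="utf-8", newline="") as infile:
        rows = list(csv.reader(infile))  # the whole file is held in memory
    if not rows:
        return  # empty input: nothing to write
    column_names, data = rows[0], rows[1:]
    id_index = column_names.index("id")
    name_index = column_names.index("name")

    def same_first_word_different_id(a, b):
        # The same condition as the if-statement in the original loop.
        return (first_word(a[name_index]) == first_word(b[name_index])
                and a[id_index] != b[id_index])

    flags = flag_rows(data, same_first_word_different_id)
    with open(outname, "w", encoding="utf-8", newline="") as outfile:
        out_csv = csv.writer(outfile)
        out_csv.writerow(column_names + [flag_column])
        for row, flag in zip(data, flags):
            out_csv.writerow(row + [flag])
```

With the file names from the question, the call would be `write_with_flags("aliases.csv", "output.csv")`. Reading everything up front is the simplest way to let a row's flag depend on the row after it; for very large files, a one-row lookahead would avoid holding the whole file in memory. To reuse this for a different filter, pass a different two-row predicate to `flag_rows`.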
I always use DictReader and DictWriter when working with CSVs, mainly because they are a bit more explicit (which makes things easier for me 🙂 ). Below is a highly stylized version of what you could do. Changes I made include:
- Using `csv.DictReader()` and `csv.DictWriter()` instead of `csv.reader` and `csv.writer`. These represent each row as a dictionary instead of a list, so a row looks like `{'column_name': 'value', 'column_name_2': 'value2'}`. Every row therefore carries the column names as its keys and can be treated like any other dictionary.
- … `name` and `number`, and then, when writing, I did a simple check to see whether the `number` value was `> 2`.

With that in mind, here is the example: