I’m trying to extract data from a large CSV file in the following format; assume ‘x’ is data in the form of text or an integer. Each grouping has a unique id, but the groupings do not always have the same number of lines or the same colors. The data is separated from the color by a comma.
id, x
red, x
green, x
blue, x
black, x
id, x
yellow, x
green,
blue, x
black, x
id, x
red, x
green, x
blue, x
black, x
id, x
red, x
green, x
blue, x
id, x
red, x
green, x
blue, x
black, x
I would like to rearrange the data into a column format. The ID should be the first column, with the data separated by commas. My goal is to get it to read the first word in each line and place the data in the appropriate column.
line 0 - ID - red - green - blue - yellow - black
line 1 - x, x, x, , x,
line 2 - , x, x, x, x,
line 3 - x, x, x, , x,
line 4 - x, x, x, , ,
line 5 - x, x, x, , x,
This is what I was trying…
readfile = open("db-short.txt", "r")
datafilelines = readfile.readlines()
writefile = open("sample.csv", "w")
temp_data_list = ["",]*7
td_index = 0
for line_with_return in datafilelines:
    line = line_with_return.replace('\n','')
    if not line == '':
        if not (line.startswith("ID") or
                line.startswith("RED") or
                line.startswith("GREEN") or
                line.startswith("BLUE") or
                line.startswith("YELLOW") or
                line.startswith("BLACK") ):
            temp_data_list[td_index] = line
            td_index += 1
            temp_data_list[6] = line
        if (line.startswith("BLACK") or line.startswith("BLACK")):
            temp_data_list[5] = line
        if (line.startswith("YELLOW") or line.startswith("YELLOW")):
            temp_data_list[4] = line
        if (line.startswith("BLUE") or line.startswith("BLUE")):
            temp_data_list[3] = line
        if (line.startswith("GREEN") or line.startswith("GREEN")):
            temp_data_list[2] = line
        if (line.startswith("RED") or line.startswith("RED")):
            temp_data_list[1] = line
        if (line.startswith("ID") or line.find("ID") > 0):
            temp_data_list[0] = line
    if line == '':
        temp_data_str = ""
        for temp_data in temp_data_list:
            temp_data_str += temp_data + ","
        temp_data_str = temp_data_str[0:-1] + "\n"
        writefile.write(temp_data_str)
        temp_data_list = ["",]*7
        td_index = 0
if temp_data_list[0]:
    temp_data_str = ""
    for temp_data in temp_data_list:
        temp_data_str += temp_data + ","
    temp_data_str = temp_data_str[0:-1] + "\n"
    writefile.write(temp_data_str)
readfile.close()
writefile.close()
This assumes Python < 2.7 (and therefore doesn’t take advantage of opening multiple files with one with, writing the headers with the built-in writeheader, etc.). Note that in order to get it to work properly, I removed the spaces from between the commas in your CSV. As mentioned by @JamesHenstridge, it would definitely be worth reading up on the csv module so that this makes a bit more sense.
Using your data above (with some random comma-separated numbers thrown in), this writes:
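To make the csv module suggestion a bit more concrete, here is a minimal sketch of the regrouping idea. It is not the code described above: it assumes Python 3, so it does use writeheader. The names COLUMNS, regroup and convert are made up for illustration, and the column order is taken from the header line in the question. It also assumes that every group starts with an id line and that the keywords are lowercase, as in the sample data. Unknown keywords are silently skipped.

```python
import csv
import io

# Assumed column order, copied from the desired header in the question.
COLUMNS = ["id", "red", "green", "blue", "yellow", "black"]


def regroup(rows):
    """Yield one dict per group; a row whose first field is 'id' starts a new group."""
    current = None
    for row in rows:
        if not row or not row[0].strip():
            continue  # skip blank lines
        key = row[0].strip().lower()
        value = row[1].strip() if len(row) > 1 else ""
        if key == "id":
            if current is not None:
                yield current  # the previous group is complete
            current = {"id": value}
        elif current is not None and key in COLUMNS:
            current[key] = value
    if current is not None:
        yield current  # do not lose the last group


def convert(src, dst):
    """Read 'keyword, value' lines from src and write one CSV row per group to dst."""
    # skipinitialspace drops the space after each comma in the input.
    reader = csv.reader(src, skipinitialspace=True)
    # restval fills in columns that a group does not have.
    writer = csv.DictWriter(dst, fieldnames=COLUMNS, restval="", lineterminator="\n")
    writer.writeheader()
    for group in regroup(reader):
        writer.writerow(group)


if __name__ == "__main__":
    # Made-up sample values standing in for the 'x' placeholders.
    sample = "id, 1\nred, 10\ngreen, 20\nid, 2\nyellow, 5\ngreen,\nblue, 7\n"
    out = io.StringIO()
    convert(io.StringIO(sample), out)
    print(out.getvalue(), end="")
```

With real files, the same convert function could be called as convert(open("db-short.txt", newline=""), open("sample.csv", "w", newline="")), ideally inside a single with statement so both files get closed. Because skipinitialspace is set, the spaces after the commas do not have to be removed from the input first.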