I have a huge file with some missing rows. The data needs to be rooted at Country, so every row must have its full chain of parent rows up to Country.
The input data looks like this:
csv_str = """Type,Country,State,County,City,
1,USA,,,
2,USA,OH,,
3,USA,OH,Franklin,
4,USA,OH,Franklin,Columbus
4,USA,OH,Franklin,Springfield
4,USA,WI,Dane,Madison
"""
and it needs to become:
csv_str = """Type,Country,State,County,City,
1,USA,,,
2,USA,OH,,
3,USA,OH,Franklin,
4,USA,OH,Franklin,Columbus
4,USA,OH,Franklin,Springfield
2,USA,WI,,
3,USA,WI,Dane,
4,USA,WI,Dane,Madison
"""
The key, in my logic, is the Type field: if I cannot find a County (type 3) row for a City (type 4), I insert a row with the fields up to County.
The same applies to County: if I cannot find a State (type 2) row for a County (type 3), I insert a row with the fields up to State.
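A minimal sketch of that rule in a single pass: it remembers every Country / State / County prefix already emitted and, before writing a row, writes a filler row for any ancestor prefix it has not seen yet. It assumes that a parent row, when present, appears before its children (as in the sample), uses the sample with the header's trailing comma dropped, and also checks the Country level since the data must be rooted there. Each inserted row gets the Type of its own level (2 for a State, 3 for a County), which is what the Type rule implies. `fill_missing` and `LEVELS` are hypothetical names chosen for illustration.

```python
import csv
import io

# Sample from the question, with the trailing comma on the header dropped.
csv_str = """Type,Country,State,County,City
1,USA,,,
2,USA,OH,,
3,USA,OH,Franklin,
4,USA,OH,Franklin,Columbus
4,USA,OH,Franklin,Springfield
4,USA,WI,Dane,Madison
"""

LEVELS = ["Country", "State", "County", "City"]  # Type 1..4, in order


def fill_missing(rows):
    """Yield rows, inserting any missing ancestor row before its first child.

    Assumes a parent row, when present, appears before its children.
    """
    seen = set()  # prefixes such as ("USA",) or ("USA", "OH") already emitted
    for row in rows:
        depth = int(row["Type"])
        # Check every ancestor level above this row's own level.
        for level in range(1, depth):
            prefix = tuple(row[name] for name in LEVELS[:level])
            if prefix not in seen:
                seen.add(prefix)
                # Filler row: copy the columns up to this level, blank the rest.
                filler = {"Type": str(level)}
                for i, name in enumerate(LEVELS):
                    filler[name] = row[name] if i < level else ""
                yield filler
        # Record this row's own prefix so its children don't trigger a filler.
        seen.add(tuple(row[name] for name in LEVELS[:depth]))
        yield row


reader = csv.DictReader(io.StringIO(csv_str))
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(fill_missing(reader))
print(out.getvalue())
```

On the sample this writes `2,USA,WI,,` and `3,USA,WI,Dane,` just before the Madison row. Because `fill_missing` is a generator and `csv.DictWriter` consumes it row by row, the rows themselves are streamed; only the set of seen prefixes stays in memory. For a real file, the two `io.StringIO` objects would be replaced by file objects from `open(..., newline="")`.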
Because I don't know Python's facilities well, I was trying a brute-force approach. It is a bit problematic, as it needs many passes over the same file.
I also tried google-refine but couldn't get it to work. Doing it manually is quite error-prone.
Any help is appreciated.
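import csv
import io

csv_str = """Type,Country,State,County,City,
1,USA,,,
2,USA,OH,,
3,USA,OH,Franklin,
4,USA,OH,Franklin,Columbus
4,USA,OH,Franklin,Springfield
4,USA,WI,Dane,Madison
"""

found_county = []    # (Country, State, County) of every type-3 row seen so far
missing_county = []  # type-4 rows whose County row has not been seen

def check_missing_county(row):
    # Only City rows (type 4) need a County parent.
    if row['Type'] != '4':
        return
    if (row['Country'], row['State'], row['County']) not in found_county:
        missing_county.append(row)
        print(row)

# DictReader maps each row to the header names, so fields can be read as row['Type'].
reader = csv.DictReader(io.StringIO(csv_str))
for row in reader:
    if row['Type'] == '3':
        found_county.append((row['Country'], row['State'], row['County']))
    check_missing_county(row)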
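The attempt above, like any single-pass approach that remembers rows already seen, only works if a parent row appears before its children. The question doesn't say whether the real file guarantees that. If it doesn't, one option is to sort once by the place columns: an empty string sorts before any non-empty string, so every parent sorts ahead of its children. This is a sketch on a made-up, deliberately shuffled sample; it loads all rows into memory and orders siblings alphabetically, which may not suit a truly huge file.

```python
import csv
import io

# Hypothetical sample, deliberately out of order: a City comes before its parents.
csv_str = """Type,Country,State,County,City
4,USA,OH,Franklin,Columbus
3,USA,OH,Franklin,
1,USA,,,
2,USA,OH,,
"""

LEVELS = ["Country", "State", "County", "City"]

rows = list(csv.DictReader(io.StringIO(csv_str)))
# "" sorts before any non-empty string, so ("USA", "", "", "") precedes
# ("USA", "OH", "", ""), which precedes ("USA", "OH", "Franklin", ""), and so on.
rows.sort(key=lambda row: tuple(row[name] for name in LEVELS))

for row in rows:
    print(row["Type"], row["Country"], row["State"], row["County"], row["City"])
```

After sorting, a single pass over `rows` sees every existing parent before its children, so the check no longer needs repeated passes over the file.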
I’ve knocked up the following based on my understanding of the question: