I have a tab-separated data file with a little over 2 million lines and

Asked: May 14, 2026 (2026-05-14T01:04:44+00:00)

I have a tab-separated data file with a little over 2 million lines and 19 columns.
The file is US.zip, available at http://download.geonames.org/export/dump/

I first ran the code below with for l in f.readlines(). I understand that iterating over the file object directly is supposed to be more efficient, so that is the version I’m posting. Even with this small optimization, the process is using 30% of my memory after only about 6.5% of the records. At this pace it will run out of memory, as it did before. The function is also very slow. Is there anything obvious I can do to speed it up? Would it help to del the objects on each pass of the for loop?

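def run():
    from geonames.models import POI
    # GeoNames dumps are UTF-8; the with block closes the file when done.
    with open('data/US.txt', encoding='utf-8') as f:
        # Iterating over the file object reads one line at a time.
        for l in f:
            # Strip the newline so the last column does not carry it.
            li = l.rstrip('\n').split('\t')
            try:
                p = POI()
                p.geonameid = li[0]
                p.name = li[1]
                p.asciiname = li[2]
                p.alternatenames = li[3]
                # Column 4 is latitude, column 5 is longitude; WKT expects "lon lat".
                p.point = "POINT(%s %s)" % (li[5], li[4])
                p.feature_class = li[6]
                p.feature_code = li[7]
                p.country_code = li[8]
                p.ccs2 = li[9]
                p.admin1_code = li[10]
                p.admin2_code = li[11]
                p.admin3_code = li[12]
                p.admin4_code = li[13]
                p.population = li[14]
                p.elevation = li[15]
                p.gtopo30 = li[16]
                p.timezone = li[17]
                p.modification_date = li[18]
                p.save()
            except IndexError:
                # Skip lines that have fewer than 19 columns.
                pass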

if __name__ == "__main__":
    run()

EDIT, More details (the apparently important ones):

Memory consumption keeps going up as the script runs and saves more lines.
The .save() method is a customized Django model method, using the unique_slug snippet, that writes to a PostgreSQL/PostGIS database.

SOLVED: DEBUG database logging in Django eats memory.

1 Answer

Editorial Team · Answer 1 · 2026-05-14T01:04:45+00:00

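Make sure that Django’s DEBUG setting is set to False. With DEBUG enabled, Django keeps a log of the SQL queries it runs, so a long import loop that calls .save() on every line keeps adding to that log.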
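
If turning DEBUG off is not an option during the import, one possible workaround is to clear Django's query log every so often. The sketch below is only an illustration, not the asker's code: iter_rows, import_rows, save_row and the every parameter are hypothetical names, and the Django-specific pieces are passed in as callables so the loop itself stays plain Python. In a Django project, reset_queries would be django.db.reset_queries. Newer Django releases cap the size of this log, so the effect is likely to matter most on older versions.

```python
def iter_rows(lines, n_columns=19):
    """Lazily yield the tab-separated fields of each well-formed line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip short lines, mirroring the IndexError guard in the question.
        if len(fields) >= n_columns:
            yield fields


def import_rows(lines, save_row, reset_queries, every=1000):
    """Save each row, clearing the query log after every `every` rows.

    save_row and reset_queries are hypothetical hooks: in a Django project,
    reset_queries would be django.db.reset_queries and save_row would build
    and save one model instance from a list of fields.
    """
    count = 0
    for fields in iter_rows(lines):
        save_row(fields)
        count += 1
        if count % every == 0:
            # With DEBUG = True, Django records queries in
            # connection.queries; resetting keeps that list short.
            reset_queries()
    return count
```

It might be called as import_rows(f, save_poi, reset_queries) with f an open handle on data/US.txt, where save_poi is a hypothetical function that fills in a POI from one row and calls .save(). Passing the hooks in also makes the loop easy to try out without a database.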
