I am using Django 1.3 and running a script outside of a web context (from the command line).
My code reads 10000 entries from the database at a time.
I noticed that the memory usage of the process grows over time.
My code is:
def getData(startIndex, chunkSize):
    dataList = Mydata.objects.filter(update_date__isnull=True)[startIndex:startIndex + chunkSize]
    return list(dataList)

if __name__ == '__main__':
    chunkSize = 10000
    startIndex = 0
    dataSize = Mydata.objects.filter(update_date__isnull=True).count()
    while startIndex < dataSize:
        dataList = getData(startIndex, chunkSize)
        startIndex += chunkSize
        do_stuff(dataList)
My question is: do I need to use reset_queries() and/or connection.close(),
and is this the reason for the increase in memory usage?
I would start by using the only() or defer() methods in your query. Both limit the query to the fields you actually need instead of loading all of them: only() loads just the fields you name, while defer() loads everything except the fields you name. Your query will be slightly faster and consume less memory, because unneeded fields will not be fetched from the database.
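As a rough sketch of that suggestion, the helper below applies only() before slicing and yields one chunk at a time. The name iter_chunks is made up for illustration (it is not part of Django), and the field names in the usage comment are assumptions; substitute whichever fields do_stuff really reads.

```python
def iter_chunks(queryset, chunk_size=10000, fields=None):
    """Yield lists of at most chunk_size rows from a sliceable queryset.

    iter_chunks is a hypothetical helper, not a Django API.
    """
    if fields:
        # QuerySet.only() restricts the SELECT to the named columns,
        # so each model instance carries less data.
        queryset = queryset.only(*fields)
    start = 0
    while True:
        # Slicing a queryset becomes LIMIT/OFFSET; list() forces evaluation.
        chunk = list(queryset[start:start + chunk_size])
        if not chunk:
            break
        yield chunk
        start += chunk_size


# Usage with the model from the question (field names are assumptions):
# pending = Mydata.objects.filter(update_date__isnull=True)
# for chunk in iter_chunks(pending, chunk_size=10000, fields=('id', 'update_date')):
#     do_stuff(chunk)
```

Each chunk can be garbage collected before the next one is fetched, as long as do_stuff keeps no references to it. Note that if do_stuff sets update_date, rows drop out of the filter between slices, so offset-based slicing like this (and like the original loop) may skip rows.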