I’m planning to do some data mining on my Django app, which uses App Engine to store its data. However, one of my tables stores images in two of its columns, and because of that it is gigabytes in size, so it’s far too slow to download every time I want to analyse new data. For data mining I only care about the plain-text columns in that table. How do I exclude the image columns when exporting the data to a CSV file?
I’m aware that the CSV connector in bulkloader.yaml has a “column_list” option that restricts which columns are included in the export, but it looks like the bulkloader still downloads each entire row and only filters out the columns afterwards, when it converts App Engine’s intermediate sqlite3 data file to CSV.
For reference, I’m downloading my data using the method described at http://code.google.com/appengine/docs/python/tools/uploadingdata.html, but I’m open to other solutions, preferably ones that let me automate this export every few days.
As you’ve observed, the bulkloader downloads the entire record using remote_api, then writes only the fields you care about to the CSV. If you want to download only selected fields, you’ll have to write your own code to do this on the server side, possibly by using the new Files API in a mapreduce to write a file you can then download.
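To make the server-side idea concrete, here is a minimal sketch of the filtering step only, written with the standard csv module. The field names in `TEXT_FIELDS`, the helper `export_text_fields`, and the sample records are all hypothetical; in a real app the records would come from your own datastore query inside a request handler or mapreduce step, and the output would go to whatever file or response object you use, none of which is shown here. It is written for Python 3, so it would need adapting for an older runtime.

```python
import csv
import io

# Hypothetical names of the plain-text columns to keep.
# The image columns are simply never listed, so they are never written.
TEXT_FIELDS = ["title", "description"]


def export_text_fields(records, fields, out):
    """Write only the named fields of each record to `out` as CSV.

    `records` is any iterable of dict-like objects; missing fields
    are written as empty strings.
    """
    writer = csv.writer(out)
    writer.writerow(fields)  # header row
    for record in records:
        writer.writerow([record.get(name, "") for name in fields])


# Stand-in data (an assumption for illustration); on the server this
# would be the result of iterating over your stored entities.
records = [
    {"title": "first", "description": "a, b", "photo": b"\x89PNG-bytes"},
    {"title": "second", "photo": b"\x89PNG-bytes"},
]

buf = io.StringIO()
export_text_fields(records, TEXT_FIELDS, buf)
csv_text = buf.getvalue()
```

Because the filtering happens before anything is sent to the client, the image bytes never leave the server, which is the part the bulkloader cannot do for you.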