I have a PostgreSQL server on the network and I am working with its database.
I need to go over a large number of records (1 million+), and each selection takes time.
This is my current method:
    DataSet ds = new psqlWork().getDataSet("SELECT * FROM z_sitemap_links");
    DataTable dt = ds.Tables[0];
    Parallel.ForEach(dt.AsEnumerable(), dr =>
    {
        new Sitemap().runSitemap(dr[1].ToString(), counter);
        counter++;
    });
However, as the database grows, this method (in my opinion) will not be as effective. Could you suggest a better way of doing this? Maybe pulling the data in chunks and processing each chunk, although I don’t know how to manage that right now.
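A minimal sketch of what the chunked idea could look like, not a definitive implementation. It pages through rows by remembering the last id seen, and it uses Interlocked.Increment because a plain counter++ inside Parallel.ForEach is not thread-safe. The in-memory table, the FetchChunk and ProcessLink helpers, and the id and url column names are all assumptions made for illustration; in real code FetchChunk would run a query against the server and ProcessLink would call runSitemap.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for the table: (id, url) rows. In real code FetchChunk would run
// something like:
//   SELECT id, url FROM z_sitemap_links WHERE id > @lastId ORDER BY id LIMIT @size
// (the column names id and url are assumptions, not taken from the post).
var table = Enumerable.Range(1, 25)
    .Select(i => (Id: (long)i, Url: $"http://example.com/{i}"))
    .ToList();

// Hypothetical helper: returns up to `size` rows whose id is greater than lastId.
List<(long Id, string Url)> FetchChunk(long lastId, int size) =>
    table.Where(r => r.Id > lastId).OrderBy(r => r.Id).Take(size).ToList();

int counter = 0;           // shared across threads, so it is updated atomically
long lastId = 0;           // highest id processed so far (the paging cursor)
const int chunkSize = 10;  // rows held in memory at any one time
int chunks = 0;

while (true)
{
    var chunk = FetchChunk(lastId, chunkSize);
    if (chunk.Count == 0) break;   // no rows left

    chunks++;

    Parallel.ForEach(chunk, row =>
    {
        // Interlocked.Increment is atomic and returns the new value,
        // unlike counter++ inside a parallel loop.
        int n = Interlocked.Increment(ref counter);
        ProcessLink(row.Url, n);
    });

    lastId = chunk[chunk.Count - 1].Id;   // continue after the last row seen
}

Console.WriteLine($"Processed {counter} rows in {chunks} chunks");

// Hypothetical placeholder for new Sitemap().runSitemap(url, n).
void ProcessLink(string url, int n) { }
```

Paging on an indexed id column keeps each query cheap however deep into the table it is, whereas OFFSET-based paging has to skip over all earlier rows on every request.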
Points for optimization:
Avoid DataSet and DataTable; that will reduce some of the memory footprint.

Questions to clarify your original post:
Parallel.ForEach? Provided that the underlying system has the capacity for it, you will probably be just fine with the approach you have now. Also consider that you should profile the actual performance instead of just guessing what’s going to happen.

And, if you can utilize something like this:
    row_number() OVER (ORDER BY col1) AS i

then you could skip the counter, since that value would be provided for you with each row that comes back. My Postgres knowledge doesn’t tell me whether it will be 1..100000 every time from the above code, or whether it will be what you want, but the guys over at Database Administrators would know for sure. This means your code would become: