I have a database table that stores HTML content as binary serialized blobs. I need to retrieve the content row by row, search it for certain keywords (and report the matches found), and also save the content to disk as HTML files. Can I parallelize this using Parallel.ForEach? Is this a good idea, or is there a better one?
Thanks in advance for help,
Ashish
I suspect that if you can pull a set of rows out of the database in one query, process each row in parallel looking for keywords, and then save the batch to disk in a single step, you’d see significant benefits. If you are selecting rows one by one and processing them in a linear fashion, you’ll see minimal benefit from doing things in parallel.
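Here is a rough sketch of that batch approach, just to make the shape concrete. It assumes the rows have already been fetched in one query and deserialized into strings; `HtmlRow`, `KeywordScanner`, `ProcessBatch` and the `outputDir` parameter are names I made up for illustration, not anything from your code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical row shape: an id plus the HTML already deserialized from the blob.
public record HtmlRow(int Id, string Html);

public static class KeywordScanner
{
    // Scans a batch of rows for keywords in parallel, then writes the whole
    // batch to disk. Returns the keywords found, keyed by row id.
    public static IDictionary<int, List<string>> ProcessBatch(
        IReadOnlyList<HtmlRow> batch, IReadOnlyList<string> keywords, string outputDir)
    {
        // Thread-safe collection, because many rows report matches at once.
        var matches = new ConcurrentDictionary<int, List<string>>();

        Parallel.ForEach(batch, row =>
        {
            // Case-insensitive substring search; swap in whatever matching you need.
            var found = keywords
                .Where(k => row.Html.IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0)
                .ToList();

            if (found.Count > 0)
                matches[row.Id] = found;
        });

        // Save the batch in one step after the scan has finished.
        Directory.CreateDirectory(outputDir);
        foreach (var row in batch)
            File.WriteAllText(Path.Combine(outputDir, $"{row.Id}.html"), row.Html);

        return matches;
    }
}
```

You would call it once per batch, something like `KeywordScanner.ProcessBatch(rows, new[] { "foo", "bar" }, @"C:\out")`. Whether the file writes belong inside the parallel loop or after it depends on your disk, so that is worth trying both ways as well.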
I think you’ll just have to try it both ways and measure the difference to see if it really works for you. Obviously, it will make no difference on a single-core machine, and an 8-core machine processing only two files may not see any significant benefit either, unless the keyword search takes a long time per file, in which case doing them in parallel becomes beneficial again. 🙂 I think your best bet is to try a couple of different spikes on the various techniques and figure out what is best for you and your situation.
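For the measuring part, something along these lines would do as a first spike. `SpikeTimer` and its methods are hypothetical helper names, and a single timed run like this is only a rough indication, so run it a few times on realistic data before drawing conclusions.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class SpikeTimer
{
    // Times one run of the given action and returns the elapsed wall-clock time.
    public static TimeSpan Time(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Runs the same per-item work first in a plain loop, then with Parallel.ForEach,
    // and returns both timings so they can be compared.
    public static (TimeSpan Sequential, TimeSpan Parallelized) Compare<T>(
        IReadOnlyList<T> items, Action<T> work)
    {
        var sequential = Time(() =>
        {
            foreach (var item in items)
                work(item);
        });

        var parallelized = Time(() => Parallel.ForEach(items, work));

        return (sequential, parallelized);
    }
}
```

Pass in your real rows and your real keyword search as `work`, and print both timings alongside `Environment.ProcessorCount` so you know what hardware the numbers came from.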