I’m building a Windows Service application that takes a directory of scanned images as input. The application iterates through all the images and, for each one, performs OCR operations to extract the barcode, invoice number, and customer number.
Some background info:
- The tasks performed by the application are pretty CPU intensive
- There is a large number of images to process, and the scanned image files are large (~2 MB each)
- The application runs on an 8-core server with 16 GB of RAM.
My question:
Since the application reads its images from the file system, I’m unsure whether changing it to use .NET Parallel Tasks will really make a difference.
Can anybody give me advice about this?
Many thanks!
If processing one image takes longer than reading N images from the disk, where N is the number of images you would process at once (here up to 8, one per core), then processing multiple images concurrently is a win. Figure you can read a 2 MB file from disk in under 100 ms (including seek time), so figure roughly one second to read 8 images into memory.
So if your image processing takes more than a second per image, I/O isn’t a problem: process the images concurrently on all 8 cores. You can scale that down if you need to (e.g. if processing takes half a second per image, then you’re probably best off with only 4 concurrent images).
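To make that rule of thumb concrete, here is a minimal sketch of the arithmetic, written in Python only for brevity. The function name and parameters are made up for illustration, and it assumes a single disk that serves one read at a time, which is a simplification:

```python
import math


def suggested_concurrency(read_seconds, process_seconds, cores):
    """Rough upper bound on how many images are worth processing at once.

    read_seconds    -- average time to read one image from disk (measured)
    process_seconds -- average time to process one image (measured)
    cores           -- number of CPU cores available
    """
    if read_seconds <= 0 or process_seconds <= 0 or cores < 1:
        raise ValueError("timings must be positive and cores must be >= 1")
    # While one image is being processed, the disk can deliver about
    # process_seconds / read_seconds further images. Beyond that many
    # workers, the extra workers would just wait on the disk.
    io_limit = math.floor(process_seconds / read_seconds)
    # Never go above the core count, and never below one worker.
    return max(1, min(cores, io_limit))


# Figures from the estimate above: about 1 second to read 8 images,
# i.e. 0.125 s per image, on an 8-core machine.
print(suggested_concurrency(0.125, 1.0, 8))  # 8
print(suggested_concurrency(0.125, 0.5, 8))  # 4
```

Treat the result as a starting point to tune, not a hard limit. In .NET the cap could be applied through `ParallelOptions.MaxDegreeOfParallelism` when calling `Parallel.ForEach`.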
You should be able to test this fairly quickly: write a program that randomly reads images off the disk and calculates the average time to open, read, and close a file. Also write a program that processes a sample of the images and computes the average processing time. Those two numbers should tell you whether or not concurrent processing will be helpful.
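A rough sketch of those two measurements might look like the following. It is in Python only to keep it short; `process_image` is a hypothetical stand-in for your own OCR step, and the sample sizes are arbitrary assumptions:

```python
import random
import time
from pathlib import Path


def _pick_sample(paths, sample_size, seed):
    """Return a random sample of the given paths."""
    paths = list(paths)
    if not paths:
        raise ValueError("no image paths given")
    return random.Random(seed).sample(paths, min(sample_size, len(paths)))


def average_read_seconds(paths, sample_size=100, seed=None):
    """Average time to open, read, and close one randomly chosen file."""
    sample = _pick_sample(paths, sample_size, seed)
    start = time.perf_counter()
    for path in sample:
        with open(path, "rb") as f:
            f.read()
    return (time.perf_counter() - start) / len(sample)


def average_process_seconds(paths, process_image, sample_size=20, seed=None):
    """Average time spent in process_image alone, excluding disk reads.

    process_image is a hypothetical callable that takes the raw file bytes.
    """
    sample = _pick_sample(paths, sample_size, seed)
    total = 0.0
    for path in sample:
        data = Path(path).read_bytes()  # read outside the timed region
        start = time.perf_counter()
        process_image(data)
        total += time.perf_counter() - start
    return total / len(sample)
```

One caveat: if the same files were read recently, the operating system may serve them from its file cache, which makes the read time look better than it will be in production. Measuring against files that have not been touched since boot gives a more honest number. In .NET the same timing can be done with `System.Diagnostics.Stopwatch`.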