So I realize this question sounds stupid (and yes I am using a dual core), but I have tried two different libraries (Grand Central Dispatch and OpenMP), and when using clock() to time the code with and without the lines that make it parallel, the speed is the same. (for the record they were both using their own form of parallel for). They report being run on different threads, but perhaps they are running on the same core? Is there any way to check? (Both libraries are for C, I’m uncomfortable at lower layers.) This is super weird. Any ideas?
EDIT: Added detail for Grand Central Dispatch in response to OP comment.
While the other answers here are useful in general, the specific answer to your question is that you shouldn’t be using `clock()` to compare the timing. `clock()` measures CPU time, which is added up across the threads. When you split a job between cores, it uses at least as much CPU time (usually a bit more due to threading overhead). Search for clock() on this page, to find “If process is multi-threaded, cpu time consumed by all individual threads of process are added.”

It’s just that the job is split between threads, so the overall time you have to wait is less. You should be using the wall time (the time on a wall clock). OpenMP provides a routine `omp_get_wtime()` to do it. Take the following routine as an example:

The results are:
You can see that the `clock()` time doesn’t change much. I get 0.254 without the `pragma`, so it’s a little slower using OpenMP with one thread than not using OpenMP at all, but the wall time decreases with each thread.

The improvement won’t always be this good due to, for example, parts of your calculation that aren’t parallel (see Amdahl’s law) or different threads fighting over the same memory.
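As a minimal sketch of the comparison described above (an illustration under assumptions, not the original routine), the code below times the same loop with both `clock()` and a wall clock. The names `wall_seconds`, `sum_inverse_squares` and `time_sum` are hypothetical, and the workload is a stand-in chosen so the sketch needs no math library. When compiled without OpenMP, it falls back to the C11 `timespec_get()` for wall time and the loop simply runs serially.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds: omp_get_wtime() under OpenMP, else C11 timespec_get(). */
static double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
#endif
}

/* Hypothetical workload: sum of 1/i^2 for i = 1..n, split across threads. */
static double sum_inverse_squares(long n)
{
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 1; i <= n; i++)
        total += 1.0 / ((double)i * (double)i);
    return total;
}

/* Runs the workload once; reports CPU seconds (clock) and wall seconds. */
static double time_sum(long n, double *cpu_s, double *wall_s)
{
    assert(n > 0 && cpu_s != NULL && wall_s != NULL);
    clock_t cpu_start = clock();
    double wall_start = wall_seconds();
    double result = sum_inverse_squares(n);
    *wall_s = wall_seconds() - wall_start;
    *cpu_s = (double)(clock() - cpu_start) / CLOCKS_PER_SEC;
    return result;   /* returning the sum keeps the loop from being optimized away */
}
```

Calling `time_sum` from `main` and printing both numbers should show the pattern described: with GCC, building with `-fopenmp` and varying the `OMP_NUM_THREADS` environment variable is expected to leave the CPU time roughly flat while the wall time drops. The exact figures depend on the machine.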
EDIT: For Grand Central Dispatch, the GCD reference states that GCD uses `gettimeofday` for wall time. So, I create a new Cocoa App, and in `applicationDidFinishLaunching` I put:

and I get the following results on the console:

which is about the same as I was getting above.
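A rough sketch of the same idea in plain C (not the Cocoa code from the answer) might look like the following. It takes wall time from `gettimeofday` and hands the chunks to `dispatch_apply` when built on a Mac with blocks support; elsewhere it runs the chunks serially so the sketch still compiles. `CHUNKS`, `run_chunk` and `sum_in_chunks` are hypothetical names, and the workload is a stand-in.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#if defined(__APPLE__) && defined(__BLOCKS__)
#include <dispatch/dispatch.h>
#endif

#define CHUNKS 8   /* hypothetical: number of dispatch_apply iterations */

/* Wall-clock seconds from gettimeofday(). */
static double wall_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}

/* Sum 1/i^2 over chunk c of the range 1..n and store it in slots[c]. */
static void run_chunk(double *slots, size_t c, long n)
{
    long per_chunk = n / CHUNKS;
    long begin = 1 + (long)c * per_chunk;
    /* The last chunk also picks up any remainder. */
    long end = (c == (size_t)(CHUNKS - 1)) ? n + 1 : begin + per_chunk;
    double total = 0.0;
    for (long i = begin; i < end; i++)
        total += 1.0 / ((double)i * (double)i);
    slots[c] = total;
}

/* Runs all chunks, writes the elapsed wall time, returns the combined sum. */
static double sum_in_chunks(long n, double *wall_s)
{
    assert(n >= CHUNKS && wall_s != NULL);
    double partial[CHUNKS] = {0.0};
    double *slots = partial;   /* blocks cannot capture arrays, so use a pointer */
    double start = wall_seconds();
#if defined(__APPLE__) && defined(__BLOCKS__)
    /* dispatch_apply returns only after every iteration has finished. */
    dispatch_apply(CHUNKS,
                   dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0),
                   ^(size_t c) { run_chunk(slots, c, n); });
#else
    for (size_t c = 0; c < CHUNKS; c++)   /* serial fallback without GCD */
        run_chunk(slots, c, n);
#endif
    *wall_s = wall_seconds() - start;
    double total = 0.0;
    for (size_t c = 0; c < CHUNKS; c++)
        total += partial[c];
    return total;
}
```

Each iteration gets a sizeable chunk of the range rather than a single element, so the per-iteration dispatch overhead stays small relative to the work.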
This is a very contrived example. In fact, you need to be sure to keep the optimization at -O0, or else the compiler will realize we don’t keep any of the calculations and not do the loop at all. Also, the integer that I’m taking the `cos` of is different in the two examples, but that doesn’t affect the results too much. See the `STRIDE` on the manpage for `dispatch_apply` for how to do it properly and for why `iterations` is broadly comparable to `num_threads` in this case.

EDIT: I note that Jacob’s answer includes

which is not correct (it has been partly fixed by an edit). Using `omp_get_thread_num()` is indeed a good way to ensure that your code is multithreaded, but it doesn’t show “which core it’s working on”, just which thread. For example, the following code:

prints out that it’s using threads 0 to 49, but this doesn’t show which core it’s working on, since I only have eight cores. By looking at the Activity Monitor (the OP mentioned GCD, so must be on a Mac; go to `Window/CPU Usage`), you can see jobs switching between cores, so core != thread.
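To illustrate the thread-versus-core distinction, here is a small sketch (an assumption-laden illustration, not the code from the answer) that requests 50 OpenMP threads and compares the team size with the processor count OpenMP reports. `REQUESTED_THREADS` and `count_threads` are hypothetical names; without OpenMP the function just reports one thread and one processor.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define REQUESTED_THREADS 50   /* hypothetical: deliberately more than the core count */

/* Returns the team size actually used; writes the processor count OpenMP sees. */
static int count_threads(int *num_procs)
{
    assert(num_procs != NULL);
    int team_size = 1;
    *num_procs = 1;
#ifdef _OPENMP
    *num_procs = omp_get_num_procs();
    #pragma omp parallel num_threads(REQUESTED_THREADS)
    {
        /* One thread records the team size; the others wait at the implicit barrier. */
        #pragma omp single
        team_size = omp_get_num_threads();

        /* The thread number identifies a thread in the team, not a core. */
        printf("hello from thread %d\n", omp_get_thread_num());
    }
#endif
    return team_size;
}
```

On a machine with fewer processors than the team size, the thread numbers printed will exceed the processor count, which is consistent with the point that a thread number says nothing about which core ran it. The runtime may also grant fewer threads than requested.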