I’m working on my 10th grade science fair project right now and I’ve kind of hit a wall. My project tests the effect of parallelism on the efficiency of brute forcing MD5 password hashes. I’ll be calculating the number of password combinations tested per second to see how efficient it is, using 1, 4, 16, 32, 64, 128, 512, and 1024 threads. I’m not sure if I’ll do dictionary brute force or pure brute force. I figure that dictionary would be easier to parallelize: just split the list up into equal parts for each thread. I haven’t written much code yet; I’m just trying to plan it out before I start coding.
My questions are:
- Is calculating the password combinations tested/second the best way to determine the performance based on # of threads?
- Dictionary or pure brute force? If pure brute force, how would you split up the task into a variable number of threads?
- Any other suggestions?
I’m not trying to dampen your enthusiasm, but this is already quite a well understood problem. I’ll try to explain what to expect below. But maybe it would be better to do your project in a slightly different area. How about “Maximising MD5 hashing throughput”? Then you wouldn’t be restricted to just looking at threading.
I think that when you write up your project, you’ll need to offer some kind of analysis as to when parallel processing is appropriate and when it isn’t.
Each time your CPU switches to another thread, it has to save the current thread’s context and load the new thread’s context. This overhead does not occur in a single-threaded process (except for background services like garbage collection in a managed runtime). So on a single CPU, all else being equal, adding threads won’t improve performance, because the CPU must do the original workload plus all of the context switching.
But if you have multiple CPUs (cores) at your disposal, creating one thread per CPU will mean that you can parallelize your calculations without incurring context switching costs. If you have more threads than CPUs then context switching will become an issue.
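To see this effect for yourself, here is a minimal Python sketch of the measurement, not a definitive benchmark. It is an assumption on my part that you would use Python; it uses worker processes rather than threads, because in CPython threads generally do not run small CPU-bound jobs like this in parallel. The names `hash_range`, `split` and `hashes_per_second` are made up for illustration, and the "passwords" are just the decimal digits of a counter.

```python
import hashlib
import os
import time
from concurrent.futures import ProcessPoolExecutor


def hash_range(bounds):
    """Hash every candidate in [start, stop) and return how many were hashed."""
    start, stop = bounds
    for i in range(start, stop):
        # Hypothetical candidates: the decimal digits of i stand in for passwords.
        hashlib.md5(str(i).encode("ascii")).digest()
    return stop - start


def split(total, workers):
    """Cut range(total) into `workers` contiguous, near-equal (start, stop) pairs."""
    return [(total * w // workers, total * (w + 1) // workers)
            for w in range(workers)]


def hashes_per_second(total, workers):
    """Time `total` MD5 hashes spread over `workers` processes."""
    chunks = split(total, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        began = time.perf_counter()
        # Note: the timed region includes the cost of starting the workers.
        done = sum(pool.map(hash_range, chunks))
        elapsed = time.perf_counter() - began
    return done / elapsed


if __name__ == "__main__":
    print("logical CPUs:", os.cpu_count())
    for workers in (1, 2, 4, 8):
        rate = hashes_per_second(200_000, workers)
        print(f"{workers:>2} workers: {rate:,.0f} hashes/s")
```

Because the timing includes worker start-up, use a workload large enough that start-up is a small fraction of the total, and repeat each run several times before comparing worker counts.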
There are 2 classes of computation: IO-bound and compute-bound. An IO-bound computation can spend a large share of its time waiting for a response from some hardware such as a network card or a hard disk, leaving the CPU idle. Because of this idle time, you can increase the number of threads until the CPU is fully used again, and the gain can outweigh the cost of context switching. However, there is a limit to the number of threads, beyond which context switching takes up more time than the threads spend blocked on IO.
Compute-bound computations simply require CPU time for number crunching. This is the kind of computation used by a password cracker. Compute-bound operations do not get blocked, so adding more threads than CPUs will slow down your overall throughput.
The C# ThreadPool already takes care of most of this for you: you just add tasks, and it queues them until a thread is available. Broadly speaking, it only adds new threads when the existing ones are not keeping up with the queue, for example because they are blocked. That way, context switches are minimised.
I have a quad-core machine – breaking the problem into 4 threads, each executing on its own core, will be more or less as fast as my machine can brute force passwords.
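On the question of how to split pure brute force across a variable number of threads, one common approach is to number every candidate password and hand each worker a contiguous range of numbers. The Python sketch below assumes a lowercase alphabet and a fixed length of 4, both chosen only for illustration; `candidate`, `chunk_bounds` and `search` are hypothetical helper names.

```python
import hashlib
import string

ALPHABET = string.ascii_lowercase  # assumed character set
LENGTH = 4                         # assumed fixed password length


def candidate(index, alphabet=ALPHABET, length=LENGTH):
    """Return the index-th password, reading index as a base-len(alphabet) number."""
    chars = []
    for _ in range(length):
        index, digit = divmod(index, len(alphabet))
        chars.append(alphabet[digit])
    return "".join(reversed(chars))


def chunk_bounds(total, workers):
    """Split range(total) into `workers` contiguous, near-equal (start, stop) pairs."""
    return [(total * w // workers, total * (w + 1) // workers)
            for w in range(workers)]


def search(target_hex, start, stop):
    """Try candidates start..stop-1; return the matching password or None."""
    for i in range(start, stop):
        word = candidate(i)
        if hashlib.md5(word.encode("ascii")).hexdigest() == target_hex:
            return word
    return None


if __name__ == "__main__":
    total = len(ALPHABET) ** LENGTH
    target = hashlib.md5(b"abcd").hexdigest()
    # Each (start, stop) pair is the work for one thread or process.
    for start, stop in chunk_bounds(total, 4):
        print(start, stop, search(target, start, stop))
```

Since each range is independent, the same split works for any number of workers, and a dictionary attack can reuse `chunk_bounds` with the word count as `total`.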
To seriously parallelize this problem, you’re going to need a lot of CPUs. I’ve read about using the GPU of a graphics card to attack this problem.
There’s an analysis of attack vectors that I wrote up here, if it’s any use to you. Rainbow tables and the processor/memory trade-offs would be another interesting area to do a project in.