While I like the intellectual challenge of designing for multicore systems, I realize that most of those designs were just unnecessary premature optimization.
On the other hand, nearly every system has some performance requirement, and refactoring it later into thread-safe operations is hard, or even economically infeasible, because it would mean a complete rewrite with a different algorithm.
What is your way to keep a balance between optimization and getting things done?
I believe threading obeys the same rules as any other optimization.
That is, don’t waste your time making quick operations parallel.
Instead, apply threads to tasks that take a long time to execute.
Of course, if systems start having 1000+ cores, this answer might become obsolete and need to be revised. But then again, if you’re going to “get things done”, you’ll definitely want to ship your product before then.
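To make the rule of thumb concrete, here is a minimal sketch in Python. It assumes each task comes with a rough time estimate, which is my own simplification; the names `run_all`, `slow_fetch` and the cutoff `PARALLEL_THRESHOLD_S` are hypothetical and chosen only for illustration. Quick tasks run inline, and only the long ones are handed to a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Hypothetical cutoff: tasks estimated to take less than this run inline.
PARALLEL_THRESHOLD_S = 0.05

def run_all(tasks, max_workers=4):
    """tasks is a list of (estimated_seconds, callable) pairs.

    Returns the results in the same order as the input list.
    """
    results = [None] * len(tasks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {}
        for i, (estimate, fn) in enumerate(tasks):
            if estimate >= PARALLEL_THRESHOLD_S:
                futures[i] = pool.submit(fn)  # long task: worth a thread
            else:
                results[i] = fn()             # quick task: just run it here
        for i, future in futures.items():
            results[i] = future.result()      # wait for the long tasks
    return results

def slow_fetch(n):
    time.sleep(0.1)  # stands in for a blocking call such as network or disk I/O
    return n * 2

tasks = [
    (0.1, lambda: slow_fetch(1)),
    (0.0, lambda: 1 + 1),
    (0.1, lambda: slow_fetch(3)),
]
print(run_all(tasks))
```

Note that in standard CPython, threads mainly help with tasks that block (I/O, sleeping, waiting on another service); CPU-bound work would more likely need processes. The estimates here are hard-coded, so in a real system you would probably get them from profiling rather than guessing.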