I have a distributed system that basically executes processes (not OS processes, just tasks that need to be done). After a few unsuccessful tries (timeouts), it reports a failure.
I want to keep trying to execute the process in the background afterwards, and the question is: should I use a longer fixed timeout, or a timeout that grows longer with each try?
- There are many reasons for a process to fail, mainly network problems.
It depends on why the first attempt failed.
If the failure is due to overload or temporary exhaustion of some resource, you might want to try an exponential backoff strategy. The reason is that continuous attempts to acquire what you want can make things even worse, and will therefore probably never lead to success.
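A minimal Python sketch of exponential backoff, assuming a failed attempt surfaces as a `TimeoutError`. The name `retry_with_backoff` and its parameters are hypothetical, not from any library. It also adds random "full jitter" (an assumption beyond the answer above) so that many clients backing off at the same time do not all retry in lockstep against the overloaded resource.

```python
import random
import time


def retry_with_backoff(task, max_attempts=5, base_delay=0.5, max_delay=30.0,
                       retry_on=(TimeoutError,), sleep=time.sleep):
    """Call task() until it succeeds, doubling the maximum wait after each failure.

    After failure number k (0-based), wait a random time in
    [0, min(max_delay, base_delay * 2**k)] ("full jitter").
    """
    for attempt in range(max_attempts):
        try:
            return task()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: report the failure to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

The `sleep` parameter only exists so the chosen delays can be inspected in tests; in normal use the default `time.sleep` applies.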
If you are basically waiting for something to happen or become available, e.g. a port being open or a file existing (basically “polling”), you might just want to wait for a fixed period between attempts.
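For the polling case, a fixed-interval loop could look like the following sketch. `poll`, `port_is_open` and the default interval and limit values are illustrative assumptions; `port_is_open` uses the standard `socket.create_connection` to check the "port being open" example.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def poll(condition, interval=2.0, max_wait=60.0,
         sleep=time.sleep, clock=time.monotonic):
    """Check condition() every `interval` seconds until it is true or `max_wait` has elapsed."""
    deadline = clock() + max_wait
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False  # gave up waiting
        sleep(interval)  # same fixed wait every time, no backoff
```

For example, `poll(lambda: port_is_open("db.example.internal", 5432), interval=2.0)` would check a hypothetical database host every two seconds for up to a minute.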
This is somewhat oversimplified, but it may give you some basic ideas. Whatever strategy (or combination thereof) you choose, test it thoroughly to make sure that it actually works and does not make things worse.
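One possible combination for a background retry, as a sketch rather than a prescribed recipe (`retry_delays` and `keep_retrying` are hypothetical names): back off exponentially at first, then settle into fixed waits once the delay reaches a cap, so the job keeps trying without ever waiting unreasonably long between attempts.

```python
import itertools
import time


def retry_delays(base=1.0, factor=2.0, cap=300.0):
    """Yield wait times that grow exponentially, then stay fixed at `cap`."""
    delay = min(base, cap)
    while True:
        yield delay
        delay = min(delay * factor, cap)


def keep_retrying(task, delays, sleep=time.sleep, retry_on=(TimeoutError,)):
    """Call task() until it succeeds, sleeping for the next delay after each failure.

    With an infinite schedule such as retry_delays() this never gives up,
    which suits a background job that should eventually complete.
    """
    for delay in delays:
        try:
            return task()
        except retry_on:
            sleep(delay)
    raise RuntimeError("retry schedule exhausted")


if __name__ == "__main__":
    # First six waits of the schedule: [1.0, 2.0, 4.0, 8.0, 10.0, 10.0]
    print(list(itertools.islice(retry_delays(base=1.0, cap=10.0), 6)))
```

The cap is what turns the schedule from pure exponential backoff into fixed-interval retrying once the delays have grown long enough.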