While browsing the code of an Erlang application, I came across an interesting design problem. Let me describe the situation; sorry, I can't post any code because of PIA.
The code is structured as an OTP application in which two gen_server modules are responsible for allocating some kind of resource. The application has run perfectly for some time and we haven't really had big issues.
The tricky part begins when the first gen_server needs to check whether the second one has enough resources left. A call is issued to the second gen_server, which itself calls a utility library that (in a very, very special case) issues a call back to the first gen_server.
I'm relatively new to Erlang, but I think this situation is going to make the two gen_servers wait for each other.
This is probably a design problem, but I just wanted to know whether there is any special mechanism built into OTP that can prevent this kind of "hang".
Any help would be appreciated.
EDIT:
To summarize the answers: if you have a situation where two gen_servers call each other in a cyclic way, you'd better spend some more time on the application design.
Thanks for your help 🙂
This is called a deadlock, and it could (and should) be avoided at the design level. Below is a possible workaround and some subjective points that will hopefully help you avoid making a mistake.
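To make the failure mode concrete, here is a minimal sketch that reproduces the cycle. As far as I know, gen_server has no deadlock detection for this case; the only built-in safeguard is the call timeout (5000 ms by default), which turns the hang into an exit in the caller. The module, the registered names server_a/server_b and the {forward, N} request are all hypothetical stand-ins for the real allocators and the "check resources" call.

```erlang
-module(deadlock_demo).
-behaviour(gen_server).

-export([start/0, ask/2, stop/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Start two servers; each one's state is the registered name of its peer.
%% server_a / server_b are hypothetical names used only for this sketch.
start() ->
    {ok, _} = gen_server:start({local, server_a}, ?MODULE, server_b, []),
    {ok, _} = gen_server:start({local, server_b}, ?MODULE, server_a, []),
    ok.

%% Ask server_a to forward a request N hops around the A <-> B cycle.
%% N = 1 is the normal path (A -> B); N = 2 is the "very special case"
%% (A -> B -> A), which blocks both servers until a timeout fires.
ask(N, Timeout) ->
    try gen_server:call(server_a, {forward, N}, Timeout)
    catch exit:{timeout, _} -> {error, timeout}
    end.

%% Blocked servers cannot handle a normal stop request, so kill them.
stop() ->
    lists:foreach(fun(Name) ->
                      case whereis(Name) of
                          undefined -> ok;
                          Pid -> exit(Pid, kill)
                      end
                  end, [server_a, server_b]).

init(Peer) ->
    {ok, Peer}.

handle_call({forward, 0}, _From, Peer) ->
    {reply, done, Peer};
handle_call({forward, N}, _From, Peer) ->
    %% Synchronous call with the default 5000 ms timeout: this server
    %% processes nothing else (including calls from Peer) until Peer replies.
    Reply = gen_server:call(Peer, {forward, N - 1}),
    {reply, Reply, Peer}.

handle_cast(_Msg, Peer) ->
    {noreply, Peer}.
```

In a shell, `deadlock_demo:start()` followed by `deadlock_demo:ask(1, 1000)` returns `done`, while `deadlock_demo:ask(2, 1000)` returns `{error, timeout}`: server_a is waiting for server_b, and server_b's request to server_a sits unread in server_a's mailbox.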
While there are ways to work around your problem, "waiting" is exactly what gen_server:call is doing. One possible workaround would be to spawn a process from inside A which calls B, but does not block A from handling the call from B. This process would reply directly to the caller.
In server A:
In server B:
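A hedged sketch of that workaround, reusing the hypothetical {forward, N} request and server names from a toy setup: server A does the blocking call to B from a throwaway process and returns {noreply, State}, and the helper answers A's original caller with gen_server:reply/2. Server B is left as an ordinary synchronous call. This is an illustration of the idea, not the original answer's code.

```erlang
-module(workaround_demo).
-behaviour(gen_server).

-export([start/0, ask/2, stop/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% State is {Role, PeerName}; server_a / server_b are hypothetical names.
start() ->
    {ok, _} = gen_server:start({local, server_a}, ?MODULE, {a, server_b}, []),
    {ok, _} = gen_server:start({local, server_b}, ?MODULE, {b, server_a}, []),
    ok.

ask(N, Timeout) ->
    try gen_server:call(server_a, {forward, N}, Timeout)
    catch exit:{timeout, _} -> {error, timeout}
    end.

stop() ->
    lists:foreach(fun(Name) ->
                      case whereis(Name) of
                          undefined -> ok;
                          Pid -> exit(Pid, kill)
                      end
                  end, [server_a, server_b]).

init(State) ->
    {ok, State}.

handle_call({forward, 0}, _From, State) ->
    {reply, done, State};
handle_call({forward, N}, From, {a, Peer} = State) ->
    %% Server A: do the blocking call to B in a helper process, which
    %% then replies straight to A's original caller.
    spawn(fun() ->
              Reply = gen_server:call(Peer, {forward, N - 1}),
              gen_server:reply(From, Reply)
          end),
    %% A returns at once and is free to serve B's call-back meanwhile.
    {noreply, State};
handle_call({forward, N}, _From, {b, Peer} = State) ->
    %% Server B: unchanged, an ordinary synchronous call to A.
    Reply = gen_server:call(Peer, {forward, N - 1}),
    {reply, Reply, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

With this change the A -> B -> A case (`ask(2, 1000)`) now returns `done`. Note that only A has been unblocked: a chain that comes back to B (`ask(3, 1000)`) still times out, and if the helper's call fails it simply dies without replying, so the original caller only learns about it through its own timeout.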
To me this is very complex and super hard to reason about. I think you could even call it spaghetti code without offending anyone.
On another note, while the above might solve your problem, you should think hard about what calling like this actually implies. For example, what happens if server A executes this call many times? What happens if a timeout occurs at any point? How do you configure the timeouts so they make sense? (The innermost call must have a shorter timeout than the outer calls, and so on.)
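The timeout-ordering point can be sketched as follows, under the assumption of a hypothetical slow server B (simulated with timer:sleep/1). A's call to B uses a shorter timeout than the client's call to A, so A can catch the inner timeout and still send a meaningful error reply before the outer timeout fires. The macro values are illustrative, not recommendations.

```erlang
-module(timeout_demo).
-behaviour(gen_server).

-export([start/0, check/0, stop/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(OUTER_TIMEOUT, 3000). %% client -> A
-define(INNER_TIMEOUT, 1000). %% A -> B: must be shorter than OUTER_TIMEOUT

start() ->
    {ok, _} = gen_server:start({local, server_a}, ?MODULE, a, []),
    {ok, _} = gen_server:start({local, server_b}, ?MODULE, b, []),
    ok.

check() ->
    gen_server:call(server_a, check, ?OUTER_TIMEOUT).

stop() ->
    lists:foreach(fun(Name) ->
                      case whereis(Name) of
                          undefined -> ok;
                          Pid -> exit(Pid, kill)
                      end
                  end, [server_a, server_b]).

init(Role) ->
    {ok, Role}.

handle_call(check, _From, a) ->
    %% Inner call gives up first, so A can still answer its own caller.
    Reply = try gen_server:call(server_b, check, ?INNER_TIMEOUT)
            catch exit:{timeout, _} -> {error, b_timeout}
            end,
    {reply, Reply, a};
handle_call(check, _From, b) ->
    timer:sleep(2000), %% simulate a slow B (hypothetical)
    {reply, ok, b}.

handle_cast(_Msg, Role) ->
    {noreply, Role}.

%% Discard any stray late reply from B that reaches A's mailbox.
handle_info(_Msg, Role) ->
    {noreply, Role}.
```

Here `timeout_demo:check()` returns `{error, b_timeout}` after about one second. If the inner timeout were longer than the outer one (for example the 5000 ms default against 3000 ms), the client would instead exit with a timeout and never see why.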
I would change the design, even if it is painful, because when you allow this to exist and work around it, your system becomes very hard to reason about. IMHO, complexity is the root of all evil and should be avoided at all costs.