Here is the situation: I have 2 machines, A and B. A listens on a port p. B creates a socket s1 and connects to p. A accepts the connection as socket s2. At this point, A and B can communicate with each other through the sockets.
However, if I kill the program in A and then restart this program some time later, B doesn’t know because it hasn’t sent any data to A during this period. Now B begins to write data to A through s1. What will happen next? Why?
Actually, I found that the write call did not fail, but A still didn’t get the data. What’s more, if I register s1 with epoll, the event returned by epoll_wait after the call to write is EPOLLERR | EPOLLHUP. Why?
Unfortunately, in this situation the data seems to be lost, since the write call didn’t fail but A never received the data. Any solutions?
1. When you kill a program that has established sockets, the kernel on A closes them and notifies the other end of each connection: usually with a FIN, or with an RST if the socket still held unread data. If B receives an RST on s1, all future calls on s1 will return an error. Some firewalls filter these packets out, though; you can check for them with tcpdump.
2. If B didn’t receive an RST in step 1, then when it continues sending packets (write) to A, A will reply with an RST, and all future calls on s1 will return an error once B receives it.
3. If B doesn’t receive an RST in step 2 either, B will drop the connection after a certain time (once its retransmissions time out), and all future calls on s1 will return an error.
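The second case can be reproduced on a single machine. The following is only a rough sketch, not the asker's program: it uses Python's socket module in place of the C calls, puts A and B on the loopback interface, and simulates killing A by closing its sockets. The function name write_after_peer_close and its parameters are made up for this illustration, and the exact error raised may differ between systems.

```python
import socket
import time


def write_after_peer_close(attempts=50, delay=0.01):
    """Return (bytes accepted by the first send, error raised by a later send)."""
    # "A": listen on an ephemeral loopback port (stands in for port p).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    # "B": s1 connects; "A" accepts the connection as s2.
    s1 = socket.create_connection(listener.getsockname())
    s2, _ = listener.accept()

    # Assumption: closing A's sockets stands in for killing the program on A.
    s2.close()
    listener.close()

    # The first write is accepted by B's kernel even though A is gone.
    first = s1.send(b"hello")

    # A's kernel answers that data with an RST; once B has processed it,
    # later writes on s1 fail. Retry briefly to allow for delivery delay.
    error = None
    for _ in range(attempts):
        time.sleep(delay)
        try:
            s1.send(b"again")
        except OSError as exc:
            error = exc
            break
    s1.close()
    return first, error


if __name__ == "__main__":
    accepted, err = write_after_peer_close()
    print("first send accepted", accepted, "bytes; later send failed with", repr(err))
```

The first send reports success because nothing has told B's kernel yet that A is gone; only the reply to that data makes the failure visible.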
As you can see, a write call seldom returns an error: it returns success as soon as the data has been handed to the kernel for sending, and it doesn’t care whether the remote end receives the packet.
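A tiny sketch of that point, under the assumption that a local socketpair is an acceptable stand-in for the TCP connection (the buffering behaviour shown is the same idea, but this is not a TCP socket). The helper name send_without_reader is hypothetical.

```python
import socket


def send_without_reader():
    """Send on one end of a socket pair while the other end never reads."""
    a, b = socket.socketpair()
    try:
        # send() returns once the bytes are queued in the kernel;
        # b never calls recv(), yet the call still reports success.
        return a.send(b"ping")
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print("send() reported", send_without_reader(), "bytes")
```

The return value only says how many bytes the kernel took, so it cannot be used as proof of delivery.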
In your situation, you don’t get EPOLLHUP as soon as you call epoll_wait, but only after B has received the RST or the write has timed out.
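The timing can be observed with a small experiment. This is a hedged sketch that assumes Linux (select.epoll is Linux-only), a loopback connection, and a closed socket standing in for the killed program; the function name epoll_events_around_write is invented for the example. s1 is registered with an empty event mask so that only EPOLLERR and EPOLLHUP, which epoll always reports, can show up.

```python
import select
import socket


def epoll_events_around_write(timeout=1.0):
    """Return the epoll events seen on s1 before and after writing to a dead peer."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    s1 = socket.create_connection(listener.getsockname())
    s2, _ = listener.accept()

    # Assumption: closing A's sockets stands in for killing the program on A.
    s2.close()
    listener.close()

    ep = select.epoll()
    # Empty mask: only the always-reported EPOLLERR / EPOLLHUP can be returned.
    ep.register(s1.fileno(), 0)

    before = ep.poll(0)       # nothing written yet, so no RST has been provoked
    s1.send(b"hello")         # the write itself succeeds
    after = ep.poll(timeout)  # the peer's RST is reported as an error / hang-up

    ep.close()
    s1.close()
    return before, after


if __name__ == "__main__":
    before, after = epoll_events_around_write()
    print("before write:", before)
    for fd, events in after:
        print("after write: ERR =", bool(events & select.EPOLLERR),
              "HUP =", bool(events & select.EPOLLHUP))
```

Before the write, epoll has nothing to report with this mask; the error and hang-up flags only appear once the write has provoked the RST and B has received it.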