tl;dr: Can a select call on a read file descriptor (pointing at a procfs kernel module) end up blocking indefinitely, even when a timeout is specified?
I’m working on an embedded Linux system where several kernel modules manage access to services. One such module monitors the state of VLANs for changes and exports interfaces into a directory under /proc. A process registers for notifications from this module by attempting to read from a proc file created there. When there is a change, the module provides data on that interface and the caller’s read returns with the information.
This appears to be causing problems because the kernel module blocks the caller’s read on a kernel semaphore (struct semaphore). I think this puts the caller into the “D” (uninterruptible sleep) state. The process cannot be killed properly and remains defunct if terminated. Not only is it defunct, it also does not release its resources.
I think this is a case of “don’t do that”, but I’m not an expert on kernel modules. It seems a better approach would be to use a spinlock or wait_event_interruptible within the module. Changing these legacy kernel modules is a big deal, so I tried to work around it by checking the FD with select before doing the read. However, that appears to block indefinitely as well.
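For reference, here is a minimal sketch of the kind of select-then-read workaround described above. The function name read_with_timeout and its return convention are hypothetical, chosen only for illustration, and it assumes the file descriptor behaves like an ordinary pollable FD. Note that the timeout only bounds the select call itself; if the driver blocks inside its own poll or read handler, user space has no control over that.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms for fd to become readable,
 * then read from it.
 * Returns the number of bytes read (0 on EOF), -1 on error
 * (errno is set, e.g. EINTR), or -2 if the timeout expired.
 */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int ready;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* The timeout applies to this call only. */
    ready = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (ready < 0)
        return -1;   /* select failed */
    if (ready == 0)
        return -2;   /* nothing readable within timeout_ms */

    /* This read can still block if the driver sleeps in its read handler. */
    return read(fd, buf, len);
}
```

On a well-behaved descriptor such as a pipe, this returns -2 when no data arrives within the timeout and the byte count once data is available.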
Is there a way to rectify this without changing the locking used in the kernel module?
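Linux allows kernel modules to do all sorts of malicious things that break correct programs. A task sleeping uninterruptibly inside the kernel does not respond to signals, not even SIGKILL, so there is nothing user space can do once the caller is stuck there. I’m afraid there’s no better answer than “don’t load buggy or malicious modules”. In your case, since you need them, you probably just need to fix them.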
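As a rough idea of what such a fix could look like, here is a sketch that assumes the module currently waits with down() inside its read handler. The names vlan_event_sem and vlan_proc_read are hypothetical, and the handler signature is an assumption; a legacy module may well use an older procfs read interface. Only the switch from down() to down_interruptible() is the point: it lets a signal wake the sleeping task.

```c
#include <linux/errno.h>
#include <linux/fs.h>
#include <linux/semaphore.h>

/* Hypothetical semaphore that the module ups when VLAN state changes. */
static struct semaphore vlan_event_sem;

/* Hypothetical read handler; the real module's signature may differ. */
static ssize_t vlan_proc_read(struct file *file, char __user *buf,
                              size_t count, loff_t *ppos)
{
    /*
     * down() sleeps uninterruptibly ("D" state).
     * down_interruptible() returns nonzero if a signal arrives first,
     * so the caller can be interrupted or killed while waiting.
     */
    if (down_interruptible(&vlan_event_sem))
        return -ERESTARTSYS;

    /* ... copy the change notification to buf as the module does today ... */
    return 0;
}
```

For select to be useful as well, the module would also need a poll handler that reports readiness without sleeping on the semaphore.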