When I try to open /proc/net/tcp from a child POSIX thread in C++ it fails with a “No such file or directory” error. If I try to open it from the parent thread it succeeds every time, and the process of opening/closing it in the parent thread then makes it succeed about a third of the time in the child thread too. I can open /proc/uptime in the child thread 100% of the time without issue. Here’s some example code which can be compiled with “g++ -Wall test.cc -o test -pthread”:
#include <iostream>
#include <fstream>
#include <cstring>
#include <cerrno>
#include <pthread.h>
using namespace std;
void * open_test (void *)
{
    ifstream in;
    in.open("/proc/net/tcp");
    if (in.fail())
        cout << "Failed - " << strerror(errno) << endl;
    else
        cout << "Succeeded" << endl;
    in.close();
    return 0;
}
int main (int argc, char * argv[])
{
    open_test(NULL);
    pthread_t thread;
    pthread_create(&thread, NULL, open_test, NULL);
    pthread_exit(0);
}
I am running this on an Ubuntu 12.04 box with an Intel i5-2520M (2 cores * 2 virtual cores) on Linux kernel 3.2.0. Here is the output of me running the above code 6 times in a row:
mike@ung:/tmp$ ./test
Succeeded
Failed - No such file or directory
mike@ung:/tmp$ ./test
Succeeded
Succeeded
mike@ung:/tmp$ ./test
Succeeded
Failed - No such file or directory
mike@ung:/tmp$ ./test
Succeeded
Failed - No such file or directory
mike@ung:/tmp$ ./test
Succeeded
Succeeded
mike@ung:/tmp$ ./test
Succeeded
Failed - No such file or directory
mike@ung:/tmp$
It’s probably worth noting that I don’t have this problem if I use fork instead of POSIX threads. If I use fork, the child process has no problems reading /proc/net/tcp.
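For comparison, here is a minimal sketch of what the fork-based variant might look like. The helper name opens_in_forked_child is hypothetical and not taken from my real code; it just forks, tries the open in the child, and reports the result back through the exit status.

```cpp
#include <fstream>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (illustration only): fork a child process, try to open
// `path` inside it, and report whether that open succeeded.
bool opens_in_forked_child(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return false;                    // fork itself failed
    if (pid == 0) {
        // Child process: exit status 0 means the open succeeded.
        std::ifstream in(path);
        _exit(in.is_open() ? 0 : 1);
    }
    // Parent process: wait for the child and inspect its exit status.
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Calling opens_in_forked_child("/proc/net/tcp") from main corresponds to the fork case described above, where the child reads the file without trouble.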
Just a couple of data points to throw in: it looks like this is a regression in Linux, as 2.6.35 seems to work 100% of the time, while 3.2.0 pukes most of the time even on my slow old Pentium M based laptop.
This behavior seems to be a kind of bug in the /proc virtual filesystem. If you add code that lists /proc/net and /proc/self/net/tcp (for example, ls -l run through system()) just before opening the file, you’ll see that /proc/net is a symbolic link to /proc/self/net, and /proc/self/net/tcp is properly listed for both calls to open_test, even when the spawned thread call fails.
Edit: I just realized the above test is bogus, since the self would refer to the shell process spawned by the system() call, not this process. Using the following function instead also reveals the bug:
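The original function is not reproduced here. What follows is a minimal sketch of what such a function could look like, assuming it lists the entries by explicit ID so the result no longer depends on what self means inside the spawned shell. The names build_ls_command and ls_command are hypothetical, and SYS_gettid is Linux-specific.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical helper: build an ls command that names the /proc entries by
// explicit ID instead of going through /proc/self.
std::string build_ls_command()
{
    std::ostringstream cmd;
    cmd << "ls -l /proc/net"
        << " /proc/" << getpid() << "/net/tcp"               // process (main thread) ID
        << " /proc/" << syscall(SYS_gettid) << "/net/tcp";   // calling thread's ID
    return cmd.str();
}

// Run the listing and return the status reported by system().
// Intended to be called at the top of open_test().
int ls_command()
{
    return std::system(build_ls_command().c_str());
}
```

In the main thread the two IDs are the same, so the same path is listed twice. In the spawned thread they differ, which is what lets you compare the parent’s entry with the thread’s own.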
You’ll see that the spawned thread will sometimes not be able to see the parent’s /net/tcp file. In fact it has disappeared, since it is the spawned shell’s process that is running the ls command.
The workaround below allows the child thread to reliably access what would be its /proc/net/tcp.
My theory is that it is some kind of race condition bug with correctly setting up the /proc/self entry for the thread as the proper blend of parent state and thread-specific state. As a test and workaround, I modified the open_test code to use the “process identifier” associated with the thread, rather than trying to access the parent process’s (because /proc/self refers to the parent process ID, not the thread’s).
Edit: As the evidence indicates, the bug has to do with the parent process cleaning up its /proc/self/... state before the child thread has had a chance to read it. I still maintain this to be a bug, since the child thread is still technically part of the process. Its getpid() is still the same before and after the main thread calls pthread_exit(). The /proc entry for the parent process should remain valid until all child threads are completed.
Edit2: Jonas argues this may not be a bug. As evidence of that, there is this from man proc:
/proc/[pid]/fd ... In a multithreaded process, the contents of this directory are not available if the main thread has already terminated (typically by calling pthread_exit(3)).
But then consider this entry for /proc/self in the same man page entry:
/proc/self This directory refers to the process accessing the /proc file system, and is identical to the /proc directory named by the process ID of the same process.
If one is to believe this is not a bug because threads and processes are treated the same in Linux, then threads should have an expectation that /proc/self will work. The bug may easily be fixed by modifying /proc/self to use the /proc/[gettid] value when the /proc/[getpid] version is no longer available, just as the workaround is doing below.
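The workaround code itself is not reproduced here. As a minimal sketch of the idea described above, assuming a Linux system where SYS_gettid is available, open_test could first try /proc/net/tcp and fall back to the entry under the calling thread’s own ID. The helper thread_net_tcp_path is a hypothetical name introduced for illustration.

```cpp
#include <cerrno>
#include <cstring>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical helper: path of net/tcp under the calling thread's own ID.
std::string thread_net_tcp_path()
{
    std::ostringstream path;
    path << "/proc/" << syscall(SYS_gettid) << "/net/tcp";
    return path.str();
}

void *open_test(void *)
{
    std::ifstream in("/proc/net/tcp");
    if (in.fail()) {
        const int saved_errno = errno;   // capture before any other call can change it
        const std::string fallback = thread_net_tcp_path();
        std::cout << "Can't open /proc/net/tcp (" << std::strerror(saved_errno)
                  << "), trying " << fallback << std::endl;
        in.clear();                      // reset failbit before retrying
        in.open(fallback.c_str());
    }
    std::cout << (in.fail() ? "Failed" : "Succeeded") << std::endl;
    return 0;
}
```

This can replace open_test in the question’s program as-is. In the main thread the fallback path is the same as /proc/[getpid]/net/tcp, so the fallback only changes anything in the spawned thread.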