I am working on a computing cluster and I am seeing very weird /usr/bin/env behaviour. In short, it runs very slowly.
On the head node:
$ time /usr/bin/env which
<which output>
real 0m0.025s
user 0m0.001s
sys 0m0.001s
On the computing node:
$ qsub -I
qsub: waiting for job 176620.scyld.localdomain to start
qsub: job 176620.scyld.localdomain ready
-bash-3.2$ time which
<which output>
real 0m0.003s
user 0m0.000s
sys 0m0.003s
-bash-3.2$ time /usr/bin/env /usr/bin/which
<which output>
real 0m0.003s
user 0m0.000s
sys 0m0.003s
-bash-3.2$ time /usr/bin/env which
<which output>
real 5m0.003s
user 0m0.001s
sys 0m0.001s
ps ax reports this:
12884 pts/3 S+ 0:00 /usr/bin/env which
It takes 5 minutes to print the usage banner for which. Any ideas why this could happen?
Edit 1:
Additional info about which:
-bash-3.2$ type -a which
which is /usr/bin/which
-bash-3.2$ file /usr/bin/which
/usr/bin/which: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), stripped
-bash-3.2$ echo $PATH
/bin:/usr/bin:/home/gusev/.rvm/bin:/home/gusev/bin
Edit 2:
I've strace'd /usr/bin/env which and it gets stuck at
execve("/bin/which", ["which"], [/* 47 vars */]
Now running a plain
/bin/which
also hangs, even though this file does not exist:
-bash-3.2$ ls /bin/which
ls: /bin/which: No such file or directory
/bin is mounted over NFS:
-bash-3.2$ mount | grep bin
10.54.0.1:/bin on /bin type nfs (nolock,nonfatal)
10.54.0.1:/usr/bin on /usr/bin type nfs (nolock,nonfatal)
So this may be a networking issue…
Edit 3:
which which works perfectly fine:
-bash-3.2$ time which which
/usr/bin/which
real 0m0.002s
user 0m0.000s
sys 0m0.002s
The output of strace -e trace=execve /usr/bin/env which is
execve("/usr/bin/env", ["/usr/bin/env", "which"], [/* 47 vars */]) = 0
execve("/bin/which", ["which"], [/* 47 vars */]) = -1 ENOENT (No such file or directory)
execve("/usr/bin/which", ["which"], [/* 47 vars */]) = 0
<which output>
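The trace above is consistent with a PATH search: /bin/which is tried first, then /usr/bin/which. As a minimal sketch, the candidates can be printed in the order they would be tried. Here path_candidates is a hypothetical helper name, not a standard tool, and unlike env it lists every candidate instead of stopping at the first one that executes.

```shell
# path_candidates CMD [PATHLIST]
# Print each path a PATH search for CMD would try, in order.
# PATHLIST defaults to $PATH; an empty entry means the current directory.
# The body runs in a subshell so no variables leak into the caller.
path_candidates() (
    cmd=$1
    rest=${2-$PATH}:              # trailing ':' so the last entry is handled too
    while [ -n "$rest" ]; do
        dir=${rest%%:*}           # text before the first ':'
        rest=${rest#*:}           # everything after the first ':'
        printf '%s/%s\n' "${dir:-.}" "$cmd"
    done
)
```

Running each printed candidate by hand under time, as was done with /bin/which in Edit 2, should show which directory stalls.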
Edit 4:
The hang always lasts exactly 5 minutes, so it looks like some kind of default timeout.
In the end, I found out that I had a very long PATH environment variable, and it probably somehow affected invoking execve on the NFS share. So I moved a bunch of executables into a single directory and replaced many entries in PATH with a single one. I have not experienced any problems since then.
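For anyone wanting to try the same consolidation, here is a rough sketch. It symlinks the executables into one directory instead of moving them, which is a substitution on my part and not exactly what I did. link_into is a hypothetical helper name, and the source directories are assumed to be given as absolute paths.

```shell
# link_into DEST DIR...
# Symlink every executable regular file from each DIR into DEST.
# Earlier DIRs win on name clashes, mirroring PATH priority.
# DIRs should be absolute paths, otherwise the links will dangle.
link_into() (
    dest=$1
    shift
    mkdir -p "$dest" || exit 1
    for dir in "$@"; do
        for f in "$dir"/*; do
            # Skip directories, non-executables and an unmatched glob
            [ -f "$f" ] && [ -x "$f" ] || continue
            name=${f##*/}
            # Keep the first link created for a given name
            [ -e "$dest/$name" ] || [ -L "$dest/$name" ] || ln -s "$f" "$dest/$name"
        done
    done
)
```

After that, the individual directories in PATH can be replaced by the single DEST entry.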