I have a program that uses the InfiniBand rdmacm library, rdmacm.so.
On one computer (Ubuntu server), I can run it without issues.
On my dev PC (Ubuntu desktop edition), I get:
./test-client
rdmacm.so: cannot open shared object file: No such file or directory
Here’s where I get stumped.
ldd rdma-client
linux-vdso.so.1 => (0x00007fffdb62b000)
libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0x00007f97ca007000)
librdmacm.so.1 => /usr/lib/librdmacm.so.1 (0x00007f97c9dfe000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f97c9a3e000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f97c9821000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f97c961d000)
/lib64/ld-linux-x86-64.so.2 (0x00007f97ca237000)
cat /etc/ld.so.conf
include /etc/ld.so.conf.d/*.conf
cat /etc/ld.so.conf.d/*.conf
# Multiarch support
/lib/i386-linux-gnu
/usr/lib/i386-linux-gnu
/lib/i686-linux-gnu
/usr/lib/i686-linux-gnu
# libc default configuration
/usr/local/lib
# Multiarch support
/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu
/usr/lib/fglrx
/usr/lib32/fglrx
# Legacy biarch compatibility support
/lib32
/usr/lib32
ls -l /usr/lib/librdmacm*
-rw-r--r-- 1 root root 41146 Jul 19 2011 /usr/lib/librdmacm.a
lrwxrwxrwx 1 root root 18 Jul 19 2011 /usr/lib/librdmacm.so -> librdmacm.so.1.0.0
lrwxrwxrwx 1 root root 18 Jul 19 2011 /usr/lib/librdmacm.so.1 -> librdmacm.so.1.0.0
-rw-r--r-- 1 root root 35248 Jul 19 2011 /usr/lib/librdmacm.so.1.0.0
Everything looks correct. Why can’t I run test-client?
EDIT
The code that I’m using is the InfiniBand test apps from the geekinthecorner blog.
The client has a couple of dlopen calls, i.e.:
void *handle = dlopen("rdmacm.so", RTLD_LAZY);
...
handle = dlopen("ibverbs.so", RTLD_LAZY);
This works on the Ubuntu server. However, on my development desktop machine it doesn’t find the libraries.
If I rename the libraries like this:
void *handle = dlopen("librdmacm.so", RTLD_LAZY);
...
handle = dlopen("libibverbs.so", RTLD_LAZY);
They are found. Doesn’t dlopen prepend “lib” automatically? I assume it must on my server, because the libraries are found there without the prefix.
In any case, I’m not sure I need these dlopen calls. I can remove them entirely and the program works. But now I’m curious why dlopen behaves differently on the two machines, given that the library paths and the /etc/ld.so.conf setup are the same on both.
void *dlopen(const char *filename, int flag);

The function dlopen() loads the dynamic library file named by the null-terminated string filename and returns an opaque “handle” for the dynamic library. If the library has dependencies on other shared libraries, then these are also automatically loaded by the dynamic linker, recursively.

So, in any case, a standard dlopen() will never prepend “lib” or otherwise modify the filename string.

The specified filename is searched for in the following order:

1. (ELF only) If the executable file for the calling program contains a DT_RPATH tag, and does not contain a DT_RUNPATH tag, then the directories listed in the DT_RPATH tag are searched.
2. If, at the time that the program was started, the environment variable LD_LIBRARY_PATH was defined to contain a colon-separated list of directories, then these are searched. (As a security measure this variable is ignored for set-user-ID and set-group-ID programs.)
3. (ELF only) If the executable file for the calling program contains a DT_RUNPATH tag, then the directories listed in that tag are searched.
4. The cache file /etc/ld.so.cache (maintained by ldconfig(8)) is checked to see whether it contains an entry for filename.
5. The directories /lib and /usr/lib are searched (in that order).

So the strange behavior that you notice for dlopen() might be due to existing symlinks or hard links that expose librdmacm.so and libibverbs.so under the names rdmacm.so and ibverbs.so in one of the search paths listed above.

If dlopen() fails for any reason, it returns NULL. Check for NULL before using the handle returned by dlopen.

Reference: man 3 dlopen