I am trying to install numpy with OpenBLAS, but I am at a loss as to how the site.cfg file needs to be written.
I followed the installation procedure and the build completed without errors; however, performance degrades when I increase the number of threads used by OpenBLAS above 1 (controlled by the environment variable OMP_NUM_THREADS).
I am not sure whether the OpenBLAS integration is working correctly. Could anyone provide a site.cfg file that achieves this?
P.S.: OpenBLAS integration in other Python-based toolkits such as Theano provides a substantial performance boost when the number of threads is increased, on the same machine.
I just compiled numpy inside a virtualenv with OpenBLAS integration, and it seems to be working OK.

This was my process:
Compile OpenBLAS.

If you don’t have admin rights you could set PREFIX= to a directory where you have write privileges (just modify the corresponding steps below accordingly).

Make sure that the directory containing libopenblas.so is in your shared library search path.

To do this locally, you could edit your ~/.bashrc file so that it adds that directory to LD_LIBRARY_PATH. The LD_LIBRARY_PATH environment variable will be updated when you start a new terminal session (use $ source ~/.bashrc to force an update within the same session).

Another option that will work for multiple users is to create a .conf file in /etc/ld.so.conf.d/ containing the line /opt/OpenBLAS/lib.

Once you are done with either option, refresh the shared library cache (typically by running ldconfig as root).
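To confirm that the library search path is set up as described, a quick check from Python can help. This is only a minimal sketch using the standard ctypes module; the helper names find_openblas and can_load are made up for illustration and are not part of numpy or OpenBLAS. On Linux, `ctypes.util.find_library` consults the linker cache and falls back to LD_LIBRARY_PATH on recent Python versions, so a result of None suggests the directory is not yet visible.

```python
import ctypes
import ctypes.util


def find_openblas(name="openblas"):
    """Return what the system resolves for library `name`, or None if not found."""
    return ctypes.util.find_library(name)


def can_load(path):
    """Try to load a shared library by name or path; True on success."""
    try:
        ctypes.CDLL(path)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    found = find_openblas()
    print("libopenblas resolved to:", found)
    if found is not None:
        print("loadable:", can_load(found))
```

If the lookup fails, loading the library by its full path (for example the libopenblas.so under your PREFIX) with `can_load` tells you whether the file itself is usable and only the search path is missing.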
Grab the numpy source code.

Copy site.cfg.example to site.cfg and edit the copy: uncomment the lines of the [openblas] section so that they point at your OpenBLAS installation.
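For reference, here is a rough sketch of what the relevant site.cfg section might contain, generated with Python's configparser so it can be adapted to a different PREFIX. The key names libraries, library_dirs and include_dirs follow the [openblas] section in site.cfg.example as I remember it. The include directory under the prefix and the function name openblas_site_cfg are my assumptions, so compare the result against the example file shipped with your numpy version.

```python
import configparser
import io


def openblas_site_cfg(prefix="/opt/OpenBLAS"):
    """Return site.cfg text with an [openblas] section rooted at `prefix`.

    Assumes the usual layout: libraries in <prefix>/lib, headers in <prefix>/include.
    """
    cfg = configparser.ConfigParser()
    cfg["openblas"] = {
        "libraries": "openblas",
        "library_dirs": f"{prefix}/lib",
        "include_dirs": f"{prefix}/include",
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Print the section; paste it into site.cfg or redirect it to a file.
    print(openblas_site_cfg())
```

If you installed OpenBLAS somewhere else, pass that directory as prefix instead of editing the paths by hand.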
Check the configuration, then build and install (optionally inside a virtualenv). The output of the configuration check should report that OpenBLAS was found.
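After installing, you can also ask numpy itself what it was built against. The sketch below captures the text printed by `numpy.show_config()` and looks for the string "openblas". The helper names are hypothetical, and the exact layout of the report differs between numpy versions, so treat a match as a hint rather than proof.

```python
import contextlib
import io


def mentions_openblas(config_text):
    """Return True if a build-config report mentions OpenBLAS (case-insensitive)."""
    return "openblas" in config_text.lower()


def numpy_config_text():
    """Capture what numpy.show_config() prints and return it as a string."""
    import numpy as np  # imported lazily so mentions_openblas works without numpy

    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        np.show_config()
    return buf.getvalue()
```

Calling `mentions_openblas(numpy_config_text())` inside the virtualenv should return True if the build picked up OpenBLAS.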
Installing with pip is preferable to using python setup.py install, since pip will keep track of the package metadata and allow you to easily uninstall or upgrade numpy in the future.

Optional: you can use a small script to test performance for different thread counts.
There seems to be a noticeable improvement in performance for higher thread counts. However, I haven’t tested this very systematically, and it’s likely that for smaller matrices the additional overhead would outweigh the performance benefit from a higher thread count.
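A thread-count test along these lines could look roughly like the sketch below. It is not the script I originally ran, just a minimal illustration: it times one matrix product in a fresh interpreter per thread count, because OMP_NUM_THREADS generally has to be set before numpy (and OpenBLAS) is loaded. The names SNIPPET, thread_env and time_dot, and the default matrix size of 2000, are arbitrary choices of mine.

```python
import os
import subprocess
import sys

# Code run in each child process: time one n x n matrix product with numpy.
SNIPPET = """
import sys, time
import numpy as np
n = int(sys.argv[1])
a = np.random.rand(n, n)
start = time.perf_counter()
np.dot(a, a)
print(time.perf_counter() - start)
"""


def thread_env(threads):
    """Copy the current environment with OMP_NUM_THREADS set to `threads`."""
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(threads)
    return env


def time_dot(threads, n=2000):
    """Run SNIPPET in a new interpreter and return the elapsed seconds."""
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET, str(n)],
        env=thread_env(threads),
        capture_output=True,
        text=True,
        check=True,
    )
    return float(result.stdout.strip())
```

Looping `time_dot` over, say, 1, 2, 4 and 8 threads and printing the results gives a rough comparison. A single run is noisy, so repeat each measurement a few times and try several matrix sizes before drawing conclusions.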