I work on text mining and statistical modeling, mostly for course and research projects at school. I have primarily been using the Windows GUI versions of R and Python. I will be finishing soon, and I realize that in industry most work is done on Unix/Linux machines.
I want to get some hands-on experience working on Unix, especially at the command line, before I start looking for jobs (in about 6 months). I wanted to ask you guys two things:
a. Which Unix/Linux distribution would be most beneficial to get familiar with? I understand that most of the knowledge carries over across distributions, but I would still like to know which one is best to invest time in.
b. Is there any resource or book that would help me get up to speed on working from the command line instead of a GUI such as GNOME or KDE?
I am not sure if it matters, but alongside this I also want to spend some time learning the basics of Hadoop, Pig and Mahout.
I use Ubuntu myself, but for your purposes it doesn’t matter much which distribution you choose, as long as it doesn’t eat up all your time learning UNIX itself. You want to focus on tools, not on system administration.
Better to spend time learning an editor (vim/emacs), a scripting language (Python, Ruby), and MapReduce tooling (Hadoop, Pig, and Mahout).
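To make the scripting and MapReduce suggestions a bit more concrete, here is a minimal sketch of a word count written in the map/reduce style, in plain Python with no Hadoop involved. The function names `map_words` and `reduce_counts` and the sample lines are my own, made up for illustration; a real Hadoop job would distribute these two steps across machines rather than run them in one process.

```python
from collections import defaultdict


def map_words(lines):
    """Map step: emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical sample input; any iterable of text lines works here.
    sample = ["the cat sat", "The dog sat"]
    counts = reduce_counts(map_words(sample))
    # Print one tab-separated "word<TAB>count" line per word, sorted by word.
    for word, total in sorted(counts.items()):
        print(f"{word}\t{total}")
```

If you pass `sys.stdin` to `map_words` instead of the sample list, the same script can sit at the end of a shell pipeline, which is a decent way to practice the command line and a scripting language at the same time.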