I have 3 million lines of data, each with 30 features. It is hard to fit all of it in memory on my computer, and processing it with a learning algorithm is slow. I want to write a small program that takes a random sample, but in Java, with my PC's configuration, it either does not work or takes too long to execute. I know that writing it in C or C++ would give a better solution, but I am also curious whether Python is suitable for such a case. Is it reasonable to use Python when Java is not working efficiently because of slowness and memory restrictions? Please do not suggest increasing the heap size or anything similar.
If performance is critical, this is the sort of solution I use.
prints
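One way to take a uniform random sample without holding the whole data set in memory is reservoir sampling (Algorithm R), which reads the input once, line by line, and keeps only `k` lines at any time. The following is a minimal sketch of that technique, not necessarily the approach referred to above; the class name `ReservoirSampler`, the method `sample`, and its parameters are hypothetical names chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper: uniform sample of k lines in a single pass (Algorithm R).
public class ReservoirSampler {

    // Reads lines until end of input and returns at most k of them,
    // each input line having the same chance of ending up in the result.
    public static List<String> sample(BufferedReader reader, int k, Random rng)
            throws IOException {
        List<String> reservoir = new ArrayList<>(k);
        long seen = 0; // number of lines read so far
        String line;
        while ((line = reader.readLine()) != null) {
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k lines.
                reservoir.add(line);
            } else {
                // Pick an index in [0, seen); the new line replaces a stored
                // one only if that index falls inside the reservoir.
                long j = (long) (rng.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, line);
                }
            }
        }
        return reservoir;
    }
}
```

For a file on disk, the reader could be obtained with `Files.newBufferedReader(path)`; memory use then depends on `k` and the line length, not on the 3 million lines in the file. The run time is dominated by reading the file once, so the same single-pass approach should behave similarly in Python or C++.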