In my project I need to upload a big file (~250GB) to a remote server and then run a script to load the file into MySQL.
The problem is that loading it as a single file would take far too long, so I have to split it into small chunks and run 10-20 processes simultaneously in multiple terminals. If I split it into chunks of ~2MB, that means more than 100,000 separate runs. Then I have to run something like:
ruby importer.rb data_part01_aa.csv
ruby importer.rb data_part01_ab.csv
ruby importer.rb data_part01_ac.csv
.
.
.
in each terminal, wait for them to finish, and then start the next batch.
Is there any way to automate this process? For example, a shell script that starts the next job as soon as the previous one has finished?
Thanks a lot!
In shell you can try:
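A minimal sketch of such a sequential loop, assuming bash and that all chunks sit in the current directory and end in .csv. The function name import_sequential is hypothetical; the [ -e ] check only guards against the case where no file matches and the shell would otherwise pass the literal pattern to ruby.

```shell
#!/usr/bin/env bash
# Import every chunk in the current directory, one after another.
# import_sequential is a hypothetical helper name, not from the question.
import_sequential() {
  local f
  for f in *.csv; do
    # If nothing matches, bash leaves the literal "*.csv" in $f; skip it.
    [ -e "$f" ] || continue
    ruby importer.rb "$f"
  done
}
```

Call it from the directory holding the chunks with import_sequential. Each ruby process starts only after the previous one has exited, so on its own this runs one import at a time.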
The previous command can also be written on a single line:
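The same loop squeezed onto one line, again only a sketch assuming bash and the hypothetical function name import_sequential:

```shell
#!/usr/bin/env bash
# Sequential import of every *.csv in the current directory, written on one line.
import_sequential() { for f in *.csv; do [ -e "$f" ] || continue; ruby importer.rb "$f"; done; }
```

The part between the braces can also be typed directly at the prompt without defining a function.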
If there are a great many files, it can take a while before the command even starts running, because the shell has to expand the whole file list first. In such a case, you can try find:
However, the previous command will search recursively in every subdirectory. To make it run in the current directory only, you have to run:
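A sketch of both find variants, assuming GNU or BSD find (both support -maxdepth). The function names are hypothetical. With -exec ... {} \; find starts one ruby process per file and waits for it, so these run sequentially too; note that find passes paths such as ./data_part01_aa.csv and does not sort them.

```shell
#!/usr/bin/env bash
# Recursive: every *.csv file under the current directory and all subdirectories.
import_with_find_recursive() {
  find . -type f -name '*.csv' -exec ruby importer.rb {} \;
}

# Current directory only: -maxdepth 1 stops find from descending into subdirectories.
import_with_find_here() {
  find . -maxdepth 1 -type f -name '*.csv' -exec ruby importer.rb {} \;
}
```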
In every example given, the commands run sequentially. Instead of *.csv you can play with different patterns (e.g. a*.csv, b*.csv, [ab]*.csv, etc.), or you can try another loop:
Here, echo {a..q} generates the sequence of letters from a to q, which seems to follow the names of your files. The key in this last example is the &, which puts each group in the background, so there will be 17 processes running simultaneously. If you do not want them to run simultaneously, just remove the ampersand &.
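A sketch of the background loop described above, assuming bash and (this is not stated in the question) that chunk names end in _ followed by two letters and .csv, like data_part01_ab.csv, with the first of those letters selecting the group. The function name import_parallel_groups is hypothetical, and the final wait is an addition so the function only returns once all 17 groups are done.

```shell
#!/usr/bin/env bash
# One background worker per letter a..q: each worker imports its own group
# of chunks sequentially, and the 17 workers run at the same time.
# Assumed naming: <prefix>_<letter><letter>.csv, grouped by the first letter.
import_parallel_groups() {
  local l
  for l in {a..q}; do
    (
      for f in *_"$l"?.csv; do
        [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
        ruby importer.rb "$f"
      done
    ) &   # the & runs this whole group in the background
  done
  wait    # block until every background group has finished
}
```

In bash, for l in {a..q} iterates over the same letters as for l in $(echo {a..q}). Deleting the & inside the loop turns it back into a fully sequential run.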