I am using Cygwin on Windows 7. I have a directory of text files, and I want to loop through it and, for each file, save the second column of the first three rows, i.e. fields (1,2), (2,2) and (3,2).
So, the code would be something like:
x1[0]=$(awk 'FNR == 1 {print $2}' "$file1")
x1[1]=$(awk 'FNR == 2 {print $2}' "$file1")
x1[2]=$(awk 'FNR == 3 {print $2}' "$file1")
Then I want to divide each element of x1 by 100 and add 1, and use the result as a line number to look up data in another file. So that's:
x1[0]=$(( x1[0] / 100 + 1 ))
x1[1]=$(( x1[1] / 100 + 1 ))
x1[2]=$(( x1[2] / 100 + 1 ))
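The extraction and the arithmetic can also be folded into a single awk call, which avoids the array bookkeeping. This is only a minimal sketch: line_numbers is a hypothetical helper name, and it assumes the second column always holds a plain integer.

```shell
# line_numbers FILE: for the first three rows of FILE, print
# int(column2 / 100) + 1, one result per line.
line_numbers() {
    awk 'FNR > 3 { exit } { print int($2 / 100) + 1 }' "$1"
}
```

Calling line_numbers "$file1" on rows whose second column is 58900, 59000 and 73000 prints 590, 591 and 731.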
read1=$(awk -v n="${x1[0]}" 'FNR == n {print $1}' "$file2")
read2=$(awk -v n="${x1[1]}" 'FNR == n {print $1}' "$file2")
read3=$(awk -v n="${x1[2]}" 'FNR == n {print $1}' "$file2")
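Since the three lookups differ only in the line number, they could share one small function. A sketch, assuming the line number is passed to awk with -v rather than spliced into the program text; name_at is a hypothetical name.

```shell
# name_at N FILE: print the first field of line N of FILE.
# The early exit stops awk from reading the rest of the file.
name_at() {
    awk -v n="$1" 'FNR == n { print $1; exit }' "$2"
}
```

With this in place, read1=$(name_at "${x1[0]}" "$file2") would do the same job as the first lookup above.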
Do the same thing for another file, except we don’t need $x1 for this.
read4=$(awk 'FNR == 1 {print $2, $3}' "$file3")
Finally, just output all these values (read1 to read4) to a file.
I need to do this in a loop for all the files in the folder, and I'm not quite sure how to go about that. The tricky part is that the filename of $file3 depends on the filename of $file1,
so if $file1 = abc123def.fna.map.txt
$file3 would be abc123def.fna
$file2 is hardcoded and stays the same for every iteration.
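One way to derive $file3 from $file1 inside a loop is the shell's suffix-stripping parameter expansion. A minimal sketch, assuming every file1 in the directory ends in .map.txt; fasta_name is a hypothetical helper name.

```shell
# fasta_name FILE1: strip the trailing ".map.txt" to get the FASTA file name.
fasta_name() {
    printf '%s\n' "${1%.map.txt}"
}

# Visit every mapping file in the current directory.
for file1 in ./*.map.txt; do
    [ -e "$file1" ] || continue   # the glob matched nothing
    file3=$(fasta_name "$file1")
    echo "$file1 -> $file3"
done
```

The per-file awk calls would then go inside the loop body in place of the echo.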
file1 is a .txt file and a part of it looks like:
99 58900
16 59000
14 73000
file2 contains 600 lines of strings.
'Actinobacillus_pleuropneumoniae_L20'
'Actinobacillus_pleuropneumoniae_serovar_3_JL03'
'Actinobacillus_succinogenes_130Z'
file3 is a FASTA file, and its first two lines look like this:
>gi|94986445|ref|NC_008011.1| Lawsonia intracellularis PHE/MN1-00, complete genome
ATGAAGATCTTTTTATAGAGATAGTAATAAAAAAATGTCAGATAGATATACATTATAGTATAGTAGAGAA
The output can simply write all four reads to a file or, if possible, compare read1, read2 and read3 against read4, i.e. check whether the main name matches. In my example,
none of read1-3 match Lawsonia intracellularis, which is part of read4. So it can just print success or failure to the file.
SAMPLE OUTPUT
Actinobacillus_pleuropneumoniae_L20
Actinobacillus_pleuropneumoniae_serovar_3_JL03
Actinobacillus_succinogenes_130Z
Lawsonia intracellularis
Failure
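The comparison itself might look like the sketch below. It assumes the names in file2 use underscores where the FASTA header uses spaces, and that a substring match is good enough; matches is a hypothetical helper name, and the values are the ones from the sample output.

```shell
# matches NAME HEADER_NAME: succeed if NAME contains HEADER_NAME once the
# spaces in HEADER_NAME have been replaced by underscores.
matches() {
    want=$(printf '%s' "$2" | tr ' ' '_')
    case $1 in
        *"$want"*) return 0 ;;
        *) return 1 ;;
    esac
}

read4="Lawsonia intracellularis"
result=Failure
for name in Actinobacillus_pleuropneumoniae_L20 \
            Actinobacillus_pleuropneumoniae_serovar_3_JL03 \
            Actinobacillus_succinogenes_130Z; do
    if matches "$name" "$read4"; then
        result=Success
    fi
done
echo "$result"
```

For these three names the script prints Failure, because none of them contains Lawsonia_intracellularis.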
Sorry, I was wrong about the 6 reads; I just need 4 actually. Thanks for the help again.
This problem can be solved with TXR: http://www.nongnu.org/txr
Okay, I have these sample files (not your inputs, unfortunately):
As you can see, I cooked the data so there will be a match on the Lawsonia.
Run it:
Code follows. This is just a prototype; obviously it has to be developed and tested using the real data. I’ve made some guesses, like what the Lawsonia entry would look like in the index with the code attached to it.