I am using cygwin on Windows 7 . I have a directory with all


Asked: May 31, 2026 (2026-05-31T16:49:16+00:00)


I am using cygwin on Windows 7. I have a directory containing only text files, and I want to loop through it and, for each file, save the data from the second column of the first three rows, i.e. positions (1,2), (2,2) and (3,2).

So, the code would be something like

  x1[0]=$(awk 'FNR == 1 {print $2}' "$file1")

  x1[1]=$(awk 'FNR == 2 {print $2}' "$file1")

  x1[2]=$(awk 'FNR == 3 {print $2}' "$file1")

Then I want to divide each value in x1 by 100, add 1, store the result back in the array, and use it as a line number to access data from another file. So that's:

x1[0]=$(( x1[0] / 100 + 1 ))

x1[1]=$(( x1[1] / 100 + 1 ))

x1[2]=$(( x1[2] / 100 + 1 ))

read1=$(awk -v n="${x1[0]}" 'FNR == n {print $1}' "$file2")

read2=$(awk -v n="${x1[1]}" 'FNR == n {print $1}' "$file2")

read3=$(awk -v n="${x1[2]}" 'FNR == n {print $1}' "$file2")
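The arithmetic and lookup above can be sketched as two small helper functions. This is only a minimal sketch in POSIX shell: the names index_line and lookup_name are hypothetical, and it assumes that file2 holds one single-quoted name per line, as in the sample further down. The division is done in awk rather than in shell arithmetic so that values with leading zeros, such as 000, are read as decimal and not as octal.

```sh
#!/bin/sh
# index_line VALUE
# Hypothetical helper: map a second-column value from file1 to a
# 1-based line number in file2 (integer divide by 100, then add 1).
index_line() {
    awk -v v="$1" 'BEGIN { print int(v / 100) + 1 }'
}

# lookup_name INDEX_FILE LINE_NUMBER
# Hypothetical helper: print the first field of the requested line,
# with the surrounding single quotes removed.
lookup_name() {
    awk -v n="$2" 'FNR == n { print $1; exit }' "$1" | tr -d "'"
}
```

With the sample values from file1, index_line 58900 yields 590, so read1 would come from line 590 of file2.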

Do the same thing for another file, except we don’t need $x1 for this.

read4=$(awk 'FNR == 1{print $3,$4,$5,$6}' $file3)

Finally, just output all these values (read1 to read4) to a file.

I need to do this in a loop for all the files in the folder, and I am not quite sure how to go about that. The tricky part is that the filename of $file3 depends on the filename of $file1,

so if $file1 = abc123def.fna.map.txt

$file3 would be abc123def.fna

$file2 is hardcoded in it and stays the same for all the iterations.
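The naming rule above (drop the trailing .map.txt from $file1 to get $file3) maps directly onto POSIX suffix-stripping parameter expansion. The sketch below is an illustration, not part of the original post; the function names derive_file3 and list_pairs are hypothetical, and it assumes every file1 in the directory ends in .map.txt.

```sh
#!/bin/sh
# derive_file3 FILE1
# Hypothetical helper: strip the trailing ".map.txt" to get the file3 name.
derive_file3() {
    printf '%s\n' "${1%.map.txt}"
}

# list_pairs DIR
# Hypothetical helper: print "file1 file3" for every *.map.txt in DIR.
list_pairs() {
    dir=$1
    for file1 in "$dir"/*.map.txt; do
        # If nothing matched, the glob stays literal; skip it.
        [ -e "$file1" ] || continue
        printf '%s %s\n' "$file1" "$(derive_file3 "$file1")"
    done
}
```

For example, derive_file3 abc123def.fna.map.txt prints abc123def.fna. The body of the loop in list_pairs is where the per-file awk calls would go.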

file1 is a .txt file and a part of it looks like:

 99 58900
 16 59000
 14 73000

file2 contains 600 lines of strings.

'Actinobacillus_pleuropneumoniae_L20'
'Actinobacillus_pleuropneumoniae_serovar_3_JL03'
'Actinobacillus_succinogenes_130Z'

file3 is a FASTA file and the first two lines look like this

>gi|94986445|ref|NC_008011.1| Lawsonia intracellularis PHE/MN1-00, complete genome
ATGAAGATCTTTTTATAGAGATAGTAATAAAAAAATGTCAGATAGATATACATTATAGTATAGTAGAGAA

The output can just write all four reads to some file or, if possible, compare read1, read2 and read3 with read4 to see whether any of them matches, i.e. the main name should match. In my example:

None of read1-3 matches Lawsonia intracellularis, which is part of read4. So it can just print success or failure to the file.

SAMPLE OUTPUT

Actinobacillus_pleuropneumoniae_L20
Actinobacillus_pleuropneumoniae_serovar_3_JL03
Actinobacillus_succinogenes_130Z

Lawsonia intracellularis

Failure

Sorry I was wrong about the 6 reads, just need 4 actually. Thanks for the help again.


1 Answer

Editorial Team · Answer 1 · 2026-05-31T16:49:18+00:00

This problem can be solved with TXR: http://www.nongnu.org/txr

Okay, I have these sample files (not your inputs, unfortunately):

$ ls -l
total 16
-rwxr-xr-x 1 kaz kaz 1537 2012-03-18 20:07 bac.txr           # the program
-rw-r--r-- 1 kaz kaz  153 2012-03-18 19:16 foo.fna           # file3: genome info
-rw-r--r-- 1 kaz kaz   24 2012-03-18 19:51 foo.fna.map.txt   # file1
-rw-r--r-- 1 kaz kaz  160 2012-03-18 19:56 index.txt         # file2: names of bacteria

$ cat index.txt 
'Actinobacillus_pleuropneumoniae_L20'
'Actinobacillus_pleuropneumoniae_serovar_3_JL03'
'Lawsonia_intracellularis_PHE/MN1-00'
'Actinobacillus_succinogenes_130Z'

$ cat foo.fna.map.txt   # note leading spaces: typo or real?
 13 000
 19 100
 7  200

$ cat foo.fna
gi|94986445|ref|NC_008011.1| Lawsonia intracellularis PHE/MN1-00, complete genome
ATGAAGATCTTTTTATAGAGATAGTAATAAAAAAATGTCAGATAGATATACATTATAGTATAGTAGAGAA

As you can see, I cooked the data so there will be a match on the Lawsonia.

Run it:

$ ./bac.txr foo.fna.map.txt 
Lawsonia intracellularis PHE/MN1-00 ATGAAGATCTTTTTATAGAGATAGTAATAAAAAAATGTCAGATAGATATACATTATAGTATAGTAGAGAA

Code follows. This is just a prototype; obviously it has to be developed and tested using the real data. I’ve made some guesses, like what the Lawsonia entry would look like in the index with the code attached to it.

#!/usr/local/bin/txr -f
@;;; collect the contents of the index file
@;;; into the list called index.
@;;; single quotes around lines are removed
@(block)
@  (next "index.txt")
@  (collect)
'@index'
@  (end)
@(end)
@;;; filter underscores to spaces in the index
@(set index @(mapcar (op regsub #/_/ " ") index))
@;;; process files on the command line
@(next :args)
@(collect)
@;;; each command line argument has to match two patterns
@;;; @file1 takes the whole thing
@;;; @file3 matches the part before .map.txt
@  (all)
@file1
@  (and)
@file3.map.txt
@  (end)
@;;; go into file 1 and collect second column material
@;;; over three lines into lineno list.
@  (next file1)
@  (collect :times 3)
 @junk @lineno
@  (end)
@;;; filter lineno list through a function which
@;;; converts to integer, divides by 100 and adds 1.
@  (set lineno @(mapcar (op + 1 (trunc (int-str @1) 100))
                        lineno))
@;;; map the three line numbers to names through the
@;;; index, and bind these three names to variables
@  (bind (name1 name2 name3) @(mapcar index lineno))
@;;; now go into file 3, and extract the name of the
@;;; bacterium there, and the genome from the 2nd line
@  (next file3)
@a|@b|@c|@d| @name, complete genome
@genome
@;;; if the name matches one of the three names
@;;; then output the name and genome, otherwise
@;;; output failed
@  (cases)
@    (bind name (name1 name2 name3))
@    (output)
@name @genome
@    (end)
@  (or)
@    (output)
failed
@    (end)
@  (end)
@(end)
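For readers who would rather stay with the shell and awk tools used in the question, the final comparison step can be approximated as follows. This is a hedged sketch, not a translation of the TXR program: header_matches is a hypothetical name, and it uses a substring test against the whole FASTA header line, whereas the TXR code above compares the extracted name for exact equality. Like the TXR code, it turns underscores in the index names into spaces before comparing.

```sh
#!/bin/sh
# header_matches HEADER NAME...
# Hypothetical helper: succeed if any NAME, with underscores turned
# into spaces, occurs as a substring of HEADER.
header_matches() {
    header=$1
    shift
    for name in "$@"; do
        spaced=$(printf '%s' "$name" | tr '_' ' ')
        case $header in
            # The quoted expansion is matched literally, not as a glob.
            *"$spaced"*) return 0 ;;
        esac
    done
    return 1
}
```

A caller could then write something like: if header_matches "$header" "$read1" "$read2" "$read3"; then echo success; else echo failure; fi, where header is the first line of file3.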
