I have a tab-delimited file that has gene names in one column and expression values for these genes in the other. I want to delete certain genes from this file using grep. So, this:
"42261" "SNHG7" "20.2678"
"42262" "SNHG8" "25.3981"
"42263" "SNHG9" "0.488534"
"42264" "SNIP1" "7.35454"
"42265" "SNN" "2.05365"
"42266" "snoMBII-202" "0"
"42267" "snoMBII-202" "0"
"42268" "snoMe28S-Am2634" "0"
"42269" "snoMe28S-Am2634" "0"
"42270" "snoR26" "0"
"42271" "SNORA1" "0"
"42272" "SNORA1" "0"
becomes this:
"42261" "SNHG7" "20.2678"
"42262" "SNHG8" "25.3981"
"42263" "SNHG9" "0.488534"
"42264" "SNIP1" "7.35454"
"42265" "SNN" "2.05365"
I’ve used the following command, which I put together with my limited terminal knowledge:
grep -iv sno* <input.text> | grep -iv rp* | grep -iv U6* | grep -iv 7SK* > <output.txt>
So with this command, my output file lacks genes that start with sno, u6 and 7sk, but somehow grep has deleted all the genes that have “r” in them instead of only the ones that start with “rp”. I’m very confused about this. Any ideas why sno* works but rp* does not?
Thanks!
The `grep` command uses regular expressions, not globbing patterns.
The pattern `rp*` means “‘r’ followed by zero or more ‘p’”, so it matches every line that contains an “r”, and with `-v` all of those lines are dropped. What you really want is `rp.*`, or even better, `"rp.*` (or even just `"rp`; there’s no point in trying to grep for anything after the “rp” after all). Likewise, `sno*` means “‘sn’ followed by zero or more ‘o’”. Again, you’d want `sno.*` or `"sno.*` (or even just `"sno`).
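As a rough sketch of how that could look in practice, the four filters can go into a single grep call, with each pattern anchored on the opening double quote of the gene-name field. The file names `genes.txt` and `filtered.txt` and the RPL13, BRCA1, U6 and 7SK rows are made up for illustration; the other rows are taken from your sample.

```shell
# Build a small test file (file name and the last four rows are hypothetical).
printf '%s\n' \
  '"42264" "SNIP1" "7.35454"' \
  '"42265" "SNN" "2.05365"' \
  '"42266" "snoMBII-202" "0"' \
  '"42271" "SNORA1" "0"' \
  '"50001" "RPL13" "100.5"' \
  '"50002" "BRCA1" "3.2"' \
  '"50003" "U6" "0"' \
  '"50004" "7SK" "0"' > genes.txt

# -i: ignore case, -v: print only lines that match none of the patterns,
# -e: supply several patterns in one call instead of chaining greps.
# The leading double quote ties each pattern to the start of a quoted field.
# Single quotes keep the shell from touching the pattern before grep sees it.
grep -iv -e '"sno' -e '"rp' -e '"U6' -e '"7SK' genes.txt > filtered.txt

cat filtered.txt
```

With this test file, only the SNIP1, SNN and BRCA1 lines should remain: BRCA1 contains an “r” but does not start with “rp”, so it is kept. Note that because of `-i`, `"sno` also removes SNORA1, which matches the output you asked for.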