I ran into a problem using a regex with awk. Specifically, I need to find all words in a file that:
- begin with “un”;
- are at least 6 characters long;
- end with two vowels
(all three conditions must hold at the same time).
I used this regex:
awk '{
  for (k = 1; k <= NF; k++)
    if ($k ~ /^un.{2,}[aeiouAEIOU]{2}$/)
      print $k
}' file.txt
The problem is that it sometimes works and sometimes doesn't.
I’ve tried it with two files:
test.txt
unaaaiuolaa
unaaaaaa
unbbaa
file.txt
unaaaiuolaa
unarmadio
Mysteriously, the regex matches all the words in test.txt but only “unarmadio” in file.txt (note that “unaaaiuolaa” is the same in both files).
Can someone explain why?
As grok12 said, the problem was a trailing space at the end of “unaaaiuolaa” in file.txt. Deleting it solved the problem.
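
For anyone hitting the same symptom, here is a minimal sketch of how such a stray character can be exposed and tolerated. It rests on assumptions not stated above: the post does not say what kind of space it was, and since awk's default field splitting already discards ordinary spaces and tabs, the sketch uses a non-breaking space (UTF-8 bytes \302\240) as a stand-in. The sample data and variable names are hypothetical, and the pattern is written with ..+ and two bracket expressions instead of {2,} and {2} only because some older awk builds lack interval expressions.

```shell
# Hypothetical sample: the first word ends in a non-breaking space (assumption),
# which awk's default field splitting does not treat as a separator.
tmp=$(mktemp)
printf 'unaaaiuolaa\302\240\nunarmadio\n' > "$tmp"

# Diagnostic: bracket each field and print its length, so a hidden
# trailing character shows up as an unexpectedly long word.
awk '{ for (k = 1; k <= NF; k++) printf "[%s] %d\n", $k, length($k) }' "$tmp"

# Same logic as the original pattern: the $ anchor rejects the first word
# because its last character is not a vowel.
strict=$(awk '{
  for (k = 1; k <= NF; k++)
    if ($k ~ /^un..+[aeiouAEIOU][aeiouAEIOU]$/)
      print $k
}' "$tmp")

# Tolerant variant: strip trailing non-letters from a copy of the field first.
tolerant=$(awk '{
  for (k = 1; k <= NF; k++) {
    w = $k
    sub(/[^a-zA-Z]+$/, "", w)
    if (w ~ /^un..+[aeiouAEIOU][aeiouAEIOU]$/)
      print w
  }
}' "$tmp")

printf 'strict:\n%s\n' "$strict"
printf 'tolerant:\n%s\n' "$tolerant"
rm -f "$tmp"
```

Stripping trailing non-letters also drops punctuation attached to a word, which may or may not be what you want. Running od -c on the file is another way to see exactly which bytes follow the word.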