I'm trying to grep a string (specifically a user name) out of an HTML page, and I try to retrieve it with:
egrep -o dir\=\"[ltr]*\"\>.*(\<\/span|\<\/a)
By this I am trying to say: "get anything after dir="ltr"> or dir="rtl"> and before the first </a> or </span> closing tag".
So, for example:
dir="ltr">myusername</span>
or
dir="rtl">myusername</a>
There are however multiple span tags on one line, and it is not stopping after the first one, which results in data that I don’t want.
Is there a way to modify my current regex to stop after the first one? And why does it even continue reading?
Thanks
You need to make the .* non-greedy by adding a ? to it. A better solution is this (in raw regex, you will need to escape it):
Capture group 1 ($1) will contain the text between the opening and closing tags, and capture group 2 ($2) will tell you whether it was closed by a span or a link tag.
See it in action:
http://regexr.com?32b8k
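
As for why the match keeps going: .* is greedy, so it takes as much of the line as it can and ends at the last closing tag rather than the first. One caveat is that the lazy form .*? is a Perl-style feature. The POSIX extended syntax used by egrep has no lazy quantifiers, so it only helps in engines such as GNU grep -P or the online tester linked above. A portable alternative, shown here as a rough sketch rather than the pattern from this answer, is to replace .* with [^<]*, which cannot run past the next tag. The sample line and the variable names are made up for illustration.

```shell
# Sample line with two user names (made-up data for illustration).
line='<span dir="ltr">alice</span> and <a dir="rtl">bob</a>'

# Greedy: .* runs to the LAST closing tag on the line, so one long match comes back.
greedy=$(printf '%s\n' "$line" | grep -oE 'dir="(ltr|rtl)">.*(</span|</a)')

# Bounded: [^<]* cannot cross a '<', so each match stops at its own closing tag.
bounded=$(printf '%s\n' "$line" | grep -oE 'dir="(ltr|rtl)">[^<]*(</span|</a)')

# Strip the surrounding markup so that only the user names remain, one per line.
users=$(printf '%s\n' "$bounded" | sed -E 's/^dir="(ltr|rtl)">//; s/<\/(span|a)$//')

printf 'greedy:\n%s\n\nbounded:\n%s\n\nusers:\n%s\n' "$greedy" "$bounded" "$users"
```

This assumes the user name itself never contains a < character. If it can, a Perl-compatible engine with a lazy quantifier, or a real HTML parser, is the safer choice.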