I’m having this problem where in trying to grep something on an html page

Question

Asked: June 12, 2026

I’m trying to grep something out of an HTML page (specifically a user name). I try to retrieve the string with:

egrep -o dir\=\"[ltr]*\"\>.*(\<\/span|\<\/a)

By this I am trying to say: “get anything after dir="ltr"> or dir="rtl"> and before the first closing </a> or </span> tag.”

So, for example:

dir="ltr">myusername</span>

or

dir="rtl">myusername</a>

There are however multiple span tags on one line, and it is not stopping after the first one, which results in data that I don’t want.

Is there a way to modify my current regex to stop after the first one? And why does it even continue reading?

Thanks


Editorial Team · Answer 1 · 2026-06-12T08:00:20+00:00


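The .* is greedy: it matches as much as it can and only backs off to the last closing tag on the line, which is why the match runs past the first one. You need to make the .* non-greedy by adding a ? to it. Note that egrep (POSIX extended regex) has no non-greedy quantifiers, so this needs a Perl-compatible engine, for example GNU grep with -P:

grep -oP 'dir="[ltr]*">.*?(</span|</a)'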
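
If a Perl-compatible grep is not available, a negated character class gives the same "stop at the first closing tag" behaviour in plain extended regex. The following is only a sketch: the sample line and the variable names are made up for illustration, and it assumes the user name itself never contains a < character.

```shell
#!/bin/sh
# Hypothetical sample line with two user-name tags on it.
line='<span dir="ltr">alice</span> and <a dir="rtl">bob</a>'

# Greedy: .* runs to the LAST closing tag on the line, giving one long match.
greedy=$(printf '%s\n' "$line" | grep -oE 'dir="(ltr|rtl)">.*</(span|a)')

# Negated class: [^<]* cannot cross a '<', so each match ends at the
# first closing tag after the user name.
bounded=$(printf '%s\n' "$line" | grep -oE 'dir="(ltr|rtl)">[^<]*</(span|a)')

printf 'greedy:\n%s\n\nbounded:\n%s\n' "$greedy" "$bounded"
```

The greedy pattern prints the whole stretch from the first dir= to the last closing tag as a single match, while the bounded pattern prints one match per tag. Using (ltr|rtl) instead of [ltr]* also avoids accepting strings such as "lll".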

A better solution is this (shown as a raw regex; you will need to escape or quote it for your shell, and the non-greedy quantifiers need a Perl-compatible engine):

dir="[ltr]{3}"[^>]*?>(.*?)(</span>|</a>)

Capture group 1 ($1) will contain the text between the tags, and capture group 2 ($2) will contain the closing tag, which tells you whether it was a span or a link.
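
grep -o prints whole matches rather than individual capture groups, so one way to get at the two groups from the shell is to pipe each match through sed. This is a rough sketch under stated assumptions, not the only way to do it: the sample line is invented, the lazy (.*?) is swapped for [^<]* so it works in extended regex, and the output format ("name tag") is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical sample line; real input would come from the HTML page.
line='<span dir="ltr" class="u">alice</span> and <a dir="rtl">bob</a>'

# Step 1: grep -o isolates each dir=...>name</tag> match on its own line.
# Step 2: sed -E rewrites each match to "<group 1> <group 2>",
#         i.e. the user name followed by the closing tag name.
pairs=$(printf '%s\n' "$line" \
  | grep -oE 'dir="(ltr|rtl)"[^>]*>[^<]*</(span|a)>' \
  | sed -E 's#^dir="[a-z]+"[^>]*>([^<]*)</(span|a)>$#\1 \2#')

printf '%s\n' "$pairs"
```

For the sample line this yields one "name tag" pair per match. For anything beyond simple, predictable markup, a real HTML parser is usually a safer choice than a regex.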
