I need to extract the information from '<a href=…>something.jpg</a>' tags from a

Question

Asked: June 16, 2026 (2026-06-16T01:33:21+00:00)

I need to extract the information from '<a href="…">something.jpg</a>' tags in a large string that could contain multiple instances of the tag, one instance after another. I need to do this using regex on Oracle 11g.

An example of what I am looking for:

The string will always contain at least one instance of the <a> tag, and there is no maximum to how many it can contain.
The href will always contain an xid-[[:digit:]] value.
The attributes in the tag can vary.

Example string:

<p>text about something important</p><p><a href="@X@EmbeddedFile.requestUrlStub@X@bbcswebdav/xid-1234_1" target="_blank">file.pdf</a> </p><p><a href="@X@EmbeddedFile.requestUrlStub@X@bbcswebdav/xid-1235_1" target="_blank">anotherfile.pptx</a> </p><p><a href="@X@EmbeddedFile.requestUrlStub@X@bbcswebdav/xid-1236_1" target="_blank">yetanotherfile.pdf</a> </p>

Now, with that string, I want to extract the 3 <a …>…</a> blocks using REGEXP_SUBSTR(<string>, '<pattern>', <start>, <occurrence>), adjusting the occurrence value to grab each of the 3 instances.

What I have so far is:

SELECT REGEXP_SUBSTR(main_data, '<a[[:print:]]+href="[[:print:]]+xid-1234_1"[[:print:]]+>[[:print:]]+</a>', 1, 1)
  FROM table

and the results I get from that are

<a href="@X@EmbeddedFile.requestUrlStub@X@bbcswebdav/xid-1234_1" target="_blank">file.pdf</a> </p><p><a href="@X@EmbeddedFile.requestUrlStub@X@bbcswebdav/xid-1235_1" target="_blank">anotherfile.pptx</a> </p><p><a href="@X@EmbeddedFile.requestUrlStub@X@bbcswebdav/xid-1236_1" target="_blank">yetanotherfile.pdf</a>

So it starts at the first <a and grabs everything up to the last </a>, when I need it to stop at the first </a>. When I increment the occurrence to 2, it should grab the second <a>…</a> block. However, setting the occurrence to 2 currently returns nothing.

Any help will be appreciated. Thank you

1 Answer

Editorial Team · Answer 1 · 2026-06-16T01:33:22+00:00

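Yes, the non-greedy operator ? is the solution. Your [[:print:]]+ is greedy, so the first match runs through to the last </a>, which also leaves nothing for a second occurrence to match. With .*? each part stops at the earliest point that still lets the pattern match: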

SELECT REGEXP_SUBSTR(x,'<a href="(.*?)".*?>(.*?)</a>',1, 3, 'i', 0)
  FROM (SELECT '<p>text about something important</p><p><a href="@X@EmbeddedFile.requestUrlStub@X@bbcswebdav/xid-1234_1" target="_blank">file.pdf</a> </p><p><a href="@X@EmbeddedFile.requestUrlStub@X@bbcswebdav/xid-1235_1" target="_blank">anotherfile.pptx</a> </p><p><a href="@X@EmbeddedFile.requestUrlStub@X@bbcswebdav/xid-1236_1" target="_blank">yetanotherfile.pdf</a> </p>' as x FROM DUAL);

returns

<a href="@X@EmbeddedFile.requestUrlStub@X@bbcswebdav/xid-1236_1" target="_blank">yetanotherfile.pdf</a>

or the other tags if you change the 3 to 1 or 2.

If you replace the last argument, 0, with 1, you get the first capture group, the contents of the href:

@X@EmbeddedFile.requestUrlStub@X@bbcswebdav/xid-1236_1

If you replace it with 2, you get

yetanotherfile.pdf
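
If you would rather not change the occurrence by hand, the same pattern can be driven by a row generator. The following is only a sketch, assuming a single-row source (the src subquery and the column aliases are made up for illustration) and assuming REGEXP_COUNT is available, which it is from 11g onward:

```sql
-- Sketch: one output row per <a ...>...</a> match in a single source string.
-- "src", "occurrence", "anchor_tag", "href" and "link_text" are illustrative names.
WITH src AS (
  SELECT '<p>text about something important</p><p><a href="@X@EmbeddedFile.requestUrlStub@X@bbcswebdav/xid-1234_1" target="_blank">file.pdf</a> </p><p><a href="@X@EmbeddedFile.requestUrlStub@X@bbcswebdav/xid-1235_1" target="_blank">anotherfile.pptx</a> </p><p><a href="@X@EmbeddedFile.requestUrlStub@X@bbcswebdav/xid-1236_1" target="_blank">yetanotherfile.pdf</a> </p>' AS x
    FROM DUAL
)
SELECT LEVEL AS occurrence,
       -- subexpression 0: the whole matched tag
       REGEXP_SUBSTR(x, '<a href="(.*?)".*?>(.*?)</a>', 1, LEVEL, 'i', 0) AS anchor_tag,
       -- subexpression 1: the href value
       REGEXP_SUBSTR(x, '<a href="(.*?)".*?>(.*?)</a>', 1, LEVEL, 'i', 1) AS href,
       -- subexpression 2: the link text
       REGEXP_SUBSTR(x, '<a href="(.*?)".*?>(.*?)</a>', 1, LEVEL, 'i', 2) AS link_text
  FROM src
-- generate as many rows as there are matches in the string
CONNECT BY LEVEL <= REGEXP_COUNT(x, '<a href="(.*?)".*?>(.*?)</a>', 1, 'i');
```

This should return one row per tag, with LEVEL playing the role of the occurrence argument. Two caveats: against a real table with more than one row, the CONNECT BY clause needs extra conditions so that rows are not multiplied against each other, and the pattern assumes href is the first attribute after <a, so it would need adjusting if the attribute order varies.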
