Afternoon all, I am trying to write a script that will extract the first

Question


Asked: June 11, 2026 (2026-06-11T19:46:46+00:00)


Afternoon all,

I am trying to write a script that will extract the first image from an article via its <img src=""/> tags. So if an article has:

<p>Lorem ipsum dolor sit amet, labore et dolore magna aliqua.<img src="example.jpg"/> Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.</p>

I would like to extract the whole image tag, <img src="example.jpg"/>.

I found this regex which extracts just the location of the image:

content_to_extract_from[/img.*?src="(.*?)"/i,1]

This produces "example.jpg".

Does anyone know a regex that will capture the tags as well?

Thanks in advance, Andy


1 Answer

Editorial Team · Answer 1 · 2026-06-11T19:46:47+00:00

Using regexes to parse markup is asking for trouble. You can probably write something that mostly works, but it will break on cases you didn’t foresee. For example, attributes can be enclosed in single quotes instead of double quotes, which your regex won’t handle.
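To make the single-quote problem concrete, here is a small sketch that runs the pattern from the question against two inputs. The names SRC_PATTERN, double_quoted and single_quoted are illustrative only and do not come from the original post.

```ruby
# The pattern from the question; capture group 1 is the src value.
SRC_PATTERN = /img.*?src="(.*?)"/i

# Same markup, differing only in the quote style around the attribute.
double_quoted = '<p>Lorem ipsum <img src="example.jpg"/> dolor</p>'
single_quoted = "<p>Lorem ipsum <img src='example.jpg'/> dolor</p>"

# String#[] with a regex and a group index returns that capture, or nil.
double_result = double_quoted[SRC_PATTERN, 1]
single_result = single_quoted[SRC_PATTERN, 1]

puts double_result.inspect  # "example.jpg"
puts single_result.inspect  # nil
```

The second input is perfectly valid HTML, yet the pattern silently returns nil for it.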

A real parser, such as Nokogiri, is much more reliable:

require 'nokogiri'
html = Nokogiri::HTML.fragment('<p>Lorem ipsum dolor sit amet, labore et dolore magna aliqua.<img src="example.jpg"/> Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.</p>')
# Serialize every <img> element in the fragment back to a string
html.css('img').collect(&:to_s) #=> ["<img src=\"example.jpg\">"]
