I'm rubbish with regex; if someone could help I'd be very appreciative. It's going


Editorial Team

Asked: June 8, 2026 (2026-06-08T10:08:31+00:00)


I'm rubbish with regex, so if someone could help I'd be very appreciative.

It's going to be a bit of a tough one, I imagine, so my hat's off to anyone who can solve it!

So say we have a file that contains two HTML tags in the following formats:

abc1234
<a href="http://google.com">Some Text</a> <P>
<a href="http://www.google.com" rel="nofollow">Some Text</a>
abc1234

I'm trying to remove everything in those tags except the URL (leaving the other text as it is), so the output of the regex on this document would be:

abc1234
http://google.com <P>
http://www.google.com
abc1234

Can any guru figure this one out? I'd prefer one regex to handle both cases, but two separate ones would be fine too.

Thanks in advance!


1 Answer

Editorial Team · Answer 1 · 2026-06-08T10:08:34+00:00

I’m a Rubyist, so my example is going to be in Ruby. I’d recommend using two regexes, just to keep things straight:

url_reg = /<a href="(.*?)"/   # Captures the URL inside an <a href="..."> tag
tag_reg = /<a href=.*?<\/a>/  # Matches an entire <a href ...>...</a> tag, up to the closing </a>

You'll want to use the first regex to pull out the URL and store it temporarily, then replace the entire tag (matched with tag_reg) with the stored URL.
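Here is a minimal sketch of those two steps on the first sample line, with the URL held in a variable in between. The variable names line, url and result are just for illustration, and the tag pattern here closes on </a> so it can't stop early at some other "a>" in the markup.

```ruby
url_reg = /<a href="(.*?)"/   # captures the URL inside href="..."
tag_reg = /<a href=.*?<\/a>/  # matches the whole <a href ...>...</a> tag

line = '<a href="http://google.com">Some Text</a> <P>'

# Step 1: pull the URL out of the tag and keep it.
url = line[url_reg, 1]          # => "http://google.com"

# Step 2: replace the whole tag with the stored URL.
# The block form stops sub from treating any backslashes in url as back-references.
result = line.sub(tag_reg) { url }
puts result                     # prints: http://google.com <P>
```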

You could combine them into one, but I don't think it's a good idea. You're fundamentally altering (by deleting) the original tag and replacing it with something from inside itself. There's less chance of things going wrong if you keep those two steps as separate as possible.
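For comparison, since the question said one expression was preferred, here is a sketch of a single-regex version using a capture group and gsub. It assumes the href value is double-quoted and that each tag sits on one line, as in the sample. [^"]* keeps the capture from running past the closing quote, and [^>]* skips extra attributes such as rel="nofollow". LINK_RE is a name made up for this sketch.

```ruby
# One pattern: capture the href value, then consume the rest of the tag through </a>.
LINK_RE = /<a\s+href="([^"]*)"[^>]*>.*?<\/a>/

input = <<~HTML
  abc1234
  <a href="http://google.com">Some Text</a> <P>
  <a href="http://www.google.com" rel="nofollow">Some Text</a>
  abc1234
HTML

# Replace every whole tag with its captured URL ('\1' refers to capture group 1).
output = input.gsub(LINK_RE, '\1')
puts output
```

The trade-off is the one described above: a single dense pattern is shorter, but harder to read and debug than two simple ones.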

Example in Ruby

def replace_tag(input)
  url_reg = /<a href="(.*?)"/    # Captures the URL inside an <a href="..."> tag
  tag_reg = /<a href=.*?<\/a>/   # Matches an entire <a href ...>...</a> tag

  # Replace each whole tag with the URL taken from that same tag.
  # If a tag's href isn't double-quoted, url_reg finds nothing and the tag is left as-is.
  input.gsub(tag_reg) { |tag| tag[url_reg, 1] || tag }
end

File.open("test.html", "r") do |html_input|       # Open original HTML file
  File.open("output.html", "w") do |html_output|  # Open an output file
    while line = html_input.gets                  # Read each line
      output = replace_tag(line)                  # Perform necessary substitutions
      html_output.puts(output)                    # Write output lines to file
    end
  end
end
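Because the loop above works one line at a time, a link whose markup wraps onto a second line won't match. If that can happen in your files, a sketch that reads the whole file at once and adds Ruby's m flag (which lets . match newlines) may work better. The method name replace_tags_multiline is hypothetical, and the file names are the same placeholders as above.

```ruby
# Same idea as replace_tag, but applied to the whole document at once.
# /m lets .*? span line breaks; [^>]* lets the opening tag itself wrap lines too.
def replace_tags_multiline(html)
  html.gsub(/<a\s+href="([^"]*)"[^>]*>.*?<\/a>/m) { Regexp.last_match(1) }
end

# Only runs if the placeholder input file is present.
if File.exist?("test.html")
  File.write("output.html", replace_tags_multiline(File.read("test.html")))
end
```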

Even if you don’t use Ruby, I hope the example makes sense. I tested this on your given input file, and it produces the expected output.
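To repeat that check without creating any files, here is a sketch that runs the sample lines through a replace_tag method and compares the result with the expected output. It defines its own copy of replace_tag in a gsub-with-block form (each matched tag is replaced by the URL found inside it, and a tag without a double-quoted href is left alone), closing the tag match on </a>. The sample, expected and actual names are just for illustration.

```ruby
def replace_tag(input)
  url_reg = /<a href="(.*?)"/   # captures the URL inside href="..."
  tag_reg = /<a href=.*?<\/a>/  # matches the whole <a href ...>...</a> tag
  input.gsub(tag_reg) { |tag| tag[url_reg, 1] || tag }
end

sample = <<~HTML
  abc1234
  <a href="http://google.com">Some Text</a> <P>
  <a href="http://www.google.com" rel="nofollow">Some Text</a>
  abc1234
HTML

expected = <<~HTML
  abc1234
  http://google.com <P>
  http://www.google.com
  abc1234
HTML

# Process line by line, mirroring the File loop above.
actual = sample.each_line.map { |line| replace_tag(line) }.join
puts(actual == expected ? "output matches" : "output differs")
```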
