So I’m looking to scrape rapidshare.com links from websites. I have the following regular

Question

Asked: May 13, 2026 (2026-05-13T12:47:00+00:00)

So I’m looking to scrape rapidshare.com links from websites. I have the following regular expressions to find links:

<a href=\"(http://rapidshare.com/files/(\\d+)/(.+)\\.(\\w{3,4}))\"

http://rapidshare.com/files/(\\d+)/(.+)\\.(\\w{3,4})

How can I write a regex that will exclude the text embedded in an <a href="..."> tag and capture only the text in >here</a>?

I also have to bear in mind that not all links are embedded in href attributes. Some are just displayed in plain text.

Basically, is there a way to exclude patterns in a regex?

Thanks.

1 Answer

Editorial Team · Answer 1 · 2026-05-13T12:47:01+00:00

To capture the inner text of an anchor tag, while ignoring all attribute text of the tag, you’d use the pattern:

<a href="http://rapidshare.com/files/(\d+)/(.+)\.(\w{3,4})[^>]*>(.*?)</a>

The [^>]* part matches everything else in the start tag (the closing quote of the href and any further attributes) up to its closing >.
The (.*?) performs a non-greedy capture of the inner text.
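
As a rough illustration, here is a minimal Python sketch that applies the pattern above unchanged. The sample markup and the variable names are made up for the example, and Python is only an assumption, since the question does not name a language.

```python
import re

# The answer's pattern, unchanged. A raw string keeps the backslashes intact.
ANCHOR = re.compile(
    r'<a href="http://rapidshare.com/files/(\d+)/(.+)\.(\w{3,4})[^>]*>(.*?)</a>'
)

# Hypothetical sample markup, for illustration only.
html = '<a href="http://rapidshare.com/files/123/archive.rar" class="dl">Download here</a>'

m = ANCHOR.search(html)
if m:
    # Groups 1-3 come from the URL; group 4 is the text between > and </a>.
    file_id, name, ext, inner_text = m.groups()
    print(inner_text)  # prints: Download here
```

Note that (.+) is greedy, so on input with several links on one line it may swallow more than one URL. Narrowing it to something like [^"]+ is one possible tightening, not part of the original answer.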

If you want to capture anchor tag links and non-anchor tag links, then those are really two separate problems. There’s probably a regex for it, but it would be terribly complicated. You’re better off simply looking for non-anchor-tag links separately with the simple regex:

[^'"]http://rapidshare.com/files/(\d+)/(.+)\.(\w{3,4})
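
On the "exclude patterns" part of the question: one way to express exclusion is a negative lookbehind. The sketch below swaps the leading [^'"] for (?<!['"]), which rejects a URL preceded by a quote without consuming a character, so a link at the very start of the input still matches. This is a variation, not the pattern from the answer. The sample text, the PLAIN name, and the tightened [^\s"'<>]+ file-name class are assumptions for illustration.

```python
import re

# Variation on the answer's pattern: negative lookbehind instead of [^'"].
# [^\s"'<>]+ (an assumption) stops the file name at whitespace, quotes, or tags.
PLAIN = re.compile(
    r'''(?<!['"])http://rapidshare\.com/files/(\d+)/([^\s"'<>]+)\.(\w{3,4})'''
)

# Hypothetical sample text: one plain-text link and one link inside an href.
text = (
    'Mirror: http://rapidshare.com/files/456/notes.zip '
    'or <a href="http://rapidshare.com/files/123/archive.rar">Download here</a>'
)

# Only the plain-text link is returned; the href link is preceded by a quote.
plain_links = PLAIN.findall(text)
print(plain_links)
```

Running the anchor pattern and this one as two separate passes keeps each regex simple, in line with the advice above.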
