I’m trying to build a sitemap by parsing the HTML bodies for hrefs that don’t contain a # (the ones with hashes are just sub-chapter links within some of the content pages).
My regexp now: <a\\s[^>]*href\\s*=\\s*\"([^\"]*)\"[^>]*>(.*?)</a>
I guess I should use [^#] or !# to exclude the # from hrefs, but I couldn’t work it out by trial and error or by googling. Thanks in advance for helping me out!
Solved it. I just inserted the # into the [^\"] block too, so it becomes [^\"#]. 😀
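For anyone landing here later, this is a minimal sketch of that fix. The post doesn't say which language is in use, so Python is only an assumption for illustration, and the helper name `extract_links` and the sample HTML are made up. The pattern is the one from the question with # added to the negated character class, so an href value can contain neither a quote nor a hash.

```python
import re

# Pattern from the question, with '#' added to the negated character class.
# An href value that contains a hash can no longer satisfy [^"#]*",
# so the whole anchor is skipped.
LINK_RE = re.compile(
    r'<a\s[^>]*href\s*=\s*"([^"#]*)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def extract_links(html):
    """Return (href, link_text) pairs for anchors whose href has no '#'.

    Hypothetical helper, not from the original post.
    """
    return LINK_RE.findall(html)


# Made-up sample input: the middle link points at a sub-chapter.
sample = (
    '<a href="/about.html">About</a>'
    '<a href="/guide.html#chapter-2">Chapter 2</a>'
    '<a class="nav" href="/contact.html">Contact</a>'
)

print(extract_links(sample))
# [('/about.html', 'About'), ('/contact.html', 'Contact')]
```

Note that this only covers double-quoted href values, same as the original pattern. If the pages are messier than that, an HTML parser would probably be more robust than a regexp.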