I am writing a discovery service that takes a URL and returns the HTML located at that page.
From that page, I need to “scrape” all the WSDL URLs.
So I need something like the following, but I am not sure how to specify the regex to pass into the pattern matching.
string wsdlPattern = //SOME REGEX THAT MATCHES WSDL http:{address}wsdl
Regex wsdlRegex = new Regex(wsdlPattern);
MatchCollection matches = wsdlRegex.Matches(html);
Can somebody please help me figure out how I can do this?
Try this:
http://[^\s]*?\.wsdl
The regular text parts are obvious: it needs to start with http:// and end with .wsdl (the dot is escaped as \. so that it matches a literal period rather than any character). [^\s] means “any non-whitespace character”, and *? means “as few as possible”. The lazy quantifier is necessary in case you have something like http://www.blah.com/a.wsdl<br>http://www.blah.com/b.wsdl. Without the ?, you’d match that whole thing as one string.
This isn’t perfect, but it should get you started.
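Here is a minimal sketch of how the pattern could be plugged into the code from the question. The class name WsdlScraper and the method name FindWsdlUrls are made up for illustration, the dot before wsdl is escaped, and RegexOptions.IgnoreCase is an assumption on my part (in case some pages use .WSDL); drop it if you want exact-case matching.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative, not from the question.
public static class WsdlScraper
{
    // Lazy match from "http://" up to the first ".wsdl", with no whitespace in between.
    // IgnoreCase is an assumption so that ".WSDL" is also picked up.
    private static readonly Regex WsdlRegex =
        new Regex(@"http://[^\s]*?\.wsdl", RegexOptions.IgnoreCase);

    // Returns every WSDL URL found in the given HTML, in document order.
    public static List<string> FindWsdlUrls(string html)
    {
        var urls = new List<string>();
        // Regex.Matches (plural) returns a MatchCollection; Regex.Match returns only the first hit.
        foreach (Match match in WsdlRegex.Matches(html))
        {
            urls.Add(match.Value);
        }
        return urls;
    }
}
```

Calling WsdlScraper.FindWsdlUrls on the string http://www.blah.com/a.wsdl<br>http://www.blah.com/b.wsdl should give back the two URLs separately, which is the case the lazy quantifier is there for. Note that this only covers http:// links; https:// would need the pattern changed to https?://.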
If you want to play with regex, this is a great resource:
http://www.gskinner.com/RegExr