Because redbubble.com has no API, I’m using an Atom feed to scrape information about a user’s pictures.
This is what each entry in the XML looks like:
<entry>
<id>ID</id>
<published>Date Published</published>
<updated>Date Updated</updated>
<link type="text/html" rel="alternate" href="http://www.redbubble.com/link/to/post"/>
<title>Title</title>
<content type="html">
Blah blah blah stuff about the image..
<a href="http://www.redbubble.com/products/configure/config-id"><img src="http://ih1.redbubble.net/path-to-image" alt="" /></a>
</content>
<author>
<name>Author Name</name>
<uri>http://www.redbubble.com/people/author-user-name</uri>
</author>
<link type="image/jpeg" rel="enclosure" href="http://ih0.redbubble.net/path-to-the-original-image"/>
<category term="1"/>
<category term="2"/>
</entry>
Using a regex, how would I get the href attribute of the link inside the content tag?
The one thing we know for sure is that the URL will always have configure in its path, i.e. http://somesite.com/**configure**/id
So I just need to find the URL that contains configure and grab the whole thing.
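For illustration, here is a minimal Ruby sketch of one way to do this. It is not necessarily the pattern that was ended up being used. It assumes the HTML inside the content tag arrives unescaped, as in the sample above, and that the href value is wrapped in double quotes. The names `CONFIGURE_HREF`, `configure_url` and `content` are made up for the example.

```ruby
# Hypothetical sample mirroring the <content> element shown above.
content = <<~HTML
  Blah blah blah stuff about the image..
  <a href="http://www.redbubble.com/products/configure/config-id"><img src="http://ih1.redbubble.net/path-to-image" alt="" />
HTML

# Capture an href value whose URL contains a /configure/ path segment.
# [^"]* stops at the closing quote, so the match stays inside one attribute.
CONFIGURE_HREF = %r{href="(https?://[^"]*/configure/[^"]*)"}

# Returns the first matching URL as a String, or nil if there is none.
def configure_url(text)
  match = text.match(CONFIGURE_HREF)
  match && match[1]
end

puts configure_url(content)
```

If the feed delivers the content HTML-escaped (`&quot;` instead of `"`), the quote characters in the pattern would need to be adjusted, or the text unescaped first.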
Thanks for your awesome answers, but my colleague solved it for me!
This is what I ended up using
(a Ruby regex, by the way):