I’ve been tasked with building an accessible RSS feed for my company’s job listings. I already have an RSS feed from our recruiting partner, so I’m transforming their RSS XML into our own proxy RSS feed to add additional data as well as limit the number of items in the feed, so that we list only the latest jobs.
The RSS validates via feedvalidator.org (with warnings), but here is the problem. No matter how many times I tell them not to, my company’s HR team copies and pastes their Word documents directly into our recruiting partner’s CMS when inserting new job listings, leaving WordML in my feed. I believe this WordML is causing issues with Feedburner’s BrowserFriendly feature, which we want to show up to make it easier for people to subscribe. Therefore, I need to remove the WordML markup from the feed.
Anybody have experience doing this? Can anyone point me to a good solution to this problem?
Preferably, I’d like to be pointed to a solution in .NET (VB or C# is fine) and/or XSL.
Any advice on this is greatly appreciated.
Thanks.
I haven’t yet worked with WordML, but assuming that its elements are in a different namespace from RSS, it should be quite simple to do with XSLT.
Start with a basic identity transform (a stylesheet that adds all nodes from the input doc ‘as is’ to the output tree). You need these two templates:
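One common way to write those two identity templates in XSLT 1.0 is sketched below. The first handles elements and the second handles everything else; other formulations (for example, a single `@*|node()` template) behave the same way.

```xml
<!-- Elements: copy the element itself, then process its attributes and children. -->
<xsl:template match="*">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

<!-- Attributes, text, comments and processing instructions: copy as-is. -->
<xsl:template match="@*|text()|comment()|processing-instruction()">
  <xsl:copy/>
</xsl:template>
```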
A transformation using a stylesheet containing just the above two templates would exactly reproduce its input document on output, modulo those things that standards-compliant XML processors are permitted to change, such as entity replacement.
Now, add in a template that matches any element in the WordML namespace. Let’s give it the namespace prefix ‘wml’ for the purposes of this example:
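A minimal sketch of such a template follows. It assumes the `wml` prefix has been bound on the `xsl:stylesheet` element to whatever namespace URI the feed actually declares for its Word markup, and that the markup arrives as real namespaced elements rather than as escaped text inside a `description`.

```xml
<!-- Drop the WordML element itself but keep processing its children,
     so any text inside it survives in the output. -->
<xsl:template match="wml:*">
  <xsl:apply-templates/>
</xsl:template>

<!-- Drop WordML attributes that appear on non-WordML elements. -->
<xsl:template match="@wml:*"/>
```

Because `wml:*` is more specific than `*`, it takes priority over the identity template for those elements. To discard the WordML content entirely, text included, the first template could instead be left empty: `<xsl:template match="wml:*"/>`.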
The beginning and end of the stylesheet are left as an exercise for the coder.
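For anyone who would rather skip the exercise, the pieces could be assembled roughly as follows. The namespace URI shown is the Word 2003 WordprocessingML one and is only an assumption here; check which prefix and URI the feed actually uses and adjust the `xmlns:wml` declaration to match.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:wml="http://schemas.microsoft.com/office/word/2003/wordml">
  <!-- The wml URI above is an assumption; use the one declared in the feed. -->

  <xsl:output method="xml" encoding="UTF-8"/>

  <!-- Identity transform: copy elements along with their attributes and children. -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Identity transform: copy all other node types unchanged. -->
  <xsl:template match="@*|text()|comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>

  <!-- Strip WordML elements but keep their text content. -->
  <xsl:template match="wml:*">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Strip WordML attributes. -->
  <xsl:template match="@wml:*"/>

</xsl:stylesheet>
```

In .NET, a stylesheet like this can be applied with `System.Xml.Xsl.XslCompiledTransform` (call `Load` on the stylesheet, then `Transform` on the feed). Note that `xsl:copy` also copies namespace declarations that are in scope on an element, so an `xmlns:wml` declaration may still appear in the output even after the WordML elements are gone.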