I’m working on converting an RSS feed from an old shopping cart to a new one. The new cart will take CSV as an input. Normally I could tinker around with it and maybe figure it out, but there are some additional tasks that need to be completed with the feed before it gets put into CSV.
First, here is the raw feed http://www.bellyscarf.com/rsscategoryproducts.sc?categoryId=6
I don’t actually need too much data from the RSS, but here is what I do need (from each <item>, these are my ‘fields’):
- title
- description*
- price
- sale price
*The description is where I need some work done. It has a bunch of html special characters, and html that I would like to remove (including any image references). Plain text is what I am looking for, in simpler terms.
Typically, are the fields added after generating the CSV file? I don’t mind adding them afterwards. I will be working with the CSV in Excel before it goes live anyways, adding additional fields and info.
Here is some code I wrote to parse the XML/RSS:
$ch = curl_init('http://bellyscarf.com/rsscategoryproducts.sc?categoryId=6');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, 0);
$data = curl_exec($ch);
curl_close($ch);
$doc = new SimpleXmlElement($data, LIBXML_NOCDATA);
if(isset($doc->channel))
{
    parseRSS($doc);
} else {
    echo "Not RSS";
}
function parseRSS($xml)
{
    echo "<strong>".$xml->channel->title."</strong>";
    $cnt = count($xml->channel->item);
    for($i=0; $i<$cnt; $i++)
    {
        $url = $xml->channel->item[$i]->link;
        $title = $xml->channel->item[$i]->title;
        $desc = html_entity_decode($xml->channel->item[$i]->description);
        echo '<a href="'.$url.'">'.$title.'</a>'.$desc.'';
    }
}
You can see its results here (not sure if it helps anything):
http://bestsox.com/zumba.php
So how can I generate a CSV file with this data?
Alex already showed how you can make use of fputcsv to create the CSV file, but you still have problems reading from the feed.
First of all, you can iterate over the channel items directly with foreach, which makes the script easier to write.
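A minimal sketch of that loop, combined with fputcsv so each item becomes one CSV row. The inline sample feed, the example.com links and the php://memory stream are assumptions made only to keep the demo self-contained; a real script would load the feed as in the question and open a file path instead.

```php
<?php
// Made-up sample feed standing in for the real one.
$rss = <<<'XML'
<rss version="2.0">
  <channel>
    <title>Demo feed</title>
    <item><title>Red scarf</title><link>http://example.com/red</link></item>
    <item><title>Blue scarf</title><link>http://example.com/blue</link></item>
  </channel>
</rss>
XML;

$xml = new SimpleXMLElement($rss);

// In-memory stream for the demo; use fopen('products.csv', 'w') for a real file.
$out = fopen('php://memory', 'w+');

// Header row first, then one row per <item>.
// Separator, enclosure and escape are passed explicitly to avoid relying on defaults.
fputcsv($out, ['title', 'link'], ',', '"', '\\');
foreach ($xml->channel->item as $item) {
    // Cast to string: SimpleXML returns element objects, not plain strings.
    fputcsv($out, [(string) $item->title, (string) $item->link], ',', '"', '\\');
}

// Read the generated CSV back so it can be inspected.
rewind($out);
$csv = stream_get_contents($out);
fclose($out);
```

The foreach replaces the count() and index bookkeeping of the original loop, and $item already points at the current element.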
To get the price, you need to access the children of another namespace. The namespace is declared within the RSS file, and you need to know its URI. For the gd: prefix, this is the value of the xmlns:gd attribute, which is normally declared on the root element of the feed. With that URI you can then access the price of each item through children().
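As a rough illustration of the namespace lookup: the sample feed, the URI http://example.com/ns/gd and the element names gd:price and gd:sale_price below are placeholders, not the values from the real feed, so check the actual feed for its own URI and tag names.

```php
<?php
// Made-up sample feed; the namespace URI and the gd:* element names are placeholders.
$rss = <<<'XML'
<rss version="2.0" xmlns:gd="http://example.com/ns/gd">
  <channel>
    <title>Demo feed</title>
    <item>
      <title>Scarf</title>
      <gd:price>19.99</gd:price>
      <gd:sale_price>14.99</gd:sale_price>
    </item>
  </channel>
</rss>
XML;

$xml = new SimpleXMLElement($rss);

// Look up the URI bound to the gd: prefix on the root element
// instead of hard-coding it.
$namespaces = $xml->getDocNamespaces();
$gdUri = $namespaces['gd'];

$prices = [];
foreach ($xml->channel->item as $item) {
    // children($uri) exposes the elements of that namespace;
    // plain property access only sees un-namespaced children.
    $gd = $item->children($gdUri);
    $prices[] = [(string) $item->title, (string) $gd->price, (string) $gd->sale_price];
}
```

Reading the URI with getDocNamespaces() keeps the script working as long as the feed keeps the same prefix; a constant holding the URI is just as valid.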
You also wrote that you want to remove the tags from the description field. There is a quick way to do this, although it is not really good code.
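One possible version of that quick cleanup is sketched here. The helper name cleanDescription and the sample string are hypothetical, and the html_entity_decode step assumes the description arrives entity-encoded, as the question's own code suggests.

```php
<?php
// Hypothetical helper: reduce an HTML description to plain text.
function cleanDescription(string $html): string
{
    // Turn entities such as &amp; and &nbsp; into real characters first.
    $text = html_entity_decode($html, ENT_QUOTES, 'UTF-8');
    // Drop every tag, including <img> references.
    $text = strip_tags($text);
    // &nbsp; decodes to a UTF-8 non-breaking space; make it a normal space.
    $text = str_replace("\xC2\xA0", ' ', $text);
    // Collapse runs of spaces, tabs and newlines into a single space.
    $text = preg_replace('/\s+/', ' ', $text);
    return trim($text);
}

$plain = cleanDescription('<p>Soft &amp; warm<br />  <img src="a.jpg" alt="">scarf&nbsp;set</p>');
// $plain is now "Soft & warm scarf set"
```

The cleaned string can go straight into the row passed to fputcsv, which takes care of quoting commas and quotes.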
It makes use of strip_tags to remove all tags, and then the whitespace is normalized with str_replace and preg_replace.
I hope this is helpful.