I’m trying to preserve the line breaks from a text file when I write its content into an HTML file.
I get results from boilerpipe in this way:
import urllib2

class BottomPipeResult:
    AGENT_ID = "Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"
    BOTTOMPIPE_URL = "http://boilerpipe-web.appspot.com/extract?url={0}&extractor=LargestContentExtractor&output=text"
    #BOTTOMPIPE_URL = "http://boilerpipe-web.appspot.com/extract?url={0}&extractor=ArticleExtractor&output=htmlFragment"
    _myBPPage = ""

    # scrape and get results from bottompipe
    def scrapeResult(self, theURL, user_agent=AGENT_ID):
        request = urllib2.Request(self.BOTTOMPIPE_URL.format(theURL))
        if user_agent:
            request.add_header("User-Agent", user_agent)
        pagefile = urllib2.urlopen(request)
        realurl = pagefile.geturl()
        f = pagefile
        self._myBPPage = f.read()
        return self._myBPPage
But when I write them out to HTML, I lose all the line breaks.
Here’s the code I use to write into the HTML file:
f = open('./../../entries-new.html', 'a')
f.write(BottomPipeResult().scrapeResult(myLinkResult))
f.close()
Here is an example of a boilerpipe text result:
http://boilerpipe-web.appspot.com/extract?url=http%3A%2F%2Fresult.com&extractor=ArticleExtractor&output=text
I tried this, but it doesn’t work:
myLinkResult = re.sub('\n','<br />', myLinkResult)
Any suggestions?
Thanks
I modified your code just a touch so it was runnable and it seems to “work” properly for me. The resulting output has line endings where expected. I’m seeing some encoding issues, but no line ending issues.
Code
Output
Making the results more “HTML” like
As far as HTML output is concerned, you probably want to wrap each line in a <p> paragraph tag.
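The reason the line breaks seem to vanish is that browsers collapse whitespace, including newlines, when rendering HTML, so the breaks are in the file but not on screen. Here is a minimal sketch of the paragraph-wrapping idea. It assumes Python 3 (where `html.escape` is available) and that the extracted text has already been decoded to a string; the function name `text_to_paragraphs` is hypothetical, not something from the original code.

```python
import html

def text_to_paragraphs(text):
    """Wrap each non-empty line of plain text in a <p> element."""
    paragraphs = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines so no empty <p></p> is emitted
            # escape &, < and > so the text cannot break the markup
            paragraphs.append("<p>{0}</p>".format(html.escape(line)))
    return "\n".join(paragraphs)

# Illustrative input only; in practice this would be the text returned by scrapeResult
sample = "First line\n\nSecond & last line"
print(text_to_paragraphs(sample))
```

The important detail is that the conversion is applied to the text that comes back from boilerpipe, not to the URL that is passed in. If you would rather keep a single block of text, replacing each newline in that returned text with `<br />` should give a similar visual result.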