I have the following code.
import re
import urllib2
from bs4 import BeautifulSoup

html = urllib2.urlopen(
    'https://ebet.tab.co.nz/results/CHCG-reslt05070400.html').read()
soup = BeautifulSoup(html)
data = soup.findAll('div', {'class' : 'header bold'})
match = re.search('R', data[0].text)
race_title = data[0].text[(match.start()):]
race_title = str(race_title.strip(' \t\n\r'))
print race_title
The output I get in the console is below:
Race 1 PEDIGREE ADVANCE SPRINT
C0
295 m
I thought strip would get rid of any kind of whitespace between SPRINT and C0, but obviously I am missing something, so I need help understanding this result. Is it because bs4 outputs the string as unicode or something?
strip() removes only leading and trailing characters; it never touches anything in the middle of the string. If you want to remove the newlines inside the string, you should use replace("\n", "").
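To make the difference concrete, here is a minimal sketch. The string `raw` is only an assumed stand-in for `data[0].text` (the real page may contain other whitespace, such as tabs or `\r`), and the variable names are made up for illustration. Note that replace("\n", "") glues SPRINT and C0 together, so joining the result of split() is probably closer to what you want.

```python
# Assumed stand-in for data[0].text; the real whitespace on the page may differ.
raw = "  Race 1 PEDIGREE ADVANCE SPRINT\nC0\n295 m\n"

# strip() trims only the two ends, so the interior newlines survive.
stripped = raw.strip(' \t\n\r')

# replace() removes every newline, but leaves nothing in its place.
no_newlines = stripped.replace("\n", "")

# split() with no argument splits on any run of whitespace;
# joining with a single space collapses each run to one space.
collapsed = " ".join(raw.split())

print(repr(stripped))
print(repr(no_newlines))
print(repr(collapsed))
```

If the goal is a single-line title, the `collapsed` form keeps the words separated while dropping the line breaks.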