I'm trying to transition from urllib in Python 2 to Python 3. I can output the HTML source using .urlopen(), but I can’t search it using the .find() method.
import urllib.request
fh = urllib.request.urlopen("http://stackoverflow.com")
html = fh.read()
fh.close()
print(html.find("<p>"))  # raises TypeError: html is bytes, "<p>" is str
I get a TypeError. I understand that it’s returning a byte array, but I’m pretty fuzzy about what that actually means. I’ve tried a few SO answers like this, which have been dead ends. My question is:
Is there a straightforward, native method to get the page source of a URL as a string in Python 3?
Use html.decode('utf-8') (or whatever encoding it happens to be) to get a str object that you can .find() on. .decode() is used to take a flat set of bytes and transform them (via reversing a character encoding, such as UTF-8) into a string of actual codepoints (displayable symbols).
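As a minimal sketch of the above, the snippet below reads the response as bytes and decodes it before searching. The helper names decode_body and fetch_text are hypothetical (not part of urllib), and falling back to UTF-8 when the server does not declare a charset is an assumption that will not hold for every page.

```python
import urllib.request


def decode_body(raw, charset=None, fallback="utf-8"):
    """Turn raw response bytes into a str (hypothetical helper).

    Uses the declared charset if there is one, otherwise the fallback.
    errors="replace" substitutes U+FFFD for undecodable bytes instead
    of raising UnicodeDecodeError.
    """
    return raw.decode(charset or fallback, errors="replace")


def fetch_text(url):
    """Fetch a URL and return its body as a str (hypothetical helper)."""
    with urllib.request.urlopen(url) as fh:
        raw = fh.read()  # bytes, not str
        # Charset from the Content-Type header, or None if not declared.
        charset = fh.headers.get_content_charset()
    return decode_body(raw, charset)


# Usage (needs network access):
#   html = fetch_text("http://stackoverflow.com")
#   print(html.find("<p>"))
```

If you would rather not decode, searching with a bytes pattern also works: html.find(b"<p>") on the original bytes object.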