I’m having difficulty understanding the import statement and its variations.
Suppose I’m using the lxml module for scraping websites.
The examples I’ve found show this:
```python
from lxml.html import parse
parse('http://somesite')
```
However, Google’s Python style guide prefers the basic import statement, to preserve namespaces.
I’d prefer to do that, but when I try this:
```python
import lxml
lxml.html.parse('http://somesite')
```
…then I get the following error message:
```
AttributeError: 'module' object has no attribute 'html'
```
Can anyone help me understand what is going on? I’d much prefer to use modules within their namespaces, but need some assistance understanding the semantics.
`lxml.html` is a module. When you `import lxml`, the `html` module is not imported into the `lxml` namespace. That is the package developer’s decision: some packages automatically import some of their submodules, others don’t. In this case, you have to do it yourself with `import lxml.html`, after which `lxml.html.parse` works. Alternatively, `import lxml.html as LH` imports the `html` module and binds it to the name `LH` in the current module’s namespace, so you can access the parse function as `LH.parse`.

If you want to delve deeper into when a package (like `lxml`) imports modules (like `lxml.html`) automatically, open a terminal and type:

…

Here you see the path to the `lxml` package’s `__init__.py` file. If you look at its contents, you find that it is empty, so no submodules are imported. If you look in numpy’s `__init__.py`, you see lots of code, amongst which is:

…

These are all submodules that are imported into the `numpy` namespace. So from a user’s perspective, `import numpy` automatically gives you access to `numpy.linalg`, `numpy.fft`, etc.
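To see these semantics without needing lxml installed, here is a minimal sketch that builds a throwaway package on the fly. The package `demo_pkg`, its `html` submodule and its `parse` function are hypothetical stand-ins for `lxml` and `lxml.html`; like the `lxml` `__init__.py` described above, the stand-in `__init__.py` is empty. The sketch checks that importing only the package does not expose the submodule, that `import demo_pkg.html` does, that the `as` form binds the submodule itself, and that the standard `__file__` attribute points at the package’s `__init__.py`.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_demo_package(root: Path) -> None:
    """Create a hypothetical package 'demo_pkg' with an EMPTY __init__.py
    (mirroring lxml's) and a submodule 'demo_pkg.html' defining parse()."""
    pkg = root / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")  # imports no submodules
    (pkg / "html.py").write_text(
        "def parse(url):\n"
        "    return 'parsed ' + url\n"
    )


def demo() -> dict:
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_package(Path(tmp))
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()

            # 1) Importing only the package: the empty __init__.py does not
            #    load 'html', so the attribute is missing (the asker's error).
            import demo_pkg
            results["has_html_after_import_pkg"] = hasattr(demo_pkg, "html")

            # 2) Importing the submodule explicitly loads it and sets it as
            #    an attribute of the package, so the full dotted name works.
            import demo_pkg.html
            results["parse_via_full_name"] = demo_pkg.html.parse("http://somesite")

            # 3) 'import ... as' binds the submodule object itself to LH.
            import demo_pkg.html as LH
            results["alias_is_submodule"] = LH is demo_pkg.html

            # 4) A package's __file__ is the path of its __init__.py.
            results["init_file"] = Path(demo_pkg.__file__).name
        finally:
            # Undo our changes so the sketch can be re-run in one interpreter.
            sys.path.remove(tmp)
            for name in ("demo_pkg.html", "demo_pkg"):
                sys.modules.pop(name, None)
    return results


if __name__ == "__main__":
    for key, value in demo().items():
        print(f"{key}: {value}")
```

The `finally` block removes the temporary entries from `sys.path` and `sys.modules`, so the check in step 1 starts from a clean state each time `demo()` is called.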