I am trying to scrape the site at ftp://ftp.sec.gov/edgar/daily-index/ using the code shown below:
from bs4 import BeautifulSoup
import urllib.request

html = urllib.request.urlopen("ftp://ftp.sec.gov/edgar/daily-index/")
soup = BeautifulSoup(html, "lxml")
soup.a  # soup.find_all('a') does not work either
# returns None
Please help, I am really frustrated by this. My suspicion is that the tag is causing the problem. The site’s HTML looks well formatted (matched tags), so I am lost as to why BeautifulSoup doesn’t find anything. Thanks
The ftp://ftp.sec.gov/edgar/daily-index/ URL leads to an FTP directory, not an HTML page. Your browser could generate HTML based on the FTP directory contents, but the server does not send you HTML when you load that resource with urllib.request.
You probably want to use the ftplib module directly instead to read the directory listing, or inspect the return value of urlopen(...).read() first.
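As a rough illustration of the ftplib route, here is a minimal sketch. It assumes the server accepts anonymous FTP logins, which may not be the case. The function name list_ftp_directory and its ftp_factory parameter are made up for this example; only FTP, login(), cwd() and nlst() come from the standard library.

```python
from ftplib import FTP


def list_ftp_directory(host, path, ftp_factory=FTP):
    """Return the entry names found in an FTP directory.

    ftp_factory is a hypothetical hook (not part of ftplib) that lets
    you swap in a stand-in class when no network is available.
    """
    # FTP(host) connects on construction; the with-block closes the
    # connection when done.
    with ftp_factory(host) as ftp:
        ftp.login()        # no arguments: anonymous login (assumed to be allowed)
        ftp.cwd(path)      # change to the directory of interest
        return ftp.nlst()  # names only, as sent by the server


# Example call (needs network access and a server that still speaks FTP):
# names = list_ftp_directory("ftp.sec.gov", "/edgar/daily-index/")
# print(names[:10])
```

The result is a plain list of strings, so there is nothing for BeautifulSoup to parse. If you need sizes or dates as well, ftp.retrlines("LIST") prints the server's raw listing lines instead.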