I’m having problems parsing the SEC EDGAR files.
Here is an example of this file.
The end result I want is the content between <XML> and </XML> in a format I can access.
Here is my code so far that doesn’t work:
scud = open("http://sec.gov/Archives/edgar/data/1475481/0001475481-09-000001.txt")
full = scud.read
full.match(/<XML>(.*)<\/XML>/)
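Ok, there are a couple of things wrong: a bare open only handles URLs once open-uri has been required (and on Ruby 3.0 and later you have to call URI.open instead), and the . in your regex doesn't match newlines unless you add the /m modifier, so an <XML> block that spans several lines never matches.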
Here’s a quick piece of code to retrieve the page, strip the garbage, and parse the resulting content as XML:
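A minimal sketch of that approach, assuming Ruby 2.5 or later and using REXML from the standard distribution as a stand-in for whatever XML library you prefer. The helper names `extract_xml` and `fetch_filing_xml` and the User-Agent string are made up for illustration, and it only looks at the first <XML> block in the filing:

```ruby
require 'open-uri'
require 'rexml/document'

# Return the text between the first <XML> and </XML> markers, or nil if
# the filing has no such block. The /m modifier lets "." match newlines,
# and the non-greedy ".*?" stops at the first closing marker.
def extract_xml(filing_text)
  match = filing_text.match(%r{<XML>(.*?)</XML>}m)
  match && match[1].strip
end

# Hypothetical helper: download a filing and parse its embedded XML.
# The User-Agent value is a placeholder; this sketch assumes the server
# may want an identifying header, which is not confirmed here.
def fetch_filing_xml(url)
  text = URI.open(url, 'User-Agent' => 'example-script you@example.com').read
  xml = extract_xml(text)
  xml && REXML::Document.new(xml)
end

# Usage (performs a network request, so it is left commented out):
# doc = fetch_filing_xml('http://sec.gov/Archives/edgar/data/1475481/0001475481-09-000001.txt')
# puts doc.root.name if doc
```

Once you have the REXML document you can walk it with `doc.root.elements`, or hand the string from `extract_xml` to a different parser if you'd rather.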