I’ve been trying (unsuccessfully) to solve this problem for a few hours and need some help. I used Firebug to extract a couple hundred lines of HTML that look like this:
<option value="1b4f4aed-cf1f-4b39-ae27">Foo</option>
<option value="1a05f93f-dd51-449d-b039">Bar</option>
<option value="f62d2d29-29fc-4f7c-9331">Bacon</option>
I saved the lines to a text file. What I want is a script (Python preferred, with Ruby as an alternative) to open, process, and close the file. The processing should result in a new text file being saved that looks like this:
Foo
Bar
Bacon
That’s it. Thanks in advance for your help.
Per your comment above, I would suggest BeautifulSoup for anything HTML-related. Since you are early in your learning stage, it is probably best to associate ‘HTML’ with ‘BeautifulSoup’ (and not regex 🙂 ). Here is a very basic example:
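A minimal sketch of that idea, assuming the `bs4` package is installed (`pip install beautifulsoup4`) and using the three sample lines from the question as inline input. The variable names `html` and `names` are just illustrative choices:

```python
from bs4 import BeautifulSoup

# Sample input copied from the question (assumption: the real file
# contains more lines of the same shape).
html = """
<option value="1b4f4aed-cf1f-4b39-ae27">Foo</option>
<option value="1a05f93f-dd51-449d-b039">Bar</option>
<option value="f62d2d29-29fc-4f7c-9331">Bacon</option>
"""

# Parse the markup with Python's built-in "html.parser" backend.
soup = BeautifulSoup(html, "html.parser")

# find_all returns every <option> Tag; .text is the text inside each one.
names = [option.text for option in soup.find_all("option")]

for name in names:
    print(name)
```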
Here we pass our HTML to `BeautifulSoup` and assign it to the `soup` variable. Now we have an object that contains our HTML and a large number of methods for interacting with it in a user-friendly way. Here, we use the `find_all` method (see the BeautifulSoup documentation) to find all `option` tags in our HTML. Now when we iterate, we are iterating through `Tag` objects, which have their own special properties/methods. Here we pick one of them (`.text`) to display the text of the `Tag` element (which in this case will be the text enclosed in the tag).
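To get from there to the saved output file the question asks for, the same approach can be wrapped around a file read and a file write. This is only a rough sketch: the helper name `extract_option_text` and the file names `options.txt` and `names.txt` are hypothetical, and it assumes the input file is UTF-8 text.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def extract_option_text(in_path, out_path):
    """Write the text of every <option> tag in in_path to out_path, one per line."""
    # read_text opens, reads, and closes the file in one call.
    html = Path(in_path).read_text(encoding="utf-8")
    soup = BeautifulSoup(html, "html.parser")

    # get_text(strip=True) drops any whitespace around the label.
    names = [option.get_text(strip=True) for option in soup.find_all("option")]

    # One name per line, with a trailing newline after the last one.
    Path(out_path).write_text("".join(name + "\n" for name in names), encoding="utf-8")
    return names
```

With the lines from the question saved as `options.txt`, calling `extract_option_text("options.txt", "names.txt")` should leave a `names.txt` containing one name per line.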