For learning purposes, I am trying to use Clojure to scrape data from the following site.
I would like to know how to get the data in the table “bm_center bm_dataTable”.
The challenge I have is that this table’s markup is not in the page’s HTML source, because it is generated dynamically in the browser.
How do I get the HTML source of the table?
I know very little about web programming but am willing to learn. Thank you in advance for your patience.
The DOM is normally a thing that lives in the browser. The browser pulls down the same text that you’re seeing in Clojure and then builds the DOM that it uses to render the page.
You can manipulate the text manually to pull out what you want by writing Clojure code. You could use a Java library like JSoup to extract information from the HTML. The standard Java libraries also come with an HTML parser, but I would avoid it. It is difficult to use and doesn’t really bring much benefit.
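As a rough sketch of the JSoup route, something like the following could pull the rows out of a table once you have HTML text that actually contains it. This assumes the JSoup jar (org.jsoup/jsoup) is on your classpath. The namespace, the `sample-html` string and the `table-rows` function are made up for illustration; only the class names “bm_center bm_dataTable” come from your question. If the table is built by JavaScript after the page loads, the text you download will not contain it and this will return an empty result.

```clojure
(ns scrape.sketch
  (:import (org.jsoup Jsoup)))

;; Hypothetical sample standing in for the downloaded page text.
;; The real page's markup is not known here.
(def sample-html
  "<table class=\"bm_center bm_dataTable\">
     <tr><th>Name</th><th>Value</th></tr>
     <tr><td>a</td><td>1</td></tr>
   </table>")

(defn table-rows
  "Parses html and returns one vector of cell texts per table row."
  [html]
  (let [doc (Jsoup/parse ^String html)]
    ;; CSS selector: every <tr> inside a table with class bm_dataTable
    (vec (for [row (.select doc "table.bm_dataTable tr")]
           ;; collect the text of each header or data cell in the row
           (mapv #(.text %) (.select row "th, td"))))))

(table-rows sample-html)
;; => [["Name" "Value"] ["a" "1"]]
```

For a live page you would pass the downloaded text in place of `sample-html`, for example the result of calling `slurp` on the page URL.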