I’m currently working on a function to find all images referenced in an HTML file. Right now I am trying to find these substrings within the file: ".bmp", ".gif", ".jpg", ".png". I also want to find their roots, e.g. /images/foo/, and then use the two substrings to make a new string: /images/foo/bar.jpg. I know how I am going to concatenate the strings, but I have no idea how to locate the actual substrings. I feel quite overwhelmed right now and would really appreciate some help.
The “right” answer to this question ought to urge you to use tools that were built for the job. Smart people write stuff like libxml for a reason. Re-inventing the wheel will only make things more difficult. With libxml, for example, you can easily traverse an XML tree.
The “wrong” answer is to come up with some “trick” for finding the beginning of an image string, either by looking for the beginning of the image tag (<img) or a quote ("), as Doug mentions in the comments.
You’ll notice that I put right and wrong in quotations. I’m somewhat of a purist and would strongly suggest an XML-oriented solution because it’s wholly generalizable and easily extensible (tomorrow you may say: oh, I also need the anchor text). A DOM parser makes every subsequent problem a breeze to solve.
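To make the parser-based route concrete, here is a minimal sketch. It is not libxml: it swaps in Python's standard-library html.parser (an event-style parser rather than a full DOM) purely for illustration. The names ImageCollector, find_images and IMAGE_EXTENSIONS are made up for this example, and it assumes the images you care about are referenced through the src attribute of img tags.

```python
from html.parser import HTMLParser

# Extensions taken from the question; adjust as needed.
IMAGE_EXTENSIONS = (".bmp", ".gif", ".jpg", ".png")


class ImageCollector(HTMLParser):
    """Collects the src value of every <img> tag the parser encounters."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling us.
        if tag != "img":
            return
        for name, value in attrs:
            # value is None for attributes written without a value.
            if name == "src" and value and value.lower().endswith(IMAGE_EXTENSIONS):
                self.images.append(value)


def find_images(html_text):
    """Return the image paths referenced by <img src=...> in html_text."""
    collector = ImageCollector()
    collector.feed(html_text)
    collector.close()
    return collector.images


if __name__ == "__main__":
    sample = '<p><img src="/images/foo/bar.jpg" alt="x"><a href="page.html">link</a></p>'
    print(find_images(sample))
```

Because the parser hands you whole attribute values, there is no need to hunt for the root and the extension separately and glue them back together. Picking up the anchor text later would mean handling one more tag in the same class.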
But if you’re working on a proof of concept or prototype (or maybe even homework) where everything’s well-formed and you don’t release your code in the wild, the “wrong” approach may be sufficient.
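For comparison, the “wrong” approach might look roughly like the sketch below, again in Python for illustration. The function name find_images_naive is hypothetical, and the code assumes lowercase <img tags, a double-quoted src attribute, and no stray ">" inside the tag. Those are exactly the kinds of assumptions that only hold for well-formed input you control.

```python
# Extensions taken from the question; adjust as needed.
IMAGE_EXTENSIONS = (".bmp", ".gif", ".jpg", ".png")


def find_images_naive(html_text):
    """Find image paths by plain substring search (no real HTML parsing)."""
    images = []
    pos = 0
    while True:
        # Locate the start of the next image tag.
        tag_start = html_text.find("<img", pos)
        if tag_start == -1:
            break
        # Assume the tag ends at the next '>'.
        tag_end = html_text.find(">", tag_start)
        if tag_end == -1:
            break
        # Look for src=" only inside this tag. Note this would also match
        # an attribute such as data-src=", one of the limits of the trick.
        src = html_text.find('src="', tag_start, tag_end)
        if src != -1:
            value_start = src + len('src="')
            value_end = html_text.find('"', value_start, tag_end)
            if value_end != -1:
                path = html_text[value_start:value_end]
                if path.lower().endswith(IMAGE_EXTENSIONS):
                    images.append(path)
        pos = tag_end + 1
    return images


if __name__ == "__main__":
    print(find_images_naive('<p><img src="/images/foo/bar.jpg" alt="x"></p>'))
```

It works on tidy input, but every new requirement (single quotes, uppercase tags, other attributes) means another special case, which is the reason for leaning toward a parser.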