I’m currently working on a function to find all images referenced in an html

Question

Asked: June 10, 2026

I’m currently working on a function to find all images referenced in an HTML file. I am trying to find these substrings within the file: ".bmp", ".gif", ".jpg" and ".png". I also want to find their roots, e.g. /images/foo/, and then use the two substrings to build a new string: /images/foo/bar.jpg. I know how I am going to concatenate the strings, but I have no idea how to locate the actual substrings. I feel quite overwhelmed right now and would really appreciate some help.

1 Answer

Editorial Team · Answer 1 · 2026-06-10T00:05:29+00:00

The “right” answer to this question ought to urge you to use tools that were built for the job. Smart people write libraries like libxml for a reason. Reinventing the wheel will only make things more difficult. With libxml, for example, you can easily traverse an XML tree like so:

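/* Walk the sibling list starting at a_node. */
for (cur_node = a_node; cur_node; cur_node = cur_node->next) {
    /* Only element nodes carry a tag name such as "img". */
    if (cur_node->type == XML_ELEMENT_NODE) {
        printf("node type: Element, name: %s\n", cur_node->name);
    }
}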

The “wrong” answer is to come up with some “trick” for finding the beginning of an image string, either by looking for the beginning of the image tag (<img) or for a quote character ("), as Doug mentions in the comments.

You’ll notice that I put right and wrong in quotation marks. I’m somewhat of a purist and would strongly suggest an XML-oriented solution because it’s wholly generalizable and easily extensible (tomorrow you may say: oh, I also need the anchor text). A DOM parser makes every subsequent problem a breeze to solve.

But if you’re working on a proof of concept or prototype (or maybe even homework) where everything’s well-formed and you don’t release your code into the wild, the “wrong” approach may be sufficient.
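
For that prototype case, here is a minimal sketch of the “wrong” approach in plain C, using only strstr, strchr and strrchr from the standard library. The helper names (has_image_ext, file_part, next_img_src) are made up for illustration. It assumes lowercase <img and src=, a quoted src value, and no > character inside attribute values. It will also be fooled by attributes such as data-src=, so treat it as a starting point rather than a robust scanner.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if path ends in .bmp, .gif, .jpg or .png (case-insensitive). */
static int has_image_ext(const char *path)
{
    static const char *exts[] = { ".bmp", ".gif", ".jpg", ".png" };
    size_t len = strlen(path);
    if (len < 4)
        return 0;
    for (size_t i = 0; i < sizeof exts / sizeof exts[0]; i++) {
        size_t k = 0;
        while (k < 4 && tolower((unsigned char)path[len - 4 + k]) == exts[i][k])
            k++;
        if (k == 4)
            return 1;
    }
    return 0;
}

/* Return the file name part of path: the text after the last '/',
 * or the whole path if it contains no '/'. */
static const char *file_part(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Copy the src value of the next <img ...> tag found at or after html
 * into out (truncating if it does not fit). Returns a pointer just past
 * that tag, or NULL when no further tag with a quoted src exists. */
static const char *next_img_src(const char *html, char *out, size_t out_size)
{
    assert(html != NULL && out != NULL && out_size > 0);
    const char *tag = html;
    while ((tag = strstr(tag, "<img")) != NULL) {
        const char *end = strchr(tag, '>');
        if (end == NULL)
            return NULL;                     /* unterminated tag: give up */
        const char *src = strstr(tag, "src=");
        if (src != NULL && src < end) {
            char quote = src[4];             /* character right after "src=" */
            if (quote == '"' || quote == '\'') {
                const char *start = src + 5;
                const char *close = strchr(start, quote);
                if (close != NULL && close < end) {
                    size_t n = (size_t)(close - start);
                    if (n >= out_size)
                        n = out_size - 1;
                    memcpy(out, start, n);
                    out[n] = '\0';
                    return end + 1;
                }
            }
        }
        tag = end + 1;                       /* no usable src here; keep scanning */
    }
    return NULL;
}
```

To use it, call next_img_src in a loop, passing the returned pointer back in as the new starting position until it returns NULL. For each value copied into the buffer, has_image_ext tells you whether it is one of the four image types, and file_part splits off the file name so that everything before it is the root (for example /images/foo/).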
