On a website I’m creating I’m using Python-Markdown to format news posts. To avoid issues with dead links and HTTP-content-on-HTTPS-page problems I’m requiring editors to upload all images to the site and then embed them (I’m using a markdown editor which I’ve patched to allow easy embedding of those images using standard markdown syntax).
However, I’d like to enforce the no-external-images policy in my code.
One way would be to write a regex that extracts image URLs from the Markdown source, or even to run the source through the Markdown renderer and use a DOM parser to extract the src attributes from all img tags.
However, I’m curious if there’s some way to hook into Python-Markdown to extract all image links or execute custom code (e.g. raising an exception if the link is external) during parsing.
One approach would be to intercept the <img> node at a lower level, just after Markdown parses and constructs it.
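A minimal sketch of that idea, assuming the Python-Markdown 3.x extension API. It uses a tree processor rather than a patched inline image pattern, so it sees every img element once inline parsing has built it, including reference-style images. The names `ExternalImageError`, `LocalImagesOnly`, `LocalImagesExtension` and `render_post` are hypothetical, and the priority of 15 is an assumption chosen so the check runs after the built-in inline tree processor.

```python
from urllib.parse import urlsplit

import markdown
from markdown.extensions import Extension
from markdown.treeprocessors import Treeprocessor


class ExternalImageError(ValueError):
    """Raised when a post embeds an image that is not hosted on this site."""


class LocalImagesOnly(Treeprocessor):
    """Reject any <img> whose src carries a scheme or a host."""

    def run(self, root):
        for img in root.iter("img"):
            src = img.get("src", "")
            parts = urlsplit(src)
            # "http://host/x.png" has a scheme; "//host/x.png" has only a netloc.
            # Note that this also rejects "data:" URIs, since they carry a scheme.
            if parts.scheme or parts.netloc:
                raise ExternalImageError("external image not allowed: %s" % src)
        # Returning None tells Python-Markdown to keep using the same tree.


class LocalImagesExtension(Extension):
    def extendMarkdown(self, md):
        # Assumption: a priority below the built-in "inline" tree processor
        # means the <img> elements already exist when this check runs.
        md.treeprocessors.register(LocalImagesOnly(md), "local_images_only", 15)


def render_post(source):
    """Render Markdown to HTML, raising ExternalImageError on remote images."""
    return markdown.markdown(source, extensions=[LocalImagesExtension()])
```

Calling `render_post("![Alt text](/path/to/img.jpg)")` returns the rendered HTML as usual, while `render_post("![Alt text](http://remote.com/path/to/img.jpg)")` raises `ExternalImageError`. Images written as raw HTML inside the post are not part of the element tree, so this sketch would not catch them.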