On a website I’m creating I’m using Python-Markdown to format news posts. To avoid

Asked: May 22, 2026 (2026-05-22T02:18:18+00:00)

On a website I’m creating I’m using Python-Markdown to format news posts. To avoid issues with dead links and HTTP-content-on-HTTPS-page problems I’m requiring editors to upload all images to the site and then embed them (I’m using a markdown editor which I’ve patched to allow easy embedding of those images using standard markdown syntax).

However, I’d like to enforce the no-external-images policy in my code.

One way would be to write a regex that extracts image URLs from the markdown source, or even to run the source through the markdown renderer and use a DOM parser to extract the src attribute of every img tag.

However, I’m curious if there’s some way to hook into Python-Markdown to extract all image links or execute custom code (e.g. raising an exception if the link is external) during parsing.

1 Answer

Editorial Team · Answer 1 · 2026-05-22T02:18:18+00:00

One approach would be to intercept the <img> node at a lower level just after Markdown parses and constructs it:

import re
from markdown import Markdown
# Note: ImagePattern and IMAGE_LINK_RE are used here as in the Python-Markdown
# 2.x API; newer releases may name or structure these differently.
from markdown.inlinepatterns import ImagePattern, IMAGE_LINK_RE

# Matches absolute http/https URLs only; protocol-relative URLs
# (starting with "//") are not caught by this pattern.
RE_REMOTEIMG = re.compile(r'^(http|https):.+', re.IGNORECASE)

class CheckImagePattern(ImagePattern):

    def handleMatch(self, m):
        # let the stock pattern build the <img> element first
        node = ImagePattern.handleMatch(self, m)
        # check 'src' to ensure it is local
        src = node.attrib.get('src')
        if src and RE_REMOTEIMG.match(src):
            print('ILLEGAL: %s' % m.group(9))
            # or alternately you could raise an error immediately
            # raise ValueError("illegal remote url: %s" % m.group(9))
        return node

DATA = '''
![Alt text](/path/to/img.jpg)
![Alt text](http://remote.com/path/to/img.jpg)
'''

mk = Markdown()
# patch in the customized image pattern matcher with url checking
mk.inlinePatterns['image_link'] = CheckImagePattern(IMAGE_LINK_RE, mk)
result = mk.convert(DATA)
print(result)

Output:

ILLEGAL: http://remote.com/path/to/img.jpg
<p><img alt="Alt text" src="/path/to/img.jpg" />
<img alt="Alt text" src="http://remote.com/path/to/img.jpg" /></p>
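
The alternative raised in the question, checking the rendered HTML rather than hooking into the parser, can be sketched with the standard library alone. This is only an illustration under stated assumptions: the names ImgSrcCollector, is_external and external_images are hypothetical, the input is whatever HTML the renderer produced, and "external" is assumed to mean any URL with a scheme or a host part (which also flags protocol-relative and data: URLs).

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> arrive via handle_startendtag,
        # whose default implementation delegates to handle_starttag.
        if tag == 'img':
            src = dict(attrs).get('src')
            if src:
                self.sources.append(src)


def is_external(url):
    """Assumption: any URL with a scheme or a network location is external."""
    parts = urlsplit(url)
    return bool(parts.scheme or parts.netloc)


def external_images(html):
    """Return the src values of all external images found in the HTML."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return [src for src in collector.sources if is_external(src)]


# The rendered output shown above, used as sample input.
rendered = (
    '<p><img alt="Alt text" src="/path/to/img.jpg" />\n'
    '<img alt="Alt text" src="http://remote.com/path/to/img.jpg" /></p>'
)
print(external_images(rendered))
```

Because this runs on the final HTML, it should also see images written as raw HTML in a post, which an inline-pattern hook would not. The trade-off is that it runs as a separate pass after rendering instead of failing during parsing.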