I am writing a short script to sanitise folder and file names for upload

Asked: June 11, 2026

I am writing a short script to sanitise folder and file names for upload to SharePoint. SharePoint is fussy and has some filename rules that go beyond simple disallowed characters (for instance, multiple consecutive periods are not allowed), so regular expressions seemed like the way to go rather than simple replacement of single characters. However, one expression that doesn’t seem to be working is:

[/<>*?|:"~#%&{}\\]+

As a simple character class match, I would have expected this to work fine, and it appears to do so in Notepad++. My expectation was that a string like

St\r/|ng

with the above regex would match \, / and |. However, no matter what I do, I can only get a match for the first backslash, or more generally for the first character from that class that the search comes across. This is being done with Python's re module. Does anyone know what the issue is here?

```python
import os, sys, shutil, re

def cleanPath(path):
    # Compile the regexes: dot rules, disallowed characters,
    # and name suffixes to flag (e.g. "_files")
    multi_dot = re.compile(r"[\.]{2,}")
    start_dot = re.compile(r"^[\.]")
    end_dot = re.compile(r"[\.]$")
    disallowed_chars = re.compile(r'[/<>*?|:"~#%&{}\\]+')
    dis1 = re.compile(r'\.files$')
    dis2 = re.compile(r'_files$')
    dis3 = re.compile(r'-Dateien$')
    dis4 = re.compile(r'_fichiers$')
    dis5 = re.compile(r'_bestanden$')
    dis6 = re.compile(r'_file$')
    dis7 = re.compile(r'_archivos$')
    dis8 = re.compile(r'-filer$')
    dis9 = re.compile(r'_tiedostot$')
    dis10 = re.compile(r'_pliki$')
    dis11 = re.compile(r'_soubory$')
    dis12 = re.compile(r'_elemei$')
    dis13 = re.compile(r'_ficheiros$')
    dis14 = re.compile(r'_arquivos$')
    dis15 = re.compile(r'_dosyalar$')
    dis16 = re.compile(r'_datoteke$')
    dis17 = re.compile(r'_fitxers$')
    dis18 = re.compile(r'_failid$')
    dis19 = re.compile(r'_fails$')
    dis20 = re.compile(r'_bylos$')
    dis21 = re.compile(r'_fajlovi$')
    dis22 = re.compile(r'_fitxategiak$')
    regxlist = [multi_dot, start_dot, end_dot, disallowed_chars,
                dis1, dis2, dis3, dis4, dis5, dis6, dis7, dis8, dis9, dis10, dis11,
                dis12, dis13, dis14, dis15, dis16, dis17, dis18, dis19, dis20, dis21, dis22]
    print("************************************\n\n"+path+"\n\n************************************\n")
    # Report the first match (if any) of each pattern
    for x in regxlist:
        match = x.search(path)
        if match:
            print("\n")
            print("MATCHED")
            print(match.group())

    print("___________________________________________________________________________")
    return path


# Test list: some names should pass ("OK"), others should match a rule
testlist = ["string","str....ing","str..ing","str.ing",".string","string.",".string.","$tring",r"st\r\ing","st/r/ing",r"st\r/|ng","/str<i>ng","str.filesing","string.files"]
testlist_ans = ["OK","Match ....","Match ..","OK","Match .","Match .","Match . .","OK",r"Match \ ","Match /",r"Match \/|","Match / < >","OK","Match .files"]
for name, expected in zip(testlist, testlist_ans):
    print(expected)
    cleanPath(name)
```
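
A minimal reproduction of the behaviour, using only the pattern and test string from the question (the variable names here are illustrative): `search` scans from the left and returns a single match object for the leftmost match, so only the backslash is ever reported for this string.

```python
import re

# Same pattern as in the question
disallowed_chars = re.compile(r'[/<>*?|:"~#%&{}\\]+')
name = r"St\r/|ng"

# search() stops at the leftmost match and returns one Match object,
# so the later "/|" run is never reported
match = disallowed_chars.search(name)
print(match.group())  # prints a single backslash
print(match.span())   # prints (2, 3)
```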

1 Answer

Editorial Team · Answer 1 · 2026-06-11T05:24:01+00:00

What Python re function are you using?

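You should use re.findall. re.search returns a single match object for the first (leftmost) match only, whereas re.findall returns a list of all non-overlapping matches.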
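
A short sketch of the difference, assuming the same pattern and test string as in the question. `findall` returns every non-overlapping match as a list of strings, and `finditer` yields match objects that also carry positions. Because of the `+` quantifier, the adjacent `/|` comes back as one match rather than two. The `sanitise` helper is hypothetical and goes one step further with `re.sub` to clean the name instead of only reporting problems; replacing with an underscore is an assumption, not a SharePoint rule, and the suffix checks (`_files` and so on) are left out.

```python
import re

# Same character class as in the question; "+" makes adjacent
# disallowed characters (e.g. "/|") count as one match
disallowed_chars = re.compile(r'[/<>*?|:"~#%&{}\\]+')
multi_dot = re.compile(r"\.{2,}")

name = r"St\r/|ng"

# findall() returns all non-overlapping matches as a list of strings
print(disallowed_chars.findall(name))  # prints ['\\', '/|']

# finditer() yields Match objects, so positions are available as well
for m in disallowed_chars.finditer(name):
    print(m.group(), m.span())


def sanitise(name, replacement="_"):
    """Hypothetical helper: clean a name instead of only reporting matches."""
    # Replace every run of disallowed characters with one replacement string
    name = disallowed_chars.sub(replacement, name)
    # Collapse two or more consecutive dots into a single dot
    name = multi_dot.sub(".", name)
    # Remove leading and trailing dots
    return name.strip(".")


print(sanitise(name))  # prints St_r_ng
```

With these rules, `sanitise("str....ing")` gives `str.ing` and `sanitise(".string.")` gives `string`.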
