I am writing a short script to sanitise folder and file names for upload to SharePoint. SharePoint is fussy and has filename rules beyond simple disallowed characters (multiple consecutive periods are disallowed, for instance), so regular expressions seemed like a better fit than replacing single characters. One expression that doesn’t seem to be working, however, is:
[/<>*?|:"~#%&{}\\]+
As a simple character class match I would have expected this to work fine, and it appears to do so in Notepad++. My expectation was that a string like
St\r/|ng
with the above regex would match \, / and |. However, no matter what I do, I can only get it to match the first backslash, or whichever character from that class appears first in the string. This is being done with the Python re library. Does anyone know what the issue is here?
import os, sys, shutil, re
def cleanPath(path):
    # Compiling regex...
    multi_dot = re.compile(r"[\.]{2,}")
    start_dot = re.compile(r"^[\.]")
    end_dot = re.compile(r"[\.]$")
    disallowed_chars = re.compile(r'[/<>*?|:"~#%&{}\\]+')
    dis1 = re.compile(r'\.files$')
    dis2 = re.compile(r'_files$')
    dis3 = re.compile(r'-Dateien$')
    dis4 = re.compile(r'_fichiers$')
    dis5 = re.compile(r'_bestanden$')
    # Renamed from a second "dis5", which silently overwrote the '_bestanden$' pattern
    dis5b = re.compile(r'_file$')
    dis6 = re.compile(r'_archivos$')
    dis7 = re.compile(r'-filer$')
    dis8 = re.compile(r'_tiedostot$')
    dis9 = re.compile(r'_pliki$')
    dis10 = re.compile(r'_soubory$')
    dis11 = re.compile(r'_elemei$')
    dis12 = re.compile(r'_ficheiros$')
    dis13 = re.compile(r'_arquivos$')
    dis14 = re.compile(r'_dosyalar$')
    dis15 = re.compile(r'_datoteke$')
    dis16 = re.compile(r'_fitxers$')
    dis17 = re.compile(r'_failid$')
    dis18 = re.compile(r'_fails$')
    dis19 = re.compile(r'_bylos$')
    dis20 = re.compile(r'_fajlovi$')
    dis21 = re.compile(r'_fitxategiak$')
    regxlist = [multi_dot,start_dot,end_dot,disallowed_chars,dis1,dis2,dis3,dis4,dis5,dis5b,dis6,dis7,dis8,dis9,dis10,dis11,dis12,dis13,dis14,dis15,dis16,dis17,dis18,dis19,dis20,dis21]
    print("************************************\n\n"+path+"\n\n************************************\n")
    for x in regxlist:
        # search() reports only the first match of each pattern
        match = x.search(path)
        if match:
            print("\n")
            print("MATCHED")
            print(match.group())
    print("___________________________________________________________________________")
    return path
#testlist of conditions that should be found, some OK, some bad
testlist = ["string","str....ing","str..ing","str.ing",".string","string.",".string.","$tring",r"st\r\ing","st/r/ing",r"st\r/|ng","/str<i>ng","str.filesing","string.files"]
testlist_ans = ["OK","Match ....","Match ..","OK","Match .","Match .","Match . .","OK",r"Match \ ","Match /",r"Match \/|","Match / < >","OK","Match .files"]
count = 0
for i in testlist:
    print(testlist_ans[count])
    count = count + 1
    cleanPath(i)
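Which Python re function are you using? Your code calls search, which returns only the first match it finds in the string.
You should use re.findall, which returns all non-overlapping matches.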
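As a rough sketch of the difference, using the regex and test string from the question (the variable names `sample`, `first`, `all_runs` and `spans` are just for illustration):

```python
import re

# Same character class as in the question.
disallowed_chars = re.compile(r'[/<>*?|:"~#%&{}\\]+')

# Raw string, so the characters are: S t \ r / | n g
sample = r"St\r/|ng"

# search() stops at the first match, so only the backslash is reported.
first = disallowed_chars.search(sample)
print(first.group())

# findall() returns every non-overlapping match as a list of strings.
all_runs = disallowed_chars.findall(sample)
print(all_runs)

# finditer() yields match objects, which also carry the match position.
spans = [(m.group(), m.start()) for m in disallowed_chars.finditer(sample)]
print(spans)
```

Note that because of the `+`, adjacent disallowed characters come back as one match: here `/|` is a single match, while the backslash is separate because the `r` sits between them. If you want one match per character, dropping the `+` should do it.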