I am writing a program in PyQt that needs to take messy strings and clean them up. The possible input values are extremely variable. For example, I would like to take the strings:
"Seven_Pounds_(BDrip_1080p_ENG-ITA-GER)_Multisub_x264_bluray_.mkv",
"The_Birds_1963_HDTV_XvidHD_720p-NPW.avi",
"1892.XVID.AC3.HD.120_min.avi"
and turn them into:
"Seven Pounds",
"The Birds",
"1892"
I have considered using the re module to match and strip the unwanted parts, but this approach seems likely to fail for the last example. The program Media Gerbil uses Google's diff-match-patch algorithm for string cleaning. This seems like a better alternative, but I am not sure how to implement it.
Is there another, more effective method for cleaning strings in Python/PyQt, or is regex or diff-match-patch the best route to follow?
Based on your example:
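A minimal sketch of one regex-based approach, assuming the title always comes first and ends at the first release tag or four-digit year. The names `JUNK_TOKENS`, `YEAR`, and `clean_title` are hypothetical, and the tag list is an assumption covering only the three sample names, so it would need extending for real data:

```python
import re

# Hypothetical, non-exhaustive set of release tags (an assumption, not a
# standard list); extend it to match the file names you actually see.
JUNK_TOKENS = {
    "bdrip", "bluray", "hdtv", "xvid", "xvidhd", "x264", "ac3", "hd",
    "720p", "1080p", "multisub", "eng", "ita", "ger",
}
# A token that looks like a release year, e.g. 1963 or 2008.
YEAR = re.compile(r"(19|20)\d{2}")


def clean_title(filename):
    """Return the leading title words of a release-style file name."""
    # Drop a trailing extension such as .mkv or .avi.
    stem = re.sub(r"\.[A-Za-z0-9]{2,4}$", "", filename)
    # Split on whitespace, dots, underscores, brackets, and hyphens.
    tokens = [t for t in re.split(r"[\s._()\[\]-]+", stem) if t]

    title = []
    for index, token in enumerate(tokens):
        is_junk = token.lower() in JUNK_TOKENS or YEAR.fullmatch(token)
        # Never cut on the first token, so a numeric title like "1892" survives.
        if index > 0 and is_junk:
            break
        title.append(token)
    return " ".join(title)


names = [
    "Seven_Pounds_(BDrip_1080p_ENG-ITA-GER)_Multisub_x264_bluray_.mkv",
    "The_Birds_1963_HDTV_XvidHD_720p-NPW.avi",
    "1892.XVID.AC3.HD.120_min.avi",
]
for name in names:
    print(clean_title(name))
```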
will print: