I have a problem getting the locale out of a string like:
menu_title_en_US
menu_title_en
The locale in these strings would be “en_US” and “en”, respectively. The strings that I have to deal with contain only alphanumeric characters and underscores, like variable names in Python.
I have tried the following regex so far:
re.compile(r'_(?P<base_code>[a-z]{2,5})(_(?P<ext_code>[a-z]{2,5})){0,1}$')
which works fine for strings like “menu_en” and “menu_en_US”, but for strings like “menu_title_en” or “menu_title_en_US” it does not work as expected (extracting “en” or “en_US”).
Maybe someone has a quick idea how to solve this problem.
If you know the locale is always “en”, “en_us”, or “en_US” (stated in a comment), then you don’t need a regex at all: simple checks on the end of the string are enough.
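A minimal sketch of such a check, assuming the locale is always a two-character code optionally followed by an underscore and a second two-character code. The function name get_locale is hypothetical, chosen only for illustration:

```python
def get_locale(name):
    """Return a trailing 'xx' or 'xx_YY' locale from name, or None.

    Hypothetical helper: it only looks at where the underscores sit,
    not at whether the characters form a real locale.
    """
    # 'xx_YY' form: underscore, two characters, underscore, two characters.
    if len(name) >= 6 and name[-3] == "_" and name[-6] == "_":
        return name[-5:]
    # 'xx' form: underscore plus two characters at the very end.
    if len(name) >= 3 and name[-3] == "_":
        return name[-2:]
    return None


for s in ("menu_en", "menu_en_US", "menu_title_en", "menu_title_en_US"):
    print(s, "->", get_locale(s))
```

Because only the underscore positions are inspected, a name such as “user_id” would also be reported as having the locale “id”.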
You could add more checks if the data could contain something that looked like a locale but wasn’t — these just check for the underscore plus two characters after.
However, the regex can be fixed / simplified a bit, too:
? is the same as {0,1}, and since the codes are always two characters you want {2}, not {2,5}. You also want to accept either lower or upper case for the second code. It will still have false positives, though.
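Applying those changes to the pattern from the question gives something like the sketch below. The group names come from the question; the names LOCALE_RE and extract_locale are hypothetical. Limiting the first code to exactly two characters is what stops “title” from being taken as the base code:

```python
import re

# Base code: exactly two lowercase letters.
# Optional extension: underscore plus two letters in either case.
LOCALE_RE = re.compile(
    r'_(?P<base_code>[a-z]{2})(?:_(?P<ext_code>[a-zA-Z]{2}))?$'
)


def extract_locale(name):
    """Return the trailing locale of name ('en' or 'en_US'), or None."""
    m = LOCALE_RE.search(name)
    if m is None:
        return None
    base, ext = m.group("base_code"), m.group("ext_code")
    return base if ext is None else f"{base}_{ext}"


for s in ("menu_en", "menu_en_US", "menu_title_en", "menu_title_en_US"):
    print(s, "->", extract_locale(s))
```

A name like “menu_id” still matches and yields “id”, which is the kind of false positive mentioned above.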