I am working on a personal project that is designed to open a file specified by the user, then to take in user input and use that input as a regular expression to search the file with. The purpose of this is to gain a deeper understanding of how regular expressions work, and how to incorporate them into programs.
My problem is that all input from the user arrives as a string. So (correct me if I'm wrong), an input of [a-z]+ results in the search expression "[a-z]+". This is a problem if I want r"[a-z]+" as my search expression, because typing that as user input gives me "r"[a-z]+"" (again, correct me if I'm wrong), which will obviously not work as a regex. How do I format the input so that an input of r"[a-z]+" remains r"[a-z]+"?
This is the code section in question. The textFile in the function arguments is imported from another section of the program, and is used in the regex search:
import re

def new_search_regex(textFile):
    """Query for input, then perform a regex search with the user's input."""
    global totalSearches
    global allSearchResults
    # Ask user for the regular expression to be searched
    expression = raw_input("Please enter the Regular Expression to be searched: ")
    # Perform initial regex search
    foundRegex = re.search(expression, textFile)
    # If regex search successful
    if foundRegex is not None:
        # Do complete regex search
        foundRegex = re.findall(expression, textFile)
        # Print result
        print "Result: " + str(foundRegex)
        # Increment global total
        totalSearches += 1
        # Create object for result, store in global array
        reg_object = Reg_Search(totalSearches, expression, foundRegex)
        allSearchResults.append(reg_object)
        print "Your search number for this search is " + str(totalSearches)  # Inform user of storage location
    # If regex search unsuccessful
    else:
        print "Search did not have any results."
    return
Note: At the end I create an object for the result, and store it in a global array.
This also assumes, for now, that the user is competently entering regexes that won't destroy the system. I will soon start adding error checking, though, such as using re.escape on the user input. How will this affect my situation? Will it wreak havoc if the user includes " in the input?
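To make the question about escaping concrete, here is a minimal sketch of what `re.escape` does to a pattern. The sample text, the variable names and the `is_valid_regex` helper are hypothetical and not part of the program above. Escaping turns every metacharacter into a literal, so it suits literal-text searches rather than checking a user's regex. Catching `re.error` from `re.compile` is shown as one possible way to reject malformed patterns.

```python
import re

text = "abc [a-z]+ def"   # hypothetical sample text
pattern = "[a-z]+"        # what the user typed

# Used as-is, the input behaves as a regular expression.
print(re.findall(pattern, text))             # ['abc', 'a', 'z', 'def']

# After re.escape, the same input only matches itself literally.
print(re.findall(re.escape(pattern), text))  # ['[a-z]+']


def is_valid_regex(candidate):
    """Return True if candidate compiles as a regex (hypothetical helper)."""
    try:
        re.compile(candidate)
        return True
    except re.error:
        return False


print(is_valid_regex("[a-z]+"))  # True
print(is_valid_regex("[a-z"))    # False: unterminated character set
```

A double quote has no special meaning in a regular expression, so a `"` in the input is matched as an ordinary character whether or not it is escaped.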
The `r"..."` syntax is only useful to prevent the Python compiler from interpreting escape sequences (`\n` being converted to a newline character, for example). Once parsed by the compiler, it is just a regular string. When you read input from the user with `raw_input`, the compiler does not perform any escape-sequence interpretation. You don't have to do anything; the string is already correctly interpreted.
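As a small illustration of the point about literals, the sketch below compares a regular literal, a raw literal, and a regular literal with a doubled backslash. The variable names are made up for the example. The prefix changes only how the source text is parsed, not the type of the resulting object.

```python
plain = "\n"     # one character: a newline
raw = r"\n"      # two characters: a backslash followed by 'n'
escaped = "\\n"  # the same two characters, written without the r prefix

print(len(plain))                # 1
print(len(raw))                  # 2
print(raw == escaped)            # True
print(type(raw) is type(plain))  # True: both are ordinary strings
```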
You can test this yourself like this:
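A minimal sketch of such a test, under some assumptions: an in-memory stream stands in for the keyboard so the example runs without interaction, and `read_pattern` is a hypothetical helper that mimics what `raw_input` returns (one line of text with the trailing newline removed). It is written in Python 3 style; in the Python 2 program above, `raw_input` would supply `typed` directly.

```python
import io
import re


def read_pattern(stream):
    """Read one line from a text stream and drop the trailing newline."""
    return stream.readline().rstrip("\n")


# Simulate the user typing  \d+\.\d+  and pressing Enter.
# The backslashes are doubled only because this is a non-raw literal in source code.
fake_stdin = io.StringIO(u"\\d+\\.\\d+\n")
typed = read_pattern(fake_stdin)

print(typed)                 # \d+\.\d+
print(typed == r"\d+\.\d+")  # True: identical to the raw literal
print(re.findall(typed, "pi is 3.14, e is 2.71"))  # ['3.14', '2.71']

# If the user types the r and the quotes too, they become part of the pattern.
literal_style = u'r"[a-z]+"'
print(re.findall(literal_style, 'abc r"abc"'))     # ['r"abc"']
```

The last line shows why the user should type only `[a-z]+`: the `r` and the quotes are treated as characters to match, not as Python syntax.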