I am working on a little “development” language for my personal use. I do not plan to make it advanced at all (although I don’t know what will happen at later points), but I’ve run into a problem.
I am not very experienced with RegExp, and I want to use it to check whether a part of the code is defining a new variable or calling a function. In this case, I need a RegExp that will check if the user is defining a variable.
So, let’s say I have a part like this: $abcd = 5
Now, the RegExp should meet the next criteria:
– It should check if the first letter is “$” (that’s easy: “\$”)
– Now, the letters after “$” (let’s call them the variable name) are the problem. The variable name can contain letters [a-z, A-Z], numbers [0-9] and underscores [_]
– Next, the space between the variable name and “=” can be arbitrarily long (it can be one space or millions of spaces); that should make no difference
– Then comes the equal sign (this is easy as well: “\=”)
– The same rule as in the third point applies to the space after the equal sign
– And at the end comes the variable value. There should be no RegExp validation for this.
Thanks in advance!
You do not want to use regular expressions for a task like this. It will quickly turn into a nightmare. What you want is a simple grammar and a recursive-descent parser.
That being said, something like this should work:
This will only match cases where you’re assigning a number to a variable. If you want to assign other values, you are going to have to make the regular expression more complicated (see what I meant about it turning into a nightmare? 🙂 ). For example, if you want to assign a string value to your variable, the regex will be different. You will also have to take into account things like escaped quotes and concatenation. Doing these things with a regular expression is very difficult.
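For illustration, a number-only pattern along these lines could be sketched in Python as follows. This is a minimal sketch under assumptions: the names `ASSIGN_NUMBER` and `parse_number_assignment` are made up for this example, the value is limited to an unsigned integer, and `\s*` accepts zero or more whitespace characters around the equal sign (use `\s+` if at least one space should be required).

```python
import re

# Hypothetical pattern: "$", a name made of letters, digits and underscores,
# optional whitespace, "=", optional whitespace, then an unsigned integer.
ASSIGN_NUMBER = re.compile(r"^\$([A-Za-z0-9_]+)\s*=\s*([0-9]+)$")


def parse_number_assignment(line):
    """Return (name, value) if the line assigns an integer to a variable, else None."""
    match = ASSIGN_NUMBER.match(line.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

Calling `parse_number_assignment("$abcd = 5")` gives back the name and the integer value, while a line such as `$s = "hi"` is rejected, which is exactly the limitation described above.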
A simple grammar for function calls and variable definitions could look like this:
This grammar doesn’t take into account concatenation, numbers with fractional values (i.e., after the decimal point), and negative numbers. But it’s pretty simple and should give you a good starting point. There are many tutorials out there that tell you how to create a recursive-descent parser from an EBNF. You will still need to tokenize your input.
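To make the tokenizer plus recursive-descent idea concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the small grammar in the comment is a hypothetical one (not necessarily the grammar above), and the names `tokenize`, `Parser` and `parse` are invented. Like the grammar described above, it leaves out concatenation, fractional numbers, negative numbers and escaped quotes.

```python
import re

# Hypothetical grammar (an assumption made for this sketch):
#   statement  = assignment | call ;
#   assignment = VARIABLE "=" value ;
#   call       = NAME "(" [ value { "," value } ] ")" ;
#   value      = VARIABLE | NUMBER | STRING ;
TOKEN_RE = re.compile(r"""
    (?P<VARIABLE>\$[A-Za-z0-9_]+)
  | (?P<NUMBER>[0-9]+)
  | (?P<STRING>"[^"]*")
  | (?P<NAME>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<PUNCT>[=(),])
  | (?P<SPACE>\s+)
""", re.VERBOSE)


def tokenize(text):
    """Split text into (kind, value) pairs, dropping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "SPACE":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


class Parser:
    """One method per grammar rule; each consumes the tokens it recognizes."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return (None, None)  # end of input

    def expect(self, kind, value=None):
        tok_kind, tok_value = self.peek()
        if tok_kind != kind or (value is not None and tok_value != value):
            raise SyntaxError(f"expected {value or kind}, got {tok_value!r}")
        self.pos += 1
        return tok_value

    def statement(self):
        kind, _ = self.peek()
        node = self.assignment() if kind == "VARIABLE" else self.call()
        if self.peek()[0] is not None:
            raise SyntaxError(f"unexpected trailing token {self.peek()[1]!r}")
        return node

    def assignment(self):
        name = self.expect("VARIABLE")
        self.expect("PUNCT", "=")
        return ("assign", name[1:], self.value())

    def call(self):
        name = self.expect("NAME")
        self.expect("PUNCT", "(")
        args = []
        if self.peek() != ("PUNCT", ")"):
            args.append(self.value())
            while self.peek() == ("PUNCT", ","):
                self.pos += 1  # skip the comma
                args.append(self.value())
        self.expect("PUNCT", ")")
        return ("call", name, args)

    def value(self):
        kind, text = self.peek()
        if kind not in ("VARIABLE", "NUMBER", "STRING"):
            raise SyntaxError(f"expected a value, got {text!r}")
        self.pos += 1
        if kind == "VARIABLE":
            return ("var", text[1:])
        if kind == "NUMBER":
            return ("num", int(text))
        return ("str", text[1:-1])  # strip the surrounding quotes


def parse(text):
    """Tokenize one statement and parse it into a nested tuple."""
    return Parser(tokenize(text)).statement()
```

With this sketch, `parse("$abcd = 5")` produces an `("assign", ...)` tuple and `parse('print($abcd, "hi", 3)')` produces a `("call", ...)` tuple. Supporting a new kind of value means adding a token and a branch in `value`, rather than reworking one large regular expression.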