I’m trying to write a regex (and a method around it) that extracts variables from an input string representing a math/algebraic expression. There is also a special pattern, “PROPERTY(AnyOtherAlphaNumeric)”, which also counts as a variable.
My definition of a variable:
1) It can contain alphanumeric characters only.
2) It must be at least one character long.
3) It cannot start with a digit; it must start with [A-Za-z].
4) A variable, for example “X”, can be wrapped as “PROPERTY(X)”, in which case the whole string “PROPERTY(X)” is the variable.
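As a minimal sketch, the four rules can be checked against a single, already-isolated token with one pattern and Matcher.matches(), which requires the whole token to match. The class and helper names (VariableRules, isVariable) are hypothetical, and the pattern follows the rules literally, so unlike the patterns in the question it does not accept '_'.

```java
import java.util.regex.Pattern;

public class VariableRules {
    // Rules 1-3: one letter, then zero or more letters/digits (so length >= 1).
    private static final String NAME = "[A-Za-z][A-Za-z0-9]*";

    // Rule 4: the same name wrapped as PROPERTY(...), or the bare name on its own.
    private static final Pattern VARIABLE =
            Pattern.compile("PROPERTY\\(" + NAME + "\\)|" + NAME);

    // Hypothetical helper: true only if the entire token is a valid variable.
    public static boolean isVariable(String token) {
        return VARIABLE.matcher(token).matches();
    }

    public static void main(String[] args) {
        String[] tokens = {"X", "ibdweight", "var2", "PROPERTY(X)", "2x", "(", "var5)", "my_var"};
        for (String t : tokens) {
            System.out.println(t + " -> " + isVariable(t));
        }
    }
}
```

This validates a token but does not find variables inside a longer expression; extraction needs Matcher.find() instead.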
My current method and regex (which only work in some cases):
```java
public Set<String> extractUniqueVarsFromExpression(String expression) {
    Set<String> varsSet = null;
    Pattern p = null;
    Matcher m = null;
    System.out.println(expression);
    if (expression != null) {
        varsSet = new java.util.LinkedHashSet<String>();
        //"[A-Za-zPROPERTY(?)_][A-Za-z0-9PROPERTY(?)_]*||[A-Za-z_][A-Za-z0-9_]*"
        //"[[A-Za-z_][A-Za-z0-9_]*"
        p = Pattern.compile("[A-Za-zPROPERTY(?)_][A-Za-z0-9PROPERTY(?)_]*||[A-Za-z_][A-Za-z0-9_]*",
                Pattern.CASE_INSENSITIVE);
        m = p.matcher(expression);
        while (m.find()) {
            String group = m.group().trim();
            //do not add duplicates
            if (!varsSet.contains(group))
            {
                varsSet.add(group);
                System.out.println(" Variable : " + group);
            }//end if not duplicate
        }// end while
    }
    System.out.println();
    return varsSet;
}
```
Examples/Cases:
Ex #1:
Input: [(ibdweight / ibdheight) * ibdheight] * 703
Output:
Variable : PROPERTY(ibdweight)
Variable : PROPERTY(ibdheight)
Ex #2:
Input: [ibdweight / ibdheight * ibdheight] * 703
Output:
Variable : ibdweight
Variable : ibdheight
Ex #3:
Input: [PROPERTY(ibdweight) / [PROPERTY(ibdheight) * PROPERTY(ibdheight)] * 703
Output:
Variable : PROPERTY(ibdweight)
Variable : PROPERTY(ibdheight)
These are the cases that don’t work (examples 4 to 6).
Ex #4:
The problem is that the parentheses are being picked up as variables:
Input: ( Mass * ( Acceleration + whatever ))
Output:
Variable : (
Variable : Mass
Variable : Acceleration
Variable : whatever
Variable : ))
Ex #5:
The problem is that the parentheses are being picked up as variables:
Input: ( Base * Height ) / 2
Output:
Variable : (
Variable : Base
Variable : Height
Variable : )
Ex #6:
The problem is that the parentheses are being picked up as variables or attached to a variable:
Input: [((( var * var2 ) var3 ) + ( var1 / var4 ) var5) / var6 ]
Output:
Variable : (((
Variable : var
Variable : var2
Variable : )
Variable : var3
Variable : (
Variable : var1
Variable : var4
Variable : var5)
Variable : var6
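The problem with your regular expression is that you have put both the parentheses and the word “PROPERTY” inside square brackets. Square brackets define a character class: a set of single characters, any one of which can match at that position; they cannot match a whole string. So “[A-Za-zPROPERTY(?)_]” just means “one letter, or '(', '?', ')' or '_'”, which is why runs of parentheses such as “(((” are reported as variables and why “var5)” keeps its trailing parenthesis. (The “||” in your pattern also adds an empty alternative, which can match the empty string.)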
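To make the character-class behaviour visible, here is a small sketch that runs only the first alternative of the question's pattern (the “||” part and the CASE_INSENSITIVE flag are left out) against Ex #5 and a fragment of Ex #6. The class and method names (CharClassDemo, tokens) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassDemo {
    // First alternative of the question's pattern. Inside [...], "PROPERTY" is just the
    // letters P, R, O, E, T, Y (already covered by A-Z), and "(?)" is just the three
    // literal characters '(', '?' and ')'.
    static final Pattern CLASS_ONLY =
            Pattern.compile("[A-Za-zPROPERTY(?)_][A-Za-z0-9PROPERTY(?)_]*");

    // Collects every substring that the pattern finds, in order, duplicates included.
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = CLASS_ONLY.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Parentheses are members of the class, so they are matched just like letters.
        System.out.println(tokens("( Base * Height ) / 2"));
        System.out.println(tokens("( var1 / var4 ) var5)"));
    }
}
```

The first line prints [(, Base, Height, )], which reproduces the stray “(” and “)” from Ex #5.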
A simple (although probably not optimal) variation that should work for you is:
(PROPERTY\([A-Za-z][A-Za-z0-9_]*\))|([A-Za-z][A-Za-z0-9_]*)
A slightly shorter equivalent (in Java, \w matches [a-zA-Z_0-9] by default) would be:
(PROPERTY\([A-Za-z]\w*\))|([A-Za-z]\w*)
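A sketch of the question's method rewritten around the shorter regex might look like the following. Note that in a Java string literal every backslash must be doubled. Two assumptions differ from the original method: it returns an empty set instead of null for a null input, and it drops Pattern.CASE_INSENSITIVE (add the flag back if a lowercase “property(x)” should also be treated as one variable). The class name VariableExtractor is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VariableExtractor {
    // The answer's shorter regex, with backslashes doubled for a Java string literal.
    // The PROPERTY(...) alternative is tried first, so it wins over the bare name.
    private static final Pattern VARIABLE =
            Pattern.compile("(PROPERTY\\([A-Za-z]\\w*\\))|([A-Za-z]\\w*)");

    public static Set<String> extractUniqueVarsFromExpression(String expression) {
        // LinkedHashSet keeps first-seen order and ignores duplicates on add(),
        // so no explicit contains() check is needed.
        Set<String> vars = new LinkedHashSet<>();
        if (expression == null) {
            return vars;
        }
        Matcher m = VARIABLE.matcher(expression);
        while (m.find()) {
            vars.add(m.group());
        }
        return vars;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "( Mass * ( Acceleration + whatever ))",
            "( Base * Height ) / 2",
            "[((( var * var2 ) var3 ) + ( var1 / var4 ) var5) / var6 ]",
            "[PROPERTY(ibdweight) / [PROPERTY(ibdheight) * PROPERTY(ibdheight)] * 703"
        };
        for (String in : inputs) {
            System.out.println(in + " -> " + extractUniqueVarsFromExpression(in));
        }
    }
}
```

For example, “( Base * Height ) / 2” now yields [Base, Height], and “703” is skipped because neither alternative can start with a digit.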