I’m writing a Python script to loop through a directory of CSS files and save the contents of any that contain a specifically formatted Javadoc-style comment.
The comment/CSS looks like this:
/**thirdpartycss
* @description Used for fixing stuff
*/
.class_one {
margin: 10px;
}
#id_two {
padding: 2px;
}
The regex to fetch the entire contents of the file looks like this:
pattern = "/\*\*thirdpartycss(.*?)}$"
matches = re.findall(pattern, css, flags=re.MULTILINE | re.DOTALL)
This gives me the file contents. What I want to do now is write a regex to grab each CSS definition within the class. This is what I tried:
rule_pattern = "(.*){(.*)}?"
rules = re.findall(rule_pattern, matches[0], flags=re.MULTILINE | re.DOTALL)
I’m basically trying to find any text, then an opening {, any text, then a closing } – I want a list of all of the CSS classes, essentially, but this just returns the entire string in one chunk.
Can anybody point me in the right direction?
Thanks.
Matt
{(.*)} is a greedy match: it will match from the first { to the last }, thus gobbling up any { / } pairs that might be inside those. You want non-greedy matching, that is, {(.*?)}: the difference is the question mark after the asterisk, making it non-greedy.
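To make that concrete, here is a minimal sketch run against the sample CSS from the question. Two details are assumptions on my part rather than something stated above: the leading (.*) in the original pattern is greedy as well, so it is swapped here for [^{}]+ (any run of non-brace characters), and the trailing ? after the closing brace is dropped so the brace is required. The comment-stripping step and the names body and rules are purely illustrative.

```python
import re

css = """/**thirdpartycss
* @description Used for fixing stuff
*/
.class_one {
margin: 10px;
}
#id_two {
padding: 2px;
}"""

# Strip /* ... */ comments first so they are not mistaken for selector text.
body = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)

# [^{}]+ captures the selector; \{(.*?)\} lazily captures everything
# up to the NEXT closing brace instead of the last one.
rule_pattern = re.compile(r"([^{}]+)\{(.*?)\}", flags=re.DOTALL)

rules = [(selector.strip(), declarations.strip())
         for selector, declarations in rule_pattern.findall(body)]

for selector, declarations in rules:
    print(selector, "->", declarations)
# .class_one -> margin: 10px;
# #id_two -> padding: 2px;
```

Each entry in rules is a (selector, declarations) pair, so the list of selectors alone is [selector for selector, _ in rules].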
This still won’t work if you need to properly match “nested” braces, but then nothing in the RE world will: among the many well-known limitations of regular languages (the languages that regular expressions can match) is that “properly nesting” any kind of open/closed parentheses is impossible. (Some incredibly extended so-called REs manage to, but not Python’s, and anybody with a CS background will find calling those expressions “regular” offensive anyway ;-). If you need more general parsing than REs can afford, pyparsing or other full-fledged Python parsers are the right way to go.
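If a full parser is more than the job needs, one plain-Python alternative (not a regex, and not pyparsing) is to count brace depth by hand. The sketch below is only an illustration: the function name top_level_blocks and the @media sample are hypothetical, and it assumes braces never appear inside strings or comments.

```python
def top_level_blocks(css):
    """Return (header, body) pairs for each top-level {...} block.

    Assumes no braces occur inside strings or comments.
    """
    blocks = []
    depth = 0          # current brace nesting level
    header_start = 0   # where the text before the next top-level "{" begins
    header = ""
    body_start = 0
    for i, ch in enumerate(css):
        if ch == "{":
            if depth == 0:
                header = css[header_start:i].strip()
                body_start = i + 1
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                # Closed a top-level block: record it and start a new header.
                blocks.append((header, css[body_start:i].strip()))
                header_start = i + 1
    return blocks


nested = "@media print { .a { color: red; } .b { margin: 0; } } .c { padding: 2px; }"
for header, body in top_level_blocks(nested):
    print(header, "->", body)
# @media print -> .a { color: red; } .b { margin: 0; }
# .c -> padding: 2px;
```

Calling top_level_blocks again on the body of a nested block yields the inner rules, which is the step a single regular expression cannot do for arbitrary depth.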