I'm not a regex expert, but I know enough to be dangerous, and I need some help with an expression I am working on. Long story short, a recent database upgrade has invalidated thousands of queries embedded in string literals of a legacy application that I support. I am writing a few expressions to capture the majority of these and, hopefully, fix them programmatically.
Consider the following:
Query query = session
.createSQLQuery("SELECT distinct p.userid, p.name, f.hsid, "
+ "p.vid, p.vname, p.paymentdate, p.amount "
+ "FROM vk.payment p, (select * from vs.fuser) fu, (select * from vs.fac) f "
+ "WHERE p.description = 'Check' AND "
+ "p.paymentdate >= :startDate and p.paymentdate <= :endDate AND "
+ "fu.userid = p.userid AND fu.facid = f.facid "
+ "ORDER BY p.userid");
query.setParameter("startDate", startDate);
query.setParameter("endDate", endDate);
I have the following DOTALL expression to try to capture just the ugly contents of the method argument:
(?s)(?<=\.createSQLQuery\(")(.*)(?="\)\;)
I specify the DOTALL flag with (?s), use a non-capturing lookbehind to anchor on \.createSQLQuery\(", capture everything including line breaks with (.*), and finally use a non-capturing positive lookahead to stop the capture at "\)\;.
I am expecting to capture the following:
SELECT distinct p.userid, p.name, f.hsid, "
+ "p.vid, p.vname, p.paymentdate, p.amount "
+ "FROM vk.payment p, (select * from vs.fuser) fu, (select * from vs.fac) f "
+ "WHERE p.description = 'Check' AND "
+ "p.paymentdate >= :startDate and p.paymentdate <= :endDate AND "
+ "fu.userid = p.userid AND fu.facid = f.facid "
+ "ORDER BY p.userid
Instead the expression is a lot greedier than I anticipated and is capturing this:
SELECT distinct p.userid, p.name, f.hsid, "
+ "p.vid, p.vname, p.paymentdate, p.amount "
+ "FROM vk.payment p, (select * from vs.fuser) fu, (select * from vs.fac) f "
+ "WHERE p.description = 'Check' AND "
+ "p.paymentdate >= :startDate and p.paymentdate <= :endDate AND "
+ "fu.userid = p.userid AND fu.facid = f.facid "
+ "ORDER BY p.userid");
query.setParameter("startDate", startDate);
query.setParameter("endDate", endDate);
... to EOF
The thing is that without the DOTALL the expression works as expected on a single line:
Query query = session.createSQLQuery("SELECT .... ");
and captures without the remaining characters on the end…
SELECT ....
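The single-line versus multi-line contrast can be reproduced with a minimal Java sketch. The class name DotallContrast, the helper firstCapture, and the SELECT 1 FROM dual query are illustrative assumptions, not code from the application. Without (?s), the dot does not match line terminators, so the expression can only succeed when the closing "); sits on the same line as the opening call.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DotallContrast {
    // The question's expression WITHOUT the (?s) flag: '.' stops at line terminators.
    static final Pattern NO_DOTALL =
        Pattern.compile("(?<=\\.createSQLQuery\\(\")(.*)(?=\"\\)\\;)");

    // Returns the first captured group, or null when nothing matches.
    static String firstCapture(String source) {
        Matcher m = NO_DOTALL.matcher(source);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical inputs, made up for illustration.
        String oneLine = "Query query = session.createSQLQuery(\"SELECT 1 FROM dual\");";
        String multiLine = "Query query = session\n"
            + ".createSQLQuery(\"SELECT 1 \"\n"
            + "+ \"FROM dual\");";

        System.out.println(firstCapture(oneLine));   // prints: SELECT 1 FROM dual
        System.out.println(firstCapture(multiLine)); // prints: null
    }
}
```

So the single-line case works only because there is exactly one "); on that line, not because the lookahead behaves differently.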
Is there some aspect of DOTALL that every regex guru seems to know that does not seem to be documented anywhere? Does DOTALL not work with positive lookahead?
I appreciate any help!
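Make the * quantifier non-greedy by adding a ? after it, like so: .*? With DOTALL enabled, the greedy .* runs to the end of the input and backtracks only as far as the last "); in the file, which is why the match swallows everything after your query.
Also, why are you even using lookarounds? Using them without thought like this can lead to undesired behavior in some cases. (And it always irritates me. (-; )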
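As a rough illustration of the difference, here is a minimal Java sketch that runs the question's expression with both a greedy and a lazy quantifier. The class name LazyQuantifierDemo, the helper firstCapture, the shortened query, and the trailing log.info("done"); line are all assumptions made up for the example; the trailing line stands in for any later statement in the file that also ends in ");.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LazyQuantifierDemo {
    // Expression from the question: greedy (.*) with DOTALL.
    static final Pattern GREEDY =
        Pattern.compile("(?s)(?<=\\.createSQLQuery\\(\")(.*)(?=\"\\)\\;)");

    // Same expression with a lazy quantifier: (.*?) stops at the first "); it reaches.
    static final Pattern LAZY =
        Pattern.compile("(?s)(?<=\\.createSQLQuery\\(\")(.*?)(?=\"\\)\\;)");

    // Hypothetical source text: a shortened query plus a later statement ending in ");
    static final String SOURCE = String.join("\n",
        "Query query = session",
        ".createSQLQuery(\"SELECT p.userid \"",
        "+ \"FROM vk.payment p\");",
        "query.setParameter(\"startDate\", startDate);",
        "log.info(\"done\");");

    // Returns the first captured group, or null when nothing matches.
    static String firstCapture(Pattern pattern, String source) {
        Matcher m = pattern.matcher(source);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println("greedy:\n" + firstCapture(GREEDY, SOURCE));
        System.out.println("lazy:\n" + firstCapture(LAZY, SOURCE));
    }
}
```

On this input the greedy pattern captures everything up to the "); of the final log.info line, while the lazy pattern stops at the end of the query text.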
You could just use: