Find Oracle single line comments except the ones that appear inside a string.
For example:
-- This is a valid single line comment
But
'This is a string -- and it is not a comment';
I am using this regex to find single line comments:
--.*$
It handles a few cases, but there are several complex ones where it fails. You can use this script for reference:
-- this is a single line comment
CREATE OR REPLACE PROCEDURE "MAIL_WITH_ATTACHMENT" ( )
IS
tmp varchar(2) ; -- this is a comment
tmp1 varchar(2) := 'some texxt'; -- this is another comment
tmp2 varchar(3) := 'some more --text'; -- this is one more comment
tmp3 varchar(4) := 'this regex isn't --working properly'; -- Don't you think this is another comment
BEGIN
'--This is a Mime message, which your current mail reader may not' || crlf ||
' some more -- characters in a string';
mesg:= crlf ||
'--This is a Mime message, which your current mail reader may not' || crlf ||
' some more -- characters in a string';
END;
The result must be this:
[1] : -- this is a single line comment
[2] : -- this is a comment
[3] : -- this is another comment
[4] : -- this is one more comment
[5] : -- Don't you think this is another comment
Thanks
Personally, I’d use an SQL parser to strip these comments. The problem with regex is that it’s not really aware of its surroundings: regex has a hard time figuring out if a single quote is inside a comment, or if -- is inside a string literal.
You can circumvent this by using a regex that matches from the start of a line and matches string literals as well, making it behave more like a lexical analyzer (the first stage of parsing).
Such a regex could look like this:
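One pattern that fits this description, pieced together from the fragments discussed in this answer and written as a Java string. The exact pattern and the names CommentRegex and STRIP_COMMENT are assumptions for illustration, not necessarily the original:

```java
import java.util.regex.Pattern;

class CommentRegex {
    // Assumed reconstruction. (?m) lets ^ and $ match at every line boundary.
    // Group 1 captures the code in front of the comment; --.*$ is the comment.
    static final String STRIP_COMMENT =
        "(?m)^((?:(?!--|').|'(?:''|[^'])*')*)--.*$";

    public static void main(String[] args) {
        // Compiling the pattern checks that it is syntactically valid.
        Pattern p = Pattern.compile(STRIP_COMMENT);
        System.out.println(p.pattern());
    }
}
```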
A quick breakdown of the regex:
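The breakdown can be sketched with Java's Pattern.COMMENTS flag, which ignores whitespace inside the pattern and treats everything after # as a comment. The pattern itself is an assumed reconstruction from the fragments mentioned in this answer, and the class name CommentRegexBreakdown is hypothetical:

```java
import java.util.regex.Pattern;

class CommentRegexBreakdown {
    static final Pattern STRIP_COMMENT = Pattern.compile(
        "^                   # start of a line (because of MULTILINE)\n" +
        "(                   # group 1: everything in front of the comment\n" +
        "  (?:               #   repeat zero or more times:\n" +
        "    (?!--|').       #     any char that starts neither '--' nor a string\n" +
        "    |               #     or\n" +
        "    '(?:''|[^'])*'  #     a whole string literal; '' is an escaped quote\n" +
        "  )*\n" +
        ")                   # end of group 1\n" +
        "--.*$               # the comment, up to the end of the line\n",
        Pattern.COMMENTS | Pattern.MULTILINE);

    public static void main(String[] args) {
        // Replacing the match with group 1 keeps the code and drops the comment.
        String line = "tmp varchar(2) ; -- this is a comment";
        System.out.println(STRIP_COMMENT.matcher(line).replaceAll("$1"));
    }
}
```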
In plain English that would read like: from each start of a line, try to match zero or more of: a string literal ('(?:''|[^'])*'); or any character other than a single quote or the first - of a comment ((?!--|').), and store this match in group 1. Then match a comment (--.*$).
So now all you need to do is replace this pattern with whatever is matched in group 1. A demo:
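A minimal sketch of such a demo, assuming the pattern is the one reconstructed from the fragments above; the class and method names (CommentStripper, stripComments) are hypothetical. The input reuses lines from the question, except the tmp3 line, whose unescaped quote in "isn't" is not a valid Oracle string literal:

```java
class CommentStripper {
    // Assumed reconstruction of the pattern described above.
    private static final String COMMENT =
        "(?m)^((?:(?!--|').|'(?:''|[^'])*')*)--.*$";

    /** Replaces each match with group 1, i.e. the line minus its trailing comment. */
    static String stripComments(String sql) {
        return sql.replaceAll(COMMENT, "$1");
    }

    public static void main(String[] args) {
        String sql =
            "-- this is a single line comment\n" +
            "tmp varchar(2) ; -- this is a comment\n" +
            "tmp2 varchar(3) := 'some more --text'; -- this is one more comment\n" +
            "mesg := crlf ||\n" +
            "'--This is a Mime message' || crlf ||\n" +
            "' some more -- characters in a string';\n";
        System.out.println(stripComments(sql));
    }
}
```

With this input the three trailing comments disappear, while the -- sequences inside the string literals are left untouched. Whitespace in front of a removed comment stays on the line.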
EDIT
And if you only want to extract the comments, wrap the capture group around --.*$ and use a Pattern and Matcher to find() the matches.
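A sketch of that extraction, again assuming the reconstructed pattern, with the capture group moved onto the comment; CommentExtractor and extractComments are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentExtractor {
    // Same assumed pattern, but group 1 now wraps the comment instead of the code.
    private static final Pattern COMMENT = Pattern.compile(
        "(?m)^(?:(?!--|').|'(?:''|[^'])*')*(--.*)$");

    /** Collects every single line comment that is not part of a string literal. */
    static List<String> extractComments(String sql) {
        List<String> comments = new ArrayList<>();
        Matcher m = COMMENT.matcher(sql);
        while (m.find()) {
            comments.add(m.group(1));
        }
        return comments;
    }

    public static void main(String[] args) {
        String sql =
            "-- this is a single line comment\n" +
            "tmp1 varchar(2) := 'some texxt'; -- this is another comment\n" +
            "tmp2 varchar(3) := 'some more --text'; -- this is one more comment\n" +
            "' some more -- characters in a string';\n";
        int i = 1;
        for (String comment : extractComments(sql)) {
            System.out.println("[" + i++ + "] : " + comment);
        }
    }
}
```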