I’m trying to extract the # of minutes from a text field using Oracle’s REGEXP_SUBSTR() function.
Data:
Treatment of PC7, PT1 on left. 15 min.
15 minutes.
15 minutes
15 mins.
15 mins
15 min.
15 min
15min
15
In each case, I’m hoping to extract the ’15’ part of the string.
Attempts:
\d+ gets all of the numeric values, including the ‘7’ and ‘1’, which is undesirable.
(\d)+(?=\ ?min) gets the ’15’ from all rows except the last.
(?((\d)+(?=\ ?min))((\d)+(?=\ ?min))|\d+), a conditional statement, doesn’t match anything.
What is wrong with my conditional statement?
** EDIT **
WITH DATA AS (
SELECT 'Treatment of PC7, PT1 on left. 15 min.' COMMENTS FROM DUAL
UNION ALL
SELECT '15 minutes.' COMMENTS FROM DUAL
UNION ALL
SELECT '15 minutes' COMMENTS FROM DUAL
UNION ALL
SELECT '15 mins.' COMMENTS FROM DUAL
UNION ALL
SELECT '15 mins' COMMENTS FROM DUAL
UNION ALL
SELECT '15 min.' COMMENTS FROM DUAL
UNION ALL
SELECT '15 min' COMMENTS FROM DUAL
UNION ALL
SELECT '15min' COMMENTS FROM DUAL
UNION ALL
SELECT '15' COMMENTS FROM DUAL
)
SELECT COMMENTS,
REGEXP_SUBSTR(COMMENTS, '(\d+)\s?(?:min.*)?$', 1, 1) A,
REGEXP_SUBSTR(COMMENTS, '\d+?(?= ?min)|^\d+$', 1, 1) B,
REGEXP_SUBSTR(COMMENTS, '\d+?(?: ?min)|^\d+$', 1, 1) C
FROM DATA
Results (there must be a better way to format columns than as ‘code sample’):
COMMENTS                                 A   B   C
Treatment of PC7, PT1 on left. 15 min.
15 minutes.
15 minutes
15 mins.
15 mins
15 min.
15 min
15min
15                                       15  15  15
This Regex will work for you.
Explanation
^.*? – matches the beginning of the string, followed by any character 0 or more times (as few as possible).
(\d+) – matches at least one digit and stores it in backreference position 1.
( ?min.*$) – matches an optional space, then min, then any characters (0 or more), then the end of the string.
(...|$) – if it can’t find min, it will see if there is the end of the string instead.
Then, instead of using REGEXP_SUBSTR(), use REGEXP_REPLACE() like this, replacing the entire string with what was stored in backreference position 1 (your number):
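The pattern and the call itself are not shown above, so the following is only a sketch: the pattern is assembled from the four pieces in the explanation, and the column aliases are made up for illustration. The second column is an alternative that is not part of the original answer; it assumes Oracle 11g or later, where REGEXP_SUBSTR() accepts a sixth argument that selects a subexpression. Neither pattern uses lookahead, non-capturing groups, or conditionals, which Oracle's regex engine does not support.

```sql
-- Sketch only: pattern reconstructed from the explanation above;
-- MINS_BY_REPLACE and MINS_BY_SUBSTR are hypothetical aliases.
WITH DATA AS (
  SELECT 'Treatment of PC7, PT1 on left. 15 min.' COMMENTS FROM DUAL
  UNION ALL
  SELECT '15 minutes.' COMMENTS FROM DUAL
  UNION ALL
  SELECT '15min' COMMENTS FROM DUAL
  UNION ALL
  SELECT '15' COMMENTS FROM DUAL
)
SELECT COMMENTS,
       -- Replace the whole string with capture group 1 (the digits).
       REGEXP_REPLACE(COMMENTS, '^.*?(\d+)( ?min.*$|$)', '\1') AS MINS_BY_REPLACE,
       -- Alternative (assumes 11g+): return subexpression 1 of the first match.
       REGEXP_SUBSTR(COMMENTS, '(\d+)( ?min|$)', 1, 1, NULL, 1) AS MINS_BY_SUBSTR
FROM DATA;
```

One difference worth keeping in mind: when the pattern does not match at all, REGEXP_REPLACE() returns the original string unchanged, whereas REGEXP_SUBSTR() returns NULL.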