I have strings like the ones below in a SQL column. I want to extract the data amounts from each string and sum them into a single gigabyte total. Example:
Original Column --> Expected Output from a T-SQL function
-------------------------------------------
$15 / 1GB 24m + Intern 120MB --> 1.12GB
$19.95 / 500MB + $49.95 / 9GB Blackberry --> 9.5GB
$174.95 Blackberry 24GB + $10 / 1GB Datapack --> 25GB
$79 / 6GB --> 6GB
Null --> Null
$20 Plan --> 0GB
Note: for our purpose, 1000MB = 1 GB (not 1024).
The pattern is a number followed by GB or MB. Usually the two are joined without a space, as in 1GB, but occasionally there is a space between them. Handling that exception is not particularly important if it is hard to implement.
Sometimes GB/MB occurs up to three or four times in the same string, usually separated by a + sign (see rows 2 and 3 of my example above).
I have seen answers that extract dollar values (numbers preceded by $) or extract all the integers in a string, but I don’t want the dollar values or every integer. I just want the sum of the GB/MB amounts in the string.
The following solution makes quite a few assumptions about the data, and it may still look rather complicated for something so specific. Still, I hope it will at least make a good starting point.
These are the assumptions I had to make to avoid complicating the script even further:
The values to be extracted never contain a decimal point (are integers).
The values to be extracted are always either preceded by a space or at the beginning of the column value.
Neither GB nor MB can possibly be part of anything else than a traffic size (a value to be extracted).
Neither GB nor MB is ever preceded by a space.
All the strings are either unique or accompanied by another column or columns that can be used as key values. (My solution, in particular, uses an additional column as a key.)
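Before relying on these assumptions, it may be worth checking the data for the most obvious violations. The query below is only a rough sketch: the table variable @Plans, its columns Id and PlanText, and rows 2 and 3 are made up for illustration. It flags a unit that is not directly preceded by a digit (a space, a letter, or the start of the string) and amounts with one or two decimal places.

```sql
-- Hypothetical sample data; only row 1 comes from the question.
DECLARE @Plans TABLE (Id int PRIMARY KEY, PlanText varchar(200) NULL);

INSERT INTO @Plans (Id, PlanText) VALUES
    (1, '$15 / 1GB 24m + Intern 120MB'),  -- meets the assumptions
    (2, '$30 / 1.5GB'),                   -- made-up row: decimal amount
    (3, '$10 / 2 GB');                    -- made-up row: space before the unit

-- Return the rows that break at least one of the checked assumptions.
SELECT Id, PlanText
FROM @Plans
WHERE PlanText LIKE '%[^0-9][GM]B%'       -- unit preceded by something other than a digit
   OR PlanText LIKE '[GM]B%'              -- unit at the very start of the string
   OR PlanText LIKE '%.[0-9][GM]B%'       -- one decimal place, e.g. 1.5GB
   OR PlanText LIKE '%.[0-9][0-9][GM]B%'; -- two decimal places, e.g. 1.25GB
```

With this sample data the query returns rows 2 and 3. Under a case-insensitive collation it would also flag words that merely contain "mb" or "gb", which is the same situation the third assumption rules out.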
So, here’s my attempt (which did return the expected results for all the sample data provided in the original post):
Basically, the idea is first to convert all gigabytes to megabytes, so that we then only need to search for and extract megabyte amounts. The search & extract method involves a recursive CTE and consists essentially of these steps:
1) find the position of the first MB;
2) find the length of the number immediately before the MB;
3) cut off the beginning of the string right at the end of the first MB;
4) repeat from Step 1 until no MB is found;
5) join the found figures to the original string list to extract the amounts themselves.
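As a rough illustration of these steps, here is a minimal sketch rather than the original script. It assumes a table variable @Plans with a key column Id and a text column PlanText (all hypothetical names), relies on the assumptions listed above, and needs SQL Server 2012 or later for TRY_CAST. Instead of physically cutting the string, it passes a start position to CHARINDEX, which has the same effect. The final SELECT also does the grouping and summing.

```sql
-- Hypothetical sample table; Id stands in for the key column.
DECLARE @Plans TABLE (Id int PRIMARY KEY, PlanText varchar(200) NULL);

INSERT INTO @Plans (Id, PlanText) VALUES
    (1, '$15 / 1GB 24m + Intern 120MB'),
    (2, '$19.95 / 500MB + $49.95 / 9GB Blackberry'),
    (3, '$174.95 Blackberry 24GB + $10 / 1GB Datapack'),
    (4, '$79 / 6GB'),
    (5, NULL),
    (6, '$20 Plan');

WITH Normalized AS (
    -- Integer gigabytes become megabytes by appending three zeros: 1GB -> 1000MB.
    SELECT Id, CAST(REPLACE(PlanText, 'GB', '000MB') AS varchar(500)) AS Txt
    FROM @Plans
),
Found AS (
    -- Anchor member: position of the first MB (0 if none, NULL for a NULL string).
    SELECT Id, Txt, CHARINDEX('MB', Txt) AS MBPos
    FROM Normalized
    UNION ALL
    -- Recursive member: search again, starting just past the previous MB.
    SELECT Id, Txt, CHARINDEX('MB', Txt, MBPos + 2)
    FROM Found
    WHERE MBPos > 0
),
Amounts AS (
    -- The number is whatever lies between the last space before MB and the MB itself.
    SELECT f.Id, TRY_CAST(RIGHT(p.Prefix, n.NumLen) AS int) AS AmountMB
    FROM Found AS f
    CROSS APPLY (SELECT LEFT(f.Txt, NULLIF(f.MBPos, 0) - 1) AS Prefix) AS p
    CROSS APPLY (SELECT CHARINDEX(' ', REVERSE(p.Prefix) + ' ') - 1 AS NumLen) AS n
    WHERE f.MBPos > 0
)
SELECT pl.Id,
       pl.PlanText,
       -- NULL input stays NULL; a string with no amounts yields 0.
       CASE WHEN pl.PlanText IS NOT NULL
            THEN CAST(COALESCE(SUM(a.AmountMB), 0) / 1000.0 AS decimal(10, 2))
       END AS TotalGB
FROM @Plans AS pl
LEFT JOIN Amounts AS a ON a.Id = pl.Id
GROUP BY pl.Id, pl.PlanText
ORDER BY pl.Id;
```

For the six sample rows this should give totals of 1.12, 9.50, 25.00, 6.00, NULL and 0.00, using 1000MB = 1GB as the question specifies.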
Afterwards, it only remains for us to group by key values and sum the obtained amounts. Here’s the output: