I just released some code into production that is randomly causing errors. I have already fixed the problem by completely changing the way I do the query. However, it still bothers me that I don't know what caused the problem in the first place, so I'm wondering if someone might know the answer. I have the following query inside a stored procedure. I'm not looking for comments that nested function calls in queries are bad practice, and things like that :-). I just really want to find out why it doesn't work consistently. At random, the function in the query returns a non-numeric value and causes an error on the join. However, if I immediately rerun the query, it works fine.
SELECT cscsf.cloud_server_current_software_firewall_id,
dbo.fn_GetCustomerFriendlyFromRuleName(cscsf.rule_name, np.policy_name) as rule_name,
cscsf.rule_action,
cscsf.rule_direction,
cscsf.source_address,
cscsf.source_mask,
cscsf.destination_address,
cscsf.destination_mask,
cscsf.protocol,
cscsf.port_or_port_range,
cscsf.created_date_utc,
cscsf.created_by
FROM CLOUD_SERVER_CURRENT_SOFTWARE_FIREWALL cscsf
LEFT JOIN CLOUD_SERVER cs
ON cscsf.cloud_server_id = cs.cloud_server_id
LEFT JOIN CLOUD_ACCOUNT cla
ON cs.cloud_account_id = cla.cloud_account_id
LEFT JOIN CONFIGURATION co
ON cla.configuration_id = co.configuration_id
LEFT JOIN DEDICATED_ACCOUNT da
ON co.dedicated_account_id = da.dedicated_account_id
LEFT JOIN CORE_ACCOUNT ca
ON da.core_account_number = ca.core_account_id
LEFT JOIN NETWORK_POLICY np
ON np.network_policy_id = (select dbo.fn_GetIDFromRuleName(cscsf.rule_name))
WHERE cs.cloud_server_id = @cloud_server_id
AND cs.current_software_firewall_confg_guid = cscsf.config_guid
AND ca.core_account_id IS NOT NULL
ORDER BY cscsf.rule_direction, cscsf.cloud_server_current_software_firewall_id
Notice that the join
ON np.network_policy_id = (select dbo.fn_GetIDFromRuleName(cscsf.rule_name))
calls a function.
Here is that function:
ALTER FUNCTION [dbo].[fn_GetIDFromRuleName]
(
@rule_name varchar(100)
)
RETURNS varchar(12)
AS
BEGIN
DECLARE @value varchar(12)
SET @value = dbo.fn_SplitGetNthRow(@rule_name, '-', 2)
SET @value = dbo.fn_SplitGetNthRow(@value, '_', 2)
SET @value = dbo.fn_SplitGetNthRow(@value, '-', 1)
RETURN @value
END
Which then calls this function:
ALTER FUNCTION [dbo].[fn_SplitGetNthRow]
(
@sInputList varchar(MAX),
@sDelimiter varchar(10) = ',',
@sRowNumber int = 1
)
RETURNS varchar(MAX)
AS
BEGIN
DECLARE @value varchar(MAX)
SELECT @value = data_split.item
FROM
(
SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) as row_num FROM dbo.fn_Split(@sInputList, @sDelimiter)
) AS data_split
WHERE
data_split.row_num = @sRowNumber
IF @value IS NULL
SET @value = ''
RETURN @value
END
which finally calls this function:
ALTER FUNCTION [dbo].[fn_Split] (
@sInputList VARCHAR(MAX),
@sDelimiter VARCHAR(10) = ','
) RETURNS @List TABLE (item VARCHAR(MAX))
BEGIN
DECLARE @sItem VARCHAR(MAX)
WHILE CHARINDEX(@sDelimiter,@sInputList,0) <> 0
BEGIN
SELECT @sItem=RTRIM(LTRIM(SUBSTRING(@sInputList,1,CHARINDEX(@sDelimiter,@sInputList,0)-1))),
@sInputList=RTRIM(LTRIM(SUBSTRING(@sInputList,CHARINDEX(@sDelimiter,@sInputList,0)+LEN(@sDelimiter),LEN(@sInputList))))
IF LEN(@sItem) > 0
INSERT INTO @List SELECT @sItem
END
IF LEN(@sInputList) > 0
INSERT INTO @List SELECT @sInputList -- Put the last item in
RETURN
END
The reason it is “randomly” returning different things has to do with how SQL Server optimizes queries, and where they get short-circuited.
One way to fix the problem is to change the return value of fn_GetIDFromRuleName:
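A minimal sketch of what that change could look like, assuming network_policy_id is an int and that a valid ID consists of digits only (neither is confirmed by the question). The digits-only test and the length cap are illustrative choices, not the only way to do it:

```sql
-- Sketch only: assumes NETWORK_POLICY.network_policy_id is an int.
ALTER FUNCTION [dbo].[fn_GetIDFromRuleName]
(
    @rule_name varchar(100)
)
RETURNS int
AS
BEGIN
    DECLARE @value varchar(12)
    SET @value = dbo.fn_SplitGetNthRow(@rule_name, '-', 2)
    SET @value = dbo.fn_SplitGetNthRow(@value, '_', 2)
    SET @value = dbo.fn_SplitGetNthRow(@value, '-', 1)

    -- Return NULL unless the extracted text is non-empty, digits only,
    -- and short enough (9 digits) that it cannot overflow an int.
    RETURN CASE
               WHEN @value <> ''
                    AND @value NOT LIKE '%[^0-9]%'
                    AND LEN(@value) <= 9
                   THEN CAST(@value AS int)
           END
END
```

With a function shaped like this, the join compares two integers, and rule names that do not contain an ID simply fail to match instead of raising a conversion error.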
Or, change the join condition:
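As a sketch, and assuming fn_GetIDFromRuleName keeps returning varchar(12) while network_policy_id is an int, the last join of the query in the question might be rewritten along these lines:

```sql
-- Fragment: replaces the final LEFT JOIN in the original query.
LEFT JOIN NETWORK_POLICY np
    ON np.network_policy_id =
       CASE
           -- Only attempt the conversion when the value looks numeric;
           -- otherwise the CASE yields NULL and the row does not match.
           WHEN ISNUMERIC(dbo.fn_GetIDFromRuleName(cscsf.rule_name)) = 1
               THEN CAST(dbo.fn_GetIDFromRuleName(cscsf.rule_name) AS int)
       END
```

This calls the function twice per row, which is fine for illustrating the idea but may be worth avoiding on a large table.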
The underlying problem is order of evaluation. The CASE expression fixes the problem because it checks for a numeric value before it converts, and SQL Server guarantees the order of evaluation within a CASE expression. Note that you could still have problems converting values like '6e07' or '1.23', which are numeric but not integers.
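To see that caveat in isolation, a small query like the following can help. It assumes SQL Server 2012 or later, because TRY_CONVERT is not available before that; the sample values are made up for illustration:

```sql
-- ISNUMERIC accepts several formats that cannot be converted to int.
SELECT v.txt,
       ISNUMERIC(v.txt)        AS is_numeric, -- 1 for '42', '6e07' and '1.23'; 0 for 'abc'
       TRY_CONVERT(int, v.txt) AS as_int      -- NULL whenever the text is not a valid int
FROM (VALUES ('42'), ('6e07'), ('1.23'), ('abc')) AS v(txt);
```

Where TRY_CONVERT is available, it can stand in for the ISNUMERIC plus CAST pair, since it returns NULL instead of raising an error when the text is not a valid integer.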
Why does it work sometimes? Well, clearly the query execution plan is changing, either statically or dynamically. The failing case is probably on a row that is excluded by the WHERE condition. Why does it try to do the conversion? The question is where the conversion happens.
Where the conversion happens depends on the query plan. This may, in turn, depend on when the table in question (cscsf) is read. If it is already in memory, it might be read and the conversion attempted as a first step in the query; then you would get the error. In another scenario, another table might be filtered first, and the offending rows removed before they are converted.
In any case, my advice is: