I’m having difficulty translating what I want into functional programming, since I think imperatively. Basically, I have a table of forms, and a table of expectations. In the Expectation view, I want it to look through the forms table and tell me if each one found a match. However, when I try to use joins to accomplish this, the joins are adding rows to the Expectation table when two or more forms match. I do not want this.
In an imperative fashion, I want the equivalent of this:
ForEach (row in Expectation table)
{
if (any form in the Form table matches the criteria)
{
MatchID = form.ID;
SignDate = form.SignDate;
...
}
}
What I have in SQL is this:
SELECT
e.*, match.ID, match.SignDate, ...
FROM
POFDExpectation e LEFT OUTER JOIN
(SELECT MIN(ID) as MatchID, MIN(SignDate) as MatchSignDate,
COUNT(*) as MatchCount, ...
FROM Form f
GROUP BY (matching criteria columns)
) match
ON (form.[match criteria] = expectation.[match criteria])
Which works okay, but very slowly, and every time there are TWO matches, a row is added to the Expectation results. Mathematically I understand that a join is a cross multiply and this is expected, but I’m unsure how to do this without them. Subquery perhaps?
I’m not able to give too many further details about the implementation, but I’ll be happy to try any suggestion and respond with the results. I have 880 Expectation rows, and 942 results being returned. If I only allow results that match one form, I get 831 results. Neither are desirable, so if yours gets me to exactly 880, yours is the accepted answer.
Edit: I am using SQL Server 2008 R2, though a generic solution would be best.
Sample code:
--DROP VIEW ExpectationView; DROP TABLE Forms; DROP TABLE Expectations;
--Create Tables and View
CREATE TABLE Forms (ID int IDENTITY(1,1) PRIMARY KEY, ReportYear int, Name varchar(100), Complete bit, SignDate datetime)
GO
CREATE TABLE Expectations (ID int IDENTITY(1,1) PRIMARY KEY, ReportYear int, Name varchar(100))
GO
CREATE VIEW ExpectationView AS select e.*, filed.MatchID, filed.SignDate, ISNULL(filed.FiledCount, 0) as FiledCount, ISNULL(name.NameCount, 0) as NameCount from Expectations e LEFT OUTER JOIN
(select MIN(ID) as MatchID, ReportYear, Name, Complete, Min(SignDate) as SignDate, COUNT(*) as FiledCount from Forms f GROUP BY ReportYear, Name, Complete) filed
on filed.ReportYear = e.ReportYear AND filed.Name like '%'+e.Name+'%' AND filed.Complete = 1 LEFT OUTER JOIN
(select MIN(ID) as MatchID, ReportYear, Name, COUNT(*) as NameCount from Forms f GROUP BY ReportYear, Name) name
on name.ReportYear = e.ReportYear AND name.Name like '%'+e.Name+'%'
GO
--Insert Text Data
INSERT INTO Forms (ReportYear, Name, Complete, SignDate)
SELECT 2011, 'Bob Smith', 1, '2012-03-01' UNION ALL
SELECT 2011, 'Bob Jones', 1, '2012-10-04' UNION ALL
SELECT 2011, 'Bob', 1, '2012-07-20'
GO
INSERT INTO Expectations (ReportYear, Name)
SELECT 2011, 'Bob'
GO
SELECT * FROM ExpectationView --Should only return 1 result, returns 9
The ‘filed’ shows that they have completed a form, ‘name’ shows that they may have started one but not finished it. My view has four different ‘match criteria’ – each a little more strict, and counts each. ‘Name Only Matches’, ‘Loose Matches’, ‘Matches’ (default), ‘Tight Matches’ (used if there are more than one default match.
This is how I do it when I want to keep to a JOIN-type query format:
It usually performs better as well.
Semantically, {CROSS|OUTER} APPLY (table-expression) specifies a table-expression that is called once for each row in the preceding table expressions of the FROM clause and then joined to them. Pragmatically, however, the compiler treats it almost identically to a JOIN.
The practical difference is that unlike a JOIN table-expression, the APPLY table-expression is dynamically re-evaluated for each row. So instead of an ON clause, it relies on its own logic and WHERE clauses to limit/match its rows to the preceding table-expressions. This also allows it to make reference to the column-values of the preceding table-expressions, inside its own internal subquery expression. (This is not possible in a JOIN)
The reason that we want this here, instead of a JOIN, is that we need a TOP 1 in the sub-query to limit its returned rows, however, that means that we need to move the ON clause conditions to the internal WHERE clause so that it will get applied before the TOP 1 is evaluated. And that means that we need an APPLY here, instead of the more usual JOIN.