Summary
Okay, back in 2007 I was asked to produce a small piece of software whose purpose was to record people’s names, addresses and phone numbers from the local phone directory.
At the time, the only requirement was to be able to group this list by street name, so the street name in the address was sufficient.
This is now the third year in a row that, once a year, I get the headache of matching street names to sectors. A sector is nothing more than “downtown”, “upper city”, “125e”, “over 125e” and “unknown” for streets I can’t classify.
Data sample and structure
I have an initial table that was created when the software was first delivered. I’ll present it as SQL Server, since I imported the data into it for ease of work.
CREATE TABLE Contacts (
ContactId int not null identity(1, 1) primary key
, lastname nvarchar(50) not null
, firstname nvarchar(20) not null
, civic nvarchar(10) not null
, street nvarchar(20) not null
, city nvarchar(20) not null
, phone bigint not null
)
-- With the following sample data:
insert into Contacts (lastname, firstname, civic, street, city, phone)
values (N'LNAME-5551231234', N'A', N'89', N'MY STREET', N'SHAWINIGAN', 5551231234)
GO
insert into Contacts (lastname, firstname, civic, street, city, phone)
values (N'LNAME-5559879876', N'FNAME', N'10', N'YOUR STREET', N'SHAWINIGAN', 5559879876)
GO
insert into Contacts (lastname, firstname, civic, street, city, phone)
values (N'LNAME-5554564567', N'AFNAME', N'25', N'HIS STREET', N'SHAWINIGAN-SUD', 5554564567)
GO
Then I added a table with the correctly spelled street names, and another one for the different sectors.
-- Sectors
CREATE TABLE Sectors (
sectorId int not null identity(1, 1) primary key
, sectorName nvarchar(20) not null
)
GO
insert into Sectors (sectorName)
values (N'Downtown')
GO
insert into Sectors (sectorName)
values (N'Upper city')
GO
-- Streets
CREATE TABLE Streets (
streetId int not null identity(1, 1) primary key
, sectorId int not null references Sectors (sectorId)
, streetName nvarchar(20) not null
)
GO
insert into Streets (sectorId, streetName)
values (1, N'My St.')
GO
insert into Streets (sectorId, streetName)
values(1, N'Ur Street')
GO
insert into Streets (sectorId, streetName)
values (2, N'HIS STREET')
GO
For the sake of the explanation, this gives the following:
Sectors
sectorId | sectorName
---------------------
1        | Downtown
2        | Upper city
Streets
streetId | sectorId | streetName
--------------------------------
1        | 1        | My St.
2        | 1        | Ur Street
3        | 2        | HIS STREET
Contacts
contactId | lastname         | firstname | civic | street      | city           | phone
--------------------------------------------------------------------------------------------
1         | LNAME-5551231234 | A         | 89    | MY STREET   | SHAWINIGAN     | 5551231234
2         | LNAME-5559879876 | FNAME     | 10    | YOUR STREET | SHAWINIGAN     | 5559879876
3         | LNAME-5554564567 | AFNAME    | 25    | HIS STREET  | SHAWINIGAN-SUD | 5554564567
Objective
I have to resolve the street name conflicts caused by spelling differences. First, it seems that only some values of Contacts.street exist as-is in Streets.streetName. When I compare the two with an equals sign (=), I get only about 6,000 rows, whereas the population of the city is about 13,000 people.
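To see which spellings make the equality comparison miss rows, one option is to list every distinct Contacts.street value that has no exact counterpart in Streets.streetName, together with the number of contacts using it. This is only a sketch against the sample schema above; the alias contactCount is made up for illustration.

```sql
-- Distinct street spellings in Contacts that have no exact match
-- in Streets, most frequently used first.
select c.street
     , count(*) as contactCount   -- hypothetical alias
from Contacts c
left join Streets st on st.streetName = c.street
where st.streetId is null         -- keep only the unmatched spellings
group by c.street
order by count(*) desc
       , c.street
```

With the three sample contacts, this would list MY STREET and YOUR STREET, since only HIS STREET is spelled identically in both tables.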
Because of this, I tried joining the tables with a LIKE clause, but then I get about 20,000 rows, with duplicated combinations of lastname, firstname, civic and phone from Contacts.
On top of that, I seem to lack precision (I don’t know how else to put it): when I use LIKE, I get some strange results.
For instance, suppose I have the street 125e Rue in Streets, and 12e Rue and 25e Rue in Contacts; the contact then looks duplicated because both streets meet the LIKE pattern. (It would be so much easier to understand with production data, but these are people’s addresses and phone numbers, so I can’t…)
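The overlap can be reproduced in isolation. The sketch below assumes that both 25e Rue and 125e Rue exist in Streets (hypothetical rows, not taken from the real data) and that a contact lives on 25e Rue; the table variable and the @contactStreet variable are made up for the illustration.

```sql
-- Hypothetical street names, only to illustrate the pattern overlap.
declare @streets table (streetName nvarchar(20) not null)
insert into @streets (streetName) values (N'25e Rue')
insert into @streets (streetName) values (N'125e Rue')

declare @contactStreet nvarchar(20)
set @contactStreet = N'25e Rue'

-- Both rows come back: '25e Rue' contains itself,
-- and '125e Rue' also contains the substring '25e Rue'.
select streetName
from @streets
where streetName like N'%' + @contactStreet + N'%'
```

Because the pattern is wrapped in % on both sides, it is a substring test, so any street whose name merely contains the contact's street name also qualifies.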
Queries attempted so far
This query produces the kind of duplicates mentioned above, but only the information from Contacts is duplicated, since Streets.streetName changes from one record to the next in the results of this query. Besides, this query makes it look as if there were multiple addresses for A LNAME-5551231234, for instance.
select c.city
, s.sectorName
, st.streetName
, c.civic
, c.lastname
, c.firstname
, c.phone
from Contacts c
inner join Streets st on st.streetName like N'%' + c.street + N'%'
inner join Sectors s on s.sectorId = st.sectorId
group by c.city
, s.sectorName
, st.streetName
, c.civic
, c.lastname
, c.firstname
, c.phone
order by c.city
, s.sectorName
, st.streetName
, c.civic
, c.lastname
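To measure how many contacts are affected, the same LIKE join can be grouped by contact so that only those matching more than one street remain. This is a diagnostic sketch rather than a fix, and the alias matchCount is made up for illustration.

```sql
-- Contacts whose street matches more than one row in Streets
-- under the LIKE join used above.
select c.ContactId
     , c.street
     , count(*) as matchCount     -- hypothetical alias
from Contacts c
inner join Streets st on st.streetName like N'%' + c.street + N'%'
group by c.ContactId
       , c.street
having count(*) > 1               -- more than one candidate street
order by count(*) desc
       , c.ContactId
```

The street values it returns are the ones the pattern cannot tell apart, so they are the ones to check first.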
There is another query I would have liked to draw inspiration from, since it appears to produce the right results once as much information as possible is removed from the Contacts table.
Finally, I’m pretty confused myself, and I don’t expect that one of you, professional developers and DBAs, can help me with one simple answer, but rather with a walkthrough and an empirical approach. I’m willing to try anything you may think of that I have not already thought of.
Thanks for any help you provide. =)
surely it can’t be this simple…
select c.city
, s.sectorName
, st.streetName
, c.civic
, c.lastname
, c.firstname
, c.phone
from Contacts c
-- LEFT is assumed here: a bare OUTER JOIN is not valid T-SQL
left outer join Streets st on st.streetName = c.street
-- note: this inner join drops the contacts whose street found no match
inner join Sectors s on s.sectorId = st.sectorId
group by c.city
, s.sectorName
, st.streetName
, c.civic
, c.lastname
, c.firstname
, c.phone
order by c.city
, s.sectorName
, st.streetName
, c.civic
, c.lastname