I had a table for stores containing store name and address. After some discussion, we are now normalizing the the table, putting address in separate tables. This is done for two reasons:
- Increase search speed for stores by location / address
- Increase execution time for checking misspelled street names using the Levenshtein algorithm when importing stores.
The new structure looks like this (ignore typos):
country;
+--------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| id | varchar(2) | NO | PRI | NULL | |
| name | varchar(45) | NO | | NULL | |
| prefix | varchar(5) | NO | | NULL | |
+--------------------+--------------+------+-----+---------+-------+
city;
+--------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| city | varchar(50) | NO | | NULL | |
+--------------------+--------------+------+-----+---------+-------+
street;
+--------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| street | varchar(50) | YES | | NULL | |
| fk_cityID | int(11) | NO | | NULL | |
+--------------------+--------------+------+-----+---------+-------+
address;
+--------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| streetNum | varchar(10) | NO | | NULL | |
| street2 | varchar(50) | NO | | NULL | |
| zipcode | varchar(10) | NO | | NULL | |
| fk_streetID | int(11) | NO | | NULL | |
| fk_countryID | int(11) | NO | | NULL | |
+--------------------+--------------+------+-----+---------+-------+
*street2 is for secondary reference or secondary address in e.g. the US.
store;
+--------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| name | varchar(50) | YES | | NULL | |
| street | varchar(50) | YES | | NULL | |
| fk_addressID | int(11) | NO | | NULL | |
+--------------------+--------------+------+-----+---------+-------+
*I've left out address columns in this table to shorten code
The new tables have been populated with correct data and the only thing remaining is to add foreign key address.id in store table.
The following code lists all street names correctly:
select a.id, b.street, a.street2, a.zipcode, c.city, a.fk_countryID
from address a
left join street b on a.fk_streetID = b.id
left join city c on b.fk_cityID = c.id
- How can I update
fk_addressIDinstoretable? - How can I list all stores with correct address?
- Is this bad normalization considering the reasons given above?
UPDATE
It seems like the following code lists all stores with correct address – however it is a bit slow (I have about 2000 stores):
select a.id, a.name, b.id, c.street
from sl_store a, sl_address b, sl_street c
where b.fk_streetID = c.id
and a.street1 = c.street
group by a.name
order by a.id
I’m not going to speak to misspellings. Since you’re importing the data, misspellings are better handled in a staging table.
Let’s look at this slightly simplified version.
Although a lot of people firmly believe that ZIP code determines city and state in the US, that’s not the case. ZIP codes have to do with how carriers drive their routes, not with geography. Some cities straddle the borders between states; single ZIP code routes can cross state lines. Even Wikipedia knows this, although their examples might be out of date. (Delivery routes change constantly.)
So we have a table that has two candidate keys,
It has no non-key attributes. I think this table is in 5NF. What do you think?
If I wanted to increase the data integrity for street names, I might start with something like this.
You could (and probably should) repeat this process for city names, state names (state codes), and country names.
Some problems with your approach
You can apparently enter a street id number for a street that’s in the US, along with the country id for Croatia. (The “full name” of a city, so to speak, is the kind of fact you probably want to store in order to increase data integrity. That’s probably also true of the “full name” of a street.)
Using id numbers for every bit of data greatly increases the number of joins required. Using id numbers doesn’t have anything to do with normalization. Using id numbers without corresponding unique constraints on the natural keys–an utterly commonplace mistake–allows duplicate data.