How far does one go to eliminate duplicate data in a database? Because you

Question

0

Asked: June 16, 20262026-06-16T02:04:15+00:00 2026-06-16T02:04:15+00:00

How far does one go to eliminate duplicate data in a database? Because you

0

How far does one go to eliminate duplicate data in a database? Because you could go OTT and it would get crazy. Let me give you an example…

If I were to create a Zoo database which contains a table ‘Animal’ which has a ‘name’, ‘species’ and ‘country_of_birth’

But there will be duplicate data there as many animals could come from same country and there could be lots of tigers, for example.

So really there should be a ‘Species’ table and a ‘Country_of_birth’ table
But then after a while you would have tons of tables
So how far do you go?

In this question I am just using one table as an example. One row in the Animal table stores information about a single animal in the zoo. So that animal’s name, species and country of birth, as well as a unique animalID.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-16T02:04:17+00:00

But there will be duplicate data there as many animals could come from
same country and there could be lots of tigers, for example.

This suggests you want to keep track of individual animals, not just kinds of animals. Let’s assume that the zoos use some kind of numeric tattoo or microchip to identify individual animals.

Assume this sample data is representative. (It’s not, but it’s ok for teaching.)

Animals
Predicate: Animal having microchip <chip_num> of species <species> 
           has name <name> and was born in <birth_country_code>.

chip_num  name       species          birth_country_code
--
101234    Anita      Panthera tigris  USA
101235    Bella      Panthera tigris  USA
101236    Calla      Panthera tigris  USA
101237    Dingo      Canis lupus      CAN
101238    Exeter     Canis lupus      CAN
101239    Bella      Canis lupus      USA
101240    Bella      Canis lupus      CAN

There’s no redundant data in that table. None of those columns can be dropped without radically changing the meaning of that table. It has a single candidate key: chip_num. It’s in 5NF.

Values are repeated in non-key columns. That’s kind of the definition of non-key (non-prime) columns. Values in key columns (or sets of key columns) are unique; values in non-key columns aren’t.

If you want to restrict the values in “birth_country_code” to the valid three-letter ISO country codes, you can add a table of valid three-letter ISO country codes, and set a foreign key reference to it. This is generally a Good Thing, but it has nothing to do with normalization.

iso_country_code
--
CAN      
USA

You could do the same thing again for “species”. That, too, would generally be a Good Thing, and it, too, would have nothing to do with normalization.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

How far does one go to eliminate duplicate data in a database? Because you

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply