This question may be naive; if so, I apologize. I'm trying to learn more about database administration, and I'm uncertain which choice is preferable in this case. I have a Django model that could easily be split into two tables: it contains both contact info and profile info for companies.
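from django.db import models


class Company(models.Model):
    # Contact info: expected to stay mostly the same
    name = models.CharField(max_length=100)
    street_address = models.CharField(max_length=100, blank=True)
    city = models.CharField(max_length=100, blank=True)
    state = models.CharField(max_length=100, blank=True)
    # IntegerField ignores max_length and drops leading zeros (e.g. 02139),
    # so a 5-character string is safer for ZIP codes
    zipcode = models.CharField(max_length=5, blank=True)

    # Profile info: expected to change often
    input_level = models.IntegerField(choices=((0, 'Less'), (1, 'More')))
    # blank=True only affects form validation; null=True lets the column stay empty
    expense_min = models.IntegerField(null=True, blank=True)
    expense_max = models.IntegerField(null=True, blank=True)
    health_value = models.IntegerField(
        choices=[(i + 1, i + 1) for i in range(5)], null=True, blank=True)
    group_size = models.IntegerField(null=True, blank=True)
    comment = models.TextField(max_length=500, blank=True)
    created = models.DateField(auto_now_add=True)
    registered = models.BooleanField(default=False)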
Though there are a decent number of columns, I don't see any explicit reason to break this into related tables. The profile-related info (the fields after zipcode) may change often, though the address info will probably stay the same. I'd assume that the cost of joins would outweigh the cost of updating/inserting into a single wide table with many columns.
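For comparison, here is a minimal sketch of what the two-table split would look like. It uses Python's built-in sqlite3 module rather than the Django ORM, and the table names `company` and `company_profile` (and the trimmed column lists) are hypothetical. It shows the trade-off being weighed: frequent profile updates touch only the narrow table, but reading both halves together needs a join.

```python
import sqlite3

# Hypothetical two-table layout: contact info and profile info, one-to-one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Contact info: rarely updated
    CREATE TABLE company (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        street_address TEXT,
        city TEXT,
        state TEXT,
        zipcode TEXT
    );
    -- Profile info: updated often; at most one row per company
    CREATE TABLE company_profile (
        company_id INTEGER PRIMARY KEY REFERENCES company(id),
        input_level INTEGER,
        expense_min INTEGER,
        expense_max INTEGER,
        group_size INTEGER
    );
""")

conn.execute("INSERT INTO company (id, name, city) VALUES (1, 'Acme', 'Boston')")
conn.execute(
    "INSERT INTO company_profile (company_id, input_level, group_size) VALUES (1, 0, 10)")

# Frequent profile updates touch only the narrow profile table...
conn.execute("UPDATE company_profile SET group_size = 12 WHERE company_id = 1")

# ...but reading contact and profile info together now needs a join.
row = conn.execute("""
    SELECT c.name, c.city, p.group_size
    FROM company AS c
    JOIN company_profile AS p ON p.company_id = c.id
    WHERE c.id = 1
""").fetchone()
print(row)  # ('Acme', 'Boston', 12)
```

In Django the second table would typically be its own model holding something like `models.OneToOneField(Company, on_delete=models.CASCADE, primary_key=True)`, which produces the same one-to-one layout.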
Is there a basic rule of thumb here, or do I just have to profile it?
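If profiling is the answer, a rough starting point might look like the sketch below. It is only an assumption-laden illustration: it times primary-key lookups against a hypothetical single wide table (`company_wide`) and against the same data split into `company` and `company_profile`, all in in-memory SQLite. Numbers from SQLite in memory will not transfer to a production database, so treat this as a template to rerun against your real database, data volume, and update workload.

```python
import sqlite3
import timeit


def build(conn, rows=10_000):
    # Hypothetical tables: one wide table vs. the same columns split in two
    conn.executescript("""
        CREATE TABLE company_wide (id INTEGER PRIMARY KEY, name TEXT, city TEXT, group_size INTEGER);
        CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
        CREATE TABLE company_profile (
            company_id INTEGER PRIMARY KEY REFERENCES company(id),
            group_size INTEGER
        );
    """)
    data = [(i, f"co{i}", "Boston", i % 50) for i in range(rows)]
    conn.executemany("INSERT INTO company_wide VALUES (?, ?, ?, ?)", data)
    conn.executemany("INSERT INTO company VALUES (?, ?, ?)", [d[:3] for d in data])
    conn.executemany("INSERT INTO company_profile VALUES (?, ?)", [(d[0], d[3]) for d in data])
    conn.commit()


conn = sqlite3.connect(":memory:")
build(conn)

wide_sql = "SELECT name, group_size FROM company_wide WHERE id = ?"
join_sql = ("SELECT c.name, p.group_size FROM company AS c "
            "JOIN company_profile AS p ON p.company_id = c.id WHERE c.id = ?")


def run(sql):
    # Look up a spread of ids rather than hitting the same row every time
    for i in range(0, 10_000, 7):
        conn.execute(sql, (i,)).fetchone()


# Best of three repeats, each running the lookup loop five times
wide_t = min(timeit.repeat(lambda: run(wide_sql), number=5, repeat=3))
join_t = min(timeit.repeat(lambda: run(join_sql), number=5, repeat=3))
print(f"single table: {wide_t:.4f}s  split + join: {join_t:.4f}s")
```

Both queries return the same data, so any difference in the timings is the cost of the join under these particular conditions; adding a timed `UPDATE` loop would cover the write side of the question.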
There are no right or wrong answers with regard to normalizing or denormalizing a schema.
Ask yourself whether performance is an important criterion. If it is, accept the extra program complexity and go with a denormalized table.
If the tables are small and performance is not a big issue, don't trouble yourself with that complexity: in a denormalized design, forgetting to update a duplicated column in another table leaves your data inconsistent and can cause you a lot of problems.
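A small sketch of that failure mode, again in in-memory SQLite with hypothetical tables: a `booking` table copies the company's city (denormalization, to avoid a join), the program updates only the `company` row, and the copy goes stale. The hypothetical `move_company` helper shows the usual remedy of updating every copy inside one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    -- Denormalized: each booking keeps its own copy of the company's city
    CREATE TABLE booking (
        id INTEGER PRIMARY KEY,
        company_id INTEGER REFERENCES company(id),
        company_city TEXT
    );
    INSERT INTO company VALUES (1, 'Acme', 'Boston');
    INSERT INTO booking VALUES (10, 1, 'Boston');
""")

# Finds bookings whose copied city no longer matches the company row
STALE_SQL = """
    SELECT b.id, c.city, b.company_city
    FROM booking AS b JOIN company AS c ON c.id = b.company_id
    WHERE c.city <> b.company_city
"""

# The company moves, but the code only updates the primary table...
conn.execute("UPDATE company SET city = 'Denver' WHERE id = 1")
stale_before = conn.execute(STALE_SQL).fetchall()
print(stale_before)  # [(10, 'Denver', 'Boston')]


def move_company(conn, company_id, new_city):
    # Hypothetical helper: change every copy together, or none at all
    with conn:
        conn.execute("UPDATE company SET city = ? WHERE id = ?", (new_city, company_id))
        conn.execute("UPDATE booking SET company_city = ? WHERE company_id = ?",
                     (new_city, company_id))


move_company(conn, 1, "Denver")
print(conn.execute(STALE_SQL).fetchall())  # []
```

Every place in the program that changes a duplicated value has to go through a helper like this; that bookkeeping is the program complexity a denormalized design takes on.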
Also don’t forget that indexes often cannot be used with joins.