I have a table. A large table with 25 columns, each containing atomic data about a specific entity. The entities are, to be specific, real estate properties (like rooms and houses) for sale, and so the table is called property.
Every property has a subclass (actually it’s called “type”, but we’ll call it “subclass” to avoid confusion with data types), which at this moment is “already built and for sale” or “under construction, but can be invested in”. It also has a lot of attributes, like address, price, etc., most of which are shared among subclasses, but some are not. Attributes have different data types, which are:
- Integer numbers
- Floating point numbers
- Short lines of text
- Long chunks of text
- Foreign keys to other tables
These “other tables” are for selecting from a moderator-editable list of options (like the list of city districts, the list of building companies, etc.).
A moderator should be able to create new properties and edit them. A user should be able to view a property’s detailed information and search for properties satisfying certain criteria, then view them as a table, sortable by one of the columns.
Depending on the subclass of a property, only a subset of property attributes is available to a user for viewing and to a moderator for editing. Also, depending on the data type, different HTML code is needed to show these attributes to the user and to provide the moderator with editing controls, and different data validation checks should be performed after editing.
The list of fields is not dynamic – it is unlikely for the list of columns and how they are displayed to change often, and it is not needed for the moderator to be able to change it.
However, since 25 is a rather large number, I’d like to organize and keep in one place all the metadata about the property table: which subclasses each column applies to and how its data should be displayed, edited and validated. It would be nice to be able to access all this metadata from my code in some easy way (like an array). I see three options to do this:
1. Constant PHP array
Just create a PHP file or function that will construct the array with metadata, then include/call it when needed.
Pros:
- Simple.
- Fast.
Cons:
- Harder to maintain, because of overly verbose and ugly PHP code.
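To make option 1 concrete, here is a minimal sketch of what such an array might look like. The column names, type labels and subclass identifiers are made up for illustration; only the general shape (column, data type, applicable subclasses) comes from the description above.

```php
<?php
// Hypothetical subclass identifiers; the real values are not given here.
const SUBCLASS_BUILT = 'built';
const SUBCLASS_UNDER_CONSTRUCTION = 'under_construction';

/**
 * Returns metadata for the property table, keyed by column name.
 * Column names, type labels and subclass assignments are illustrative.
 */
function propertyMeta(): array
{
    $all = [SUBCLASS_BUILT, SUBCLASS_UNDER_CONSTRUCTION];

    return [
        'address'         => ['type' => 'short_text',  'subclasses' => $all],
        'price'           => ['type' => 'float',       'subclasses' => $all],
        'description'     => ['type' => 'long_text',   'subclasses' => $all],
        'district_id'     => ['type' => 'foreign_key', 'subclasses' => $all, 'references' => 'district'],
        'floor_count'     => ['type' => 'int', 'subclasses' => [SUBCLASS_BUILT]],
        'completion_year' => ['type' => 'int', 'subclasses' => [SUBCLASS_UNDER_CONSTRUCTION]],
    ];
}

/** Returns only the columns that apply to the given subclass. */
function columnsFor(string $subclass): array
{
    return array_filter(
        propertyMeta(),
        function (array $column) use ($subclass): bool {
            return in_array($subclass, $column['subclasses'], true);
        }
    );
}

/** Checks a raw form value against the column's declared type. */
function isValidValue(array $column, string $value): bool
{
    switch ($column['type']) {
        case 'int':
        case 'foreign_key':
            return filter_var($value, FILTER_VALIDATE_INT) !== false;
        case 'float':
            return filter_var($value, FILTER_VALIDATE_FLOAT) !== false;
        case 'short_text':
            return strlen($value) <= 255; // assumed length limit
        default: // long_text: any string is accepted in this sketch
            return true;
    }
}
```

The same array could drive the choice of HTML control per type; the validation function is only one example of code that reads it.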
2. MySQL database
Create a table property_meta in the database and store the metadata there. The new table will contain the column name from the property table, the relevance of the data in this column to each of the property subclasses, the expected data type, etc. Then create a function that will query the necessary fields and return the resulting data as an array.
Pros:
- Easier to change the metadata.
- Less code to maintain.
- It can later be expanded to allow the user to change the list of columns. It won’t be much of a problem to add or remove columns from the property table. Although, in my opinion, a user being able to change the database schema on the fly is a serious code smell. Correct me if I’m wrong.
Cons:
- Whenever metadata is changed, server database must be updated accordingly. But it will only happen when database schema changes anyway, so no one cares.
- Slower: on top of the cost of creating the array, this adds the cost of talking to the database server and the cost of selecting the data. Although the latter is likely negated by the MySQL query caching mechanism.
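For option 2, the function that turns property_meta rows into an array could be sketched roughly as follows. The row layout (column_name, data_type, and one flag column per subclass) is an assumption on my part, and the rows are passed in as a plain iterable so the sketch does not depend on a live database; in practice they might come from a PDO query.

```php
<?php
/**
 * Folds rows from a hypothetical property_meta table into an array keyed by
 * column name. Each row is assumed to look like:
 *   ['column_name' => 'price', 'data_type' => 'float',
 *    'for_built' => 1, 'for_under_construction' => 1]
 */
function buildPropertyMeta(iterable $rows): array
{
    $meta = [];
    foreach ($rows as $row) {
        $subclasses = [];
        // Database drivers may return flags as '0'/'1' strings, hence the casts.
        if ((bool) $row['for_built']) {
            $subclasses[] = 'built';
        }
        if ((bool) $row['for_under_construction']) {
            $subclasses[] = 'under_construction';
        }
        $meta[$row['column_name']] = [
            'type'       => $row['data_type'],
            'subclasses' => $subclasses,
        ];
    }
    return $meta;
}

/**
 * Caches the metadata for the rest of the request, so the loader
 * (for example a function running the SELECT) is called only once.
 */
function propertyMeta(callable $loadRows): array
{
    static $cache = null;
    if ($cache === null) {
        $cache = buildPropertyMeta($loadRows());
    }
    return $cache;
}
```

Caching per request like this would soften the "slower" concern somewhat, at the price of one query per page load.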
3. Separate properties and their attributes into different tables
Create the metadata table as in the above solution, only name it property_attribute. Also create a property_data table with foreign keys to property and property_attribute, and one more column for the attribute value. The property table would then only contain the primary key and the subclass, and the actual attribute values could only be retrieved with a query with two joins.
Pros:
- Most flexible solution. If the list of attributes is changed, database schema will stay the same.
Cons:
- Each property_data row will contain a single piece of data of unknown type. Either store all of them as TEXT or BLOB, or create separate columns for separate data types. Both solutions look ugly.
- It is unclear how to handle former foreign keys from the property table. Automated data integrity checks that are done on every insert become nigh impossible (maybe possible with triggers? I’m not sure).
- Selecting data will become much harder. The data will be fetched as property_id–property_attribute_id–value trinities, which is not intuitive and requires more effort to properly output.
- More than that, filtering and sorting by one or more of the attributes will send me to the world of hurt.
- Very, very slow.
- It feels like using a helicopter to cross the street.
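To illustrate the extra effort option 3 implies, here is a rough sketch of the PHP side: turning the fetched triples back into one array per property, and sorting by an attribute afterwards. The row keys are assumptions, and sorting in PHP is just one possible approach (it could also be done in SQL with additional joins).

```php
<?php
/**
 * Pivots (property_id, attribute, value) rows, as a query with two joins
 * might return them, into one associative array per property.
 * Row keys are assumed; values stay strings because a shared value column
 * cannot carry a per-attribute type.
 */
function pivotAttributes(iterable $rows): array
{
    $properties = [];
    foreach ($rows as $row) {
        $properties[$row['property_id']][$row['attribute']] = $row['value'];
    }
    return $properties;
}

/** Sorts pivoted properties by a numeric attribute, casting explicitly. */
function sortByNumericAttribute(array $properties, string $attribute): array
{
    uasort($properties, function (array $a, array $b) use ($attribute): int {
        // Properties without the attribute are treated as 0 in this sketch.
        return (float) ($a[$attribute] ?? 0) <=> (float) ($b[$attribute] ?? 0);
    });
    return $properties;
}
```

With a single wide property table, both steps collapse into a plain SELECT with an ORDER BY.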
Frankly, I don’t like any of these solutions. But the second one is the least ugly, in my opinion. What do you think?
My first question to you would be: Are you asking for help devising a proper database schema or are you asking how to handle these properties/subclasses in code?
Database Schema
Database schema isn’t exactly my forte, so I’ll leave that up to somebody who may know better than me.
I would probably just have each field as its own column in a single properties table though, because it’s easy and lets you index each field properly. As you said, new fields are not going to be added often.
Handling it in PHP
Thinking of these fields as metadata is approaching it wrong, in my opinion. Each subclass has its own set of fields, meaning technically they are all separate kinds of entities.
I’m keeping this simple for clarity, but here’s something along the lines of what I would do:
Make a POPO (plain old PHP object) for each property type.
These are simply value objects, similar to what you may find in an ORM. Like a Doctrine2 entity, they do not perform any kind of persistence.
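As a rough sketch, such value objects might look like the following. The class and field names are hypothetical, chosen only to show shared fields on a base class and subclass-specific fields on the children.

```php
<?php
/** Fields shared by every property subclass (names are illustrative). */
abstract class Property
{
    public ?int $id = null;
    public string $address = '';
    public float $price = 0.0;
}

/** "Already built and for sale". */
final class BuiltProperty extends Property
{
    public int $floorCount = 0;
}

/** "Under construction, but can be invested in". */
final class UnderConstructionProperty extends Property
{
    public ?int $completionYear = null;
}
```

Because each class declares only its own fields, code that receives a BuiltProperty cannot accidentally touch an attribute that exists only for properties under construction.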
Here’s where I’m simplifying the example: this most definitely does not follow the SRP and is bad design, in my opinion, but I’m writing it this way for brevity.
Make a factory class that’s responsible for fetching and saving the data from/to the database, instantiating the appropriate POPOs and populating the data accordingly.
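A minimal sketch of the instantiating and populating part could look like this. It leaves out the actual database access, the row keys and subclass values are assumptions, and the small classes at the top are just stand-ins for the per-subclass value objects so the snippet runs on its own.

```php
<?php
// Minimal stand-ins for the per-subclass value objects (hypothetical fields).
abstract class Property
{
    public ?int $id = null;
    public float $price = 0.0;
}
final class BuiltProperty extends Property
{
    public int $floorCount = 0;
}
final class UnderConstructionProperty extends Property
{
    public ?int $completionYear = null;
}

final class PropertyFactory
{
    /** Maps the subclass column value to the class to instantiate (assumed values). */
    private const CLASS_MAP = [
        'built'              => BuiltProperty::class,
        'under_construction' => UnderConstructionProperty::class,
    ];

    /** Builds the matching value object from one row of the property table. */
    public function fromRow(array $row): Property
    {
        $subclass = $row['subclass'] ?? '';
        if (!array_key_exists($subclass, self::CLASS_MAP)) {
            throw new InvalidArgumentException("Unknown property subclass: {$subclass}");
        }

        $class = self::CLASS_MAP[$subclass];
        $property = new $class();
        $property->id = isset($row['id']) ? (int) $row['id'] : null;
        $property->price = (float) ($row['price'] ?? 0);

        // Populate only the fields that exist for this subclass.
        if ($property instanceof BuiltProperty) {
            $property->floorCount = (int) ($row['floor_count'] ?? 0);
        } elseif ($property instanceof UnderConstructionProperty) {
            $property->completionYear = isset($row['completion_year'])
                ? (int) $row['completion_year']
                : null;
        }

        return $property;
    }
}
```

A fuller version would also have a method that takes one of these objects and writes it back, which is where the SRP concern comes from.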
That’s really all there is to it. It’s a mini-ORM. You could actually do away with the factory and use a proper ORM if you wanted…
The key is having separate objects for each property subclass. It’s advantageous because: