I am implementing a Twitter app, where the app needs to cache details of users coming from Twitter. Would it be wise to serialize the data into a column called data? Or, in my User model, should I create a column for each field that is returned by the API request?
I have been reading about the serialize method in ActiveRecord::Base.
If I go with the latter approach, I would end up with a lot of fields in my User model, and if the Twitter API adds or removes fields in the future, we would have to update the columns in our db accordingly.
However, one advantage I can think of with this approach is that, with each field stored in its own column, I could search for all Twitter users based on location, for example. I could also index the location column for faster queries. How does this compare to the serialized approach?
Some will suggest: “Don’t search serialized data, just don’t do it”.
So I guess, I can have two columns: data (for serialized data), and location, no?
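To make the cost behind that advice concrete, here is a minimal sketch in plain Ruby. The rows are made-up sample data, the helper names are hypothetical, and a Hash stands in for a database index; it is not how ActiveRecord itself works.

```ruby
require "json"

# Made-up rows standing in for a users table with one serialized column.
rows = [
  { id: 1, data: JSON.generate({ "screen_name" => "alice", "location" => "London" }) },
  { id: 2, data: JSON.generate({ "screen_name" => "bob",   "location" => "Paris" }) },
  { id: 3, data: JSON.generate({ "screen_name" => "carol", "location" => "London" }) }
]

# Serialized only: every row has to be loaded and parsed before filtering.
def ids_by_location_scan(rows, location)
  rows.select { |row| JSON.parse(row[:data])["location"] == location }
      .map { |row| row[:id] }
end

# Dedicated location column: a lookup structure can be built once and reused,
# which is roughly what a database index on that column gives you.
def build_location_index(rows)
  rows.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |row, index|
    index[JSON.parse(row[:data])["location"]] << row[:id]
  end
end

index = build_location_index(rows)
p ids_by_location_scan(rows, "London") # parses all three rows
p index.fetch("London", [])            # single hash lookup
```

Both calls return the same ids; the difference is how much work each search has to do as the table grows.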
But let's add a few more twists:
- The app needs to sort users by registration date. Not with our app, but with Twitter.
- The app should be able to search users by Twitter screen name or Twitter id.
- The app should be able to sort users by followers, friends and statuses count.
Does this mean I would need 8 columns in my db: data, location, twitter_created_at, twitter_screen_name, twitter_id, followers_count, friends_count and statuses_count? At this point, would it still be better to go for the mixed-column approach, or just give each field its own column?
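For illustration, the mixed approach could look roughly like the sketch below, in plain Ruby. The helper build_user_row is hypothetical, the payload is made-up sample data, and the payload keys are assumed to mirror the column names listed above; the real API response may differ.

```ruby
require "json"
require "time"

# Hypothetical helper: copies the fields the app searches or sorts on into
# their own columns and keeps the whole payload serialized in `data`.
def build_user_row(payload)
  {
    twitter_id:          payload["id"],
    twitter_screen_name: payload["screen_name"],
    twitter_created_at:  Time.iso8601(payload["created_at"]),
    location:            payload["location"],
    followers_count:     payload["followers_count"],
    friends_count:       payload["friends_count"],
    statuses_count:      payload["statuses_count"],
    data:                JSON.generate(payload) # everything else lives here
  }
end

# Made-up payload; the timestamp format is an assumption for this sketch.
payload = {
  "id" => 12345,
  "screen_name" => "alice",
  "created_at" => "2009-03-21T20:50:14Z",
  "location" => "London",
  "followers_count" => 250,
  "friends_count" => 80,
  "statuses_count" => 1200,
  "lang" => "en" # not extracted, still recoverable from `data`
}

row = build_user_row(payload)
puts row[:twitter_screen_name]        # queryable column
puts JSON.parse(row[:data])["lang"]   # rarely used field, read from the blob
```

If the API later adds or drops a field that the app never queries on, only the contents of data change and no migration is needed.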
Would you save the data returned from the API into a single column: data, or save each field in its own respective column, or have a mix of both (as described above)?
Your thoughts will be appreciated.
So assume for a moment that you have a table with the following three columns:
In this table you can add a row for each api field you want to persist. For example:
This means that user number 1 has these two custom parameters that are tied to the API. If the API changes later on and “swallow_type” is removed, you can get rid of that row. New API fields can be added on the fly.
This is an easy way to handle custom parameters that can and do change periodically. It saves you from having to restructure a table each time the API changes.
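As a rough sketch of the idea in plain Ruby, an in-memory array stands in for the table. The column names user_id, name and value, the ApiFieldStore class and the sample values are all assumptions made for illustration; only “swallow_type” comes from the example above.

```ruby
# One row per (user, API field) pair. Column names are hypothetical.
ApiField = Struct.new(:user_id, :name, :value)

class ApiFieldStore
  def initialize
    @rows = []
  end

  # Insert or update a single field for a user. Values are stored as strings,
  # as they would be in a generic value column.
  def set(user_id, name, value)
    row = @rows.find { |r| r.user_id == user_id && r.name == name }
    if row
      row.value = value.to_s
    else
      @rows << ApiField.new(user_id, name, value.to_s)
    end
  end

  # All fields currently stored for one user, as a name => value hash.
  def fields_for(user_id)
    @rows.select { |r| r.user_id == user_id }
         .map { |r| [r.name, r.value] }
         .to_h
  end

  # The API dropped a field: delete its rows, no schema change required.
  def drop_field(name)
    @rows.reject! { |r| r.name == name }
  end
end

store = ApiFieldStore.new
store.set(1, "swallow_type", "african")
store.set(1, "followers_count", 42)
p store.fields_for(1)
store.drop_field("swallow_type")
p store.fields_for(1)
```

One trade-off to keep in mind: because every value ends up in the same generic column, sorting by something like followers_count means casting the stored strings back to numbers first.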
This is the point where I duck to avoid the incoming flak from DB purists…