The Archive Base

Editorial Team
Asked: May 14, 2026

After reading the tips from this great Nettuts+ article, I’ve come up with a table schema that would separate highly volatile data from other tables subjected to heavy reads and, at the same time, lower the number of tables needed in the whole database schema. However, I’m not sure this is a good idea, since it doesn’t follow the rules of normalization, and I would like to hear your advice. Here is the general idea:


I have four types of users modeled in a Class Table Inheritance structure. In the main “user” table I store data common to all users (id, username, password, several flags, …) along with some TIMESTAMP fields (date_created, date_updated, date_activated, date_lastLogin, …).

To quote tip #16 from the Nettuts+ article mentioned above:

Example 2: You have a “last_login”
field in your table. It updates every
time a user logs in to the website.
But every update on a table causes the
query cache for that table to be
flushed. You can put that field into
another table to keep updates to your
users table to a minimum.

Now it gets even trickier: I need to keep track of some user statistics, like:

  • how many unique times a user profile was seen
  • how many unique times an ad from a specific type of user was clicked
  • how many unique times a post from a specific type of user was seen
  • and so on…

In my fully normalized database this adds up to about 8 to 10 additional tables. That’s not a lot, but I would like to keep things simple if I can, so I’ve come up with the following “events” table:

|------|----------------|----------------|---------------------|-----------|
| ID   | TABLE          | EVENT          | DATE                | IP        | 
|------|----------------|----------------|---------------------|-----------|
| 1    | user           | login          | 2010-04-19 00:30:00 | 127.0.0.1 |
|------|----------------|----------------|---------------------|-----------|
| 1    | user           | login          | 2010-04-19 02:30:00 | 127.0.0.1 |
|------|----------------|----------------|---------------------|-----------|
| 2    | user           | created        | 2010-04-19 00:31:00 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 2    | user           | activated      | 2010-04-19 02:34:00 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 2    | user           | approved       | 2010-04-19 09:30:00 | 217.0.0.1 |
|------|----------------|----------------|---------------------|-----------|
| 2    | user           | login          | 2010-04-19 12:00:00 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 15   | user_ads       | created        | 2010-04-19 12:30:00 | 127.0.0.1 |
|------|----------------|----------------|---------------------|-----------|
| 15   | user_ads       | impressed      | 2010-04-19 12:31:00 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 15   | user_ads       | clicked        | 2010-04-19 12:31:01 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 15   | user_ads       | clicked        | 2010-04-19 12:31:02 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 15   | user_ads       | clicked        | 2010-04-19 12:31:03 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 15   | user_ads       | clicked        | 2010-04-19 12:31:04 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 15   | user_ads       | clicked        | 2010-04-19 12:31:05 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 2    | user           | blocked        | 2010-04-20 03:19:00 | 217.0.0.1 |
|------|----------------|----------------|---------------------|-----------|
| 2    | user           | deleted        | 2010-04-20 03:20:00 | 217.0.0.1 |
|------|----------------|----------------|---------------------|-----------|

Basically, the ID refers to the primary key (id) field of the table named in the TABLE column; I believe the rest should be pretty straightforward. One thing that I’ve come to like in this design is that I can keep track of all the user logins instead of just the last one, and thus generate some interesting metrics from that data.
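As a rough illustration of the login metrics mentioned above, the sketch below loads a few of the sample rows into an in-memory SQLite database (standing in for MySQL) and derives both the login count and the last login per user. The column names `ref_id`, `ref_table` and `logged_at` and the helper `login_stats` are assumptions made for this example, chosen to sidestep the reserved word TABLE.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE events (
           ref_id    INTEGER NOT NULL,  -- PK of the row in ref_table
           ref_table TEXT    NOT NULL,  -- the TABLE column of the question
           event     TEXT    NOT NULL,
           logged_at TEXT    NOT NULL,  -- the DATE column of the question
           ip        TEXT    NOT NULL
       )"""
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
    [
        (1, "user", "login", "2010-04-19 00:30:00", "127.0.0.1"),
        (1, "user", "login", "2010-04-19 02:30:00", "127.0.0.1"),
        (2, "user", "created", "2010-04-19 00:31:00", "127.0.0.2"),
        (2, "user", "login", "2010-04-19 12:00:00", "127.0.0.2"),
    ],
)


def login_stats(conn, user_id):
    """Return (login_count, last_login) for one user."""
    return conn.execute(
        """SELECT COUNT(*), MAX(logged_at)
             FROM events
            WHERE ref_table = 'user' AND event = 'login' AND ref_id = ?""",
        (user_id,),
    ).fetchone()


print(login_stats(conn, 1))
```

With a single last-login column only the second value would be available; the count falls out of keeping every login row.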

Due to the growing nature of the events table I also thought of making some optimizations, such as:

  • #9: Since there is only a finite number of tables and a finite (and predetermined) number of events, the TABLE and EVENT columns could be set up as ENUMs instead of VARCHARs to save some space.
  • #14: Store IPs as UNSIGNED INTs with INET_ATON() instead of VARCHARs.
  • Store DATEs as TIMESTAMPs instead of DATETIMEs.
  • Use the ARCHIVE (or the CSV?) engine instead of InnoDB / MyISAM.
    • Only INSERTs and SELECTs are supported, and data is compressed on the fly.
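To make the IP optimization in the list above concrete, here is a small sketch of the conversion that INET_ATON() and INET_NTOA() perform, written in Python with the standard `ipaddress` module. The helper names merely mirror the MySQL functions and are not part of any MySQL client API.

```python
import ipaddress


def inet_aton(dotted):
    """Dotted-quad IPv4 string to integer, like MySQL's INET_ATON()."""
    return int(ipaddress.IPv4Address(dotted))


def inet_ntoa(number):
    """Integer back to a dotted-quad string, like MySQL's INET_NTOA()."""
    return str(ipaddress.IPv4Address(number))


admin_ip = inet_aton("217.0.0.1")
print(admin_ip, inet_ntoa(admin_ip))

# Addresses above 127.255.255.255 exceed the signed 32-bit range,
# which is why the column needs to be UNSIGNED.
print(admin_ip > 2**31 - 1)
```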

Overall, each event would only consume 14 (uncompressed) bytes which is okay for my traffic I guess.
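The 14-byte figure can be checked with a little arithmetic. The sketch below assumes a 4-byte INT for ID, one byte for each ENUM (which holds for up to 255 members), a 4-byte TIMESTAMP without fractional seconds and a 4-byte INT UNSIGNED for the IP. The `pack_event` helper and the ENUM index values passed to it are hypothetical; they only show that such a row fits in 14 bytes when packed without padding.

```python
import struct
from datetime import datetime, timezone
from ipaddress import IPv4Address

# Assumed MySQL column widths in bytes (see the assumptions above).
COLUMN_BYTES = {"ID": 4, "TABLE": 1, "EVENT": 1, "DATE": 4, "IP": 4}

row_bytes = sum(COLUMN_BYTES.values())
overhead_bytes = COLUMN_BYTES["ID"] + COLUMN_BYTES["TABLE"] + COLUMN_BYTES["EVENT"]


def pack_event(ref_id, table_index, event_index, when, ip):
    """Pack one event using the same widths (big-endian, no padding)."""
    return struct.pack(
        ">IBBII",
        ref_id,
        table_index,
        event_index,
        int(when.timestamp()),
        int(IPv4Address(ip)),
    )


packed = pack_event(
    2, 1, 3, datetime(2010, 4, 19, 2, 34, tzinfo=timezone.utc), "127.0.0.2"
)
print(row_bytes, overhead_bytes, len(packed))
```

The same widths also account for the 6 bytes of per-event overhead (ID, TABLE and EVENT).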

Pros:

  • Ability to store more detailed data (such as logins).
  • No need to design (and code for) almost a dozen additional tables (dates and statistics).
  • Reduces a few columns per table and keeps volatile data separated.

Cons:

  • Non-relational (still not as bad as EAV):
    • SELECT * FROM events WHERE id = 2 AND `table` = 'user' ORDER BY date DESC;
  • 6 bytes overhead per event (ID, TABLE and EVENT).

I’m more inclined to go with this approach since the pros seem to far outweigh the cons, but I’m still a little bit reluctant… Am I missing something? What are your thoughts on this?

Thanks!


@coolgeek:

One thing that I do slightly
differently is to maintain an
entity_type table, and use its ID in
the object_type column (in your case,
the ‘TABLE’ column). You would want to
do the same thing with an event_type
table.

Just to be clear, you mean I should add an additional table that maps which events are allowed in a table and use the PK of that table in the events table instead of having a TABLE / EVENT pair?


@ben:

These are all statistics derived from
existing data, aren’t they?

The additional tables are mostly related to statistics, but the data doesn’t already exist. Some examples:

user_ad_stats                          user_post_stats
-------------                          ---------------
user_ad_id (FK)                        user_post_id (FK)
ip                                     ip
date                                   date
type (impressed, clicked)

If I drop these tables I have no way to keep track of who, what or when; I’m not sure how views can help here.
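For comparison, the who, what and when that user_ad_stats would hold can also be read back from the generic events table. The sketch below uses in-memory SQLite as a stand-in for MySQL and the user_ads rows from the sample data; the column names and the `click_counts` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE events (
           ref_id    INTEGER NOT NULL,
           ref_table TEXT    NOT NULL,
           event     TEXT    NOT NULL,
           logged_at TEXT    NOT NULL,
           ip        TEXT    NOT NULL
       )"""
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
    [
        (15, "user_ads", "created", "2010-04-19 12:30:00", "127.0.0.1"),
        (15, "user_ads", "impressed", "2010-04-19 12:31:00", "127.0.0.2"),
        (15, "user_ads", "clicked", "2010-04-19 12:31:01", "127.0.0.2"),
        (15, "user_ads", "clicked", "2010-04-19 12:31:02", "127.0.0.2"),
        (15, "user_ads", "clicked", "2010-04-19 12:31:03", "127.0.0.2"),
        (15, "user_ads", "clicked", "2010-04-19 12:31:04", "127.0.0.2"),
        (15, "user_ads", "clicked", "2010-04-19 12:31:05", "127.0.0.2"),
    ],
)


def click_counts(conn, ad_id):
    """Return (total_clicks, unique_clicks_by_ip) for one ad."""
    return conn.execute(
        """SELECT COUNT(*), COUNT(DISTINCT ip)
             FROM events
            WHERE ref_table = 'user_ads' AND event = 'clicked' AND ref_id = ?""",
        (ad_id,),
    ).fetchone()


print(click_counts(conn, 15))
```

Five clicks from the same IP count as a single unique click, which is the kind of statistic listed earlier.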

I agree that it ought to be separate,
but more because it’s fundamentally
different data. What someone is and
what someone does are two different
things. I don’t think volatility is so
important.

I’ve heard it both ways and I couldn’t find anything in the MySQL manual that states that either one is right. Anyway, I agree with you that they should be separate tables because they represent different kinds of data (with the added benefit of being more descriptive than a regular approach).

I think you’re missing the forest for
the trees, so to speak.

The predicate for your table would be
“User ID from IP IP at time DATE
EVENTed to TABLE” which seems
reasonable, but there are issues.

What I meant by “not as bad as EAV” is that all records follow a linear structure and are pretty easy to query; there is no hierarchical structure, so all queries can be done with a simple SELECT.

Regarding your second statement, I think you misunderstood me here; the IP address is not necessarily associated with the user. The table structure should read something like this:

IP address (IP) did something
(EVENT) to the PK (ID) of the
table (TABLE) on date (DATE).

For instance, the last row of my example above should read that IP 217.0.0.1 (some admin) deleted user #2 (whose last known IP is 127.0.0.2) at 2010-04-20 03:20:00.

You can still join, say, user events
to users, but you can’t implement a
foreign key constraint.

Indeed, that’s my main concern. However I’m not totally sure what can go wrong with this design that couldn’t go wrong with a traditional relational design. I can spot some caveats but as long as the app messing with the database knows what it is doing I guess there shouldn’t be any problems.
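One concrete thing that can go wrong without a foreign key is an event that points at a row which does not exist. The sketch below (in-memory SQLite standing in for MySQL) shows how such orphans could be detected after the fact. The `users` table name, the username 'alice' and the missing user #2 are hypothetical data made up for the example, as is the `orphaned_user_events` helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE events (
        ref_id    INTEGER NOT NULL,
        ref_table TEXT    NOT NULL,
        event     TEXT    NOT NULL,
        logged_at TEXT    NOT NULL
    );
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO events VALUES
        (1, 'user', 'login', '2010-04-19 00:30:00'),
        (2, 'user', 'login', '2010-04-19 12:00:00'),  -- no users row with id 2
        (15, 'user_ads', 'clicked', '2010-04-19 12:31:01');
    """
)


def orphaned_user_events(conn):
    """Return the ref_ids of 'user' events that match no row in users."""
    rows = conn.execute(
        """SELECT DISTINCT e.ref_id
             FROM events AS e
             LEFT JOIN users AS u ON u.id = e.ref_id
            WHERE e.ref_table = 'user' AND u.id IS NULL
            ORDER BY e.ref_id"""
    ).fetchall()
    return [ref_id for (ref_id,) in rows]


print(orphaned_user_events(conn))
```

With a real foreign key the database would reject the second insert; here the application has to run a check like this itself, once per referenced table.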

One other thing that counts in this argument is that I will be storing many more events, and each event will more than double compared to the original design. It makes perfect sense to use the ARCHIVE storage engine here; the only catch is that it doesn’t support FKs (nor UPDATEs or DELETEs).

1 Answer

  1. Editorial Team
    Added an answer on May 14, 2026 at 8:30 am

    I highly recommend this approach. Since you’re presumably using the same database for OLTP and OLAP, you can gain significant performance benefits by adding in some star and snowflake schemas.

    I have a social networking app that is currently at 65 tables. I maintain a single table to track object (blog/post, forum/thread, gallery/album/image, etc) views, another for object recommends, and a third table to summarize insert/update activity in a dozen other tables.

    One thing that I do slightly differently is to maintain an entity_type table, and use its ID in the object_type column (in your case, the ‘TABLE’ column). You would want to do the same thing with an event_type table.

    Clarifying for Alix – Yes, you maintain a reference table for objects, and a reference table for events (these would be your dimension tables). Your fact table would have the following fields:

    id
    object_id
    event_id
    event_time
    ip_address
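A minimal sketch of that layout, using in-memory SQLite in place of MySQL. It assumes that `id` is the primary key of the referenced row (the ID column of the question), that `object_id` points at the entity_type reference table and that `event_id` points at event_type; the table name `event_fact` and both helper functions are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(
    """
    -- Dimension (reference) tables
    CREATE TABLE entity_type (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE event_type  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- Fact table with the fields listed in the answer
    CREATE TABLE event_fact (
        id         INTEGER NOT NULL,
        object_id  INTEGER NOT NULL REFERENCES entity_type(id),
        event_id   INTEGER NOT NULL REFERENCES event_type(id),
        event_time TEXT    NOT NULL,
        ip_address INTEGER NOT NULL
    );
    INSERT INTO entity_type VALUES (1, 'user'), (2, 'user_ads');
    INSERT INTO event_type  VALUES (1, 'login'), (2, 'clicked');
    INSERT INTO event_fact VALUES
        (1, 1, 1, '2010-04-19 00:30:00', 2130706433),
        (15, 2, 2, '2010-04-19 12:31:01', 2130706434);
    """
)


def readable_events(conn):
    """Join the fact table to both dimensions to get names back."""
    return conn.execute(
        """SELECT f.id, o.name, e.name, f.event_time
             FROM event_fact AS f
             JOIN entity_type AS o ON o.id = f.object_id
             JOIN event_type  AS e ON e.id = f.event_id
            ORDER BY f.event_time"""
    ).fetchall()


def unknown_event_is_rejected(conn):
    """Try to insert a fact whose event_id (99) has no event_type row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO event_fact VALUES "
                "(1, 1, 99, '2010-04-19 03:00:00', 2130706433)"
            )
    except sqlite3.IntegrityError:
        return True
    return False


print(readable_events(conn))
print(unknown_event_is_rejected(conn))
```

In this sketch the two references to the dimension tables are enforced, even though the link from `id` to the referenced row still is not. Whether MySQL can enforce them depends on the storage engine chosen for the fact table.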
    
© 2021 The Archive Base. All Rights Reserved