This question may be fairly database-agnostic, but I'm using SQL Server 2008 in case it has a specialised solution for this problem. Please keep reading even if you're not a SQL Server person 🙂
OK, I read in a log file that contains data for a game, e.g. when a player connects, disconnects, does stuff, etc. Nothing too hard. Works great.
Now, two of the log file entry types are
- NewConnection
- LostConnection
What I'm trying to keep track of is which players are currently connected to the game.
So what I originally thought of was to create a second table that contains the most recent new connection per player. When a player disconnects or loses connection, I then remove that player's entry from this second table.
Eg.
Table 1: LogEntries
LogEntryId INT PK NOT NULL
EntryTypeId TINYINT NOT NULL
PlayerId INT NOT NULL
....
Table 2: ConnectedPlayers
LogEntryId INT FK (back to LogEntries table) NOT NULL
Then I thought I could use a trigger to maintain this cached data in the ConnectedPlayers table. Don't forget, if it's a trigger, it needs to handle multi-row inserts, updates and deletes.
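To make the trigger idea concrete, here is a minimal sketch of what it might look like. It is only an illustration under several assumptions: the trigger name is hypothetical, EntryTypeId 2 and 4 stand for NewConnection and LostConnection (the values used in the query further down), LogEntryId increases with event time, and only inserts are handled. Updates and deletes on LogEntries would need similar logic.

```sql
-- Hypothetical trigger name; assumes 2 = NewConnection, 4 = LostConnection
-- and that a higher LogEntryId always means a later event.
CREATE TRIGGER trg_LogEntries_TrackConnections
ON LogEntries
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Drop the cached row for every player that has a connection event
    -- in this batch (the inserted pseudo-table may hold many rows).
    DELETE cp
    FROM ConnectedPlayers AS cp
    JOIN LogEntries AS old
      ON old.LogEntryId = cp.LogEntryId
    WHERE old.PlayerId IN (SELECT i.PlayerId
                           FROM inserted AS i
                           WHERE i.EntryTypeId IN (2, 4));

    -- Re-add a player only if the latest connection event for that player
    -- within this batch is a NewConnection.
    INSERT INTO ConnectedPlayers (LogEntryId)
    SELECT i.LogEntryId
    FROM inserted AS i
    WHERE i.EntryTypeId = 2
      AND NOT EXISTS (SELECT 1
                      FROM inserted AS later
                      WHERE later.PlayerId = i.PlayerId
                        AND later.EntryTypeId IN (2, 4)
                        AND later.LogEntryId > i.LogEntryId);
END;
```

Because the sketch trusts LogEntryId ordering, importing an older batch after a newer one would leave ConnectedPlayers in the wrong state.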
But I’m not sure if this is the best way. Like, could I have an Indexed View?
I would love to know people’s thoughts on this.
Oh, one more thing: for simplicity, let's just assume that when a player drops connection, lags out, has their modem die, etc., the application is smart enough to know this and DOES record it as a LostConnection entry. There will be no phantom users reported as connected when they have really been disconnected accidentally.
UPDATE:
I was thinking that maybe I could use a view instead? (And I can index this view if I want to, also 🙂) By partitioning my results, I could get the most recent event per player, where the event is a NewConnection or a LostConnection, and then keep only the rows where that most recent event is a NewConnection, which means those players are connected. No second table, triggers, extra insert .NET code or whatever needed …
eg..
SELECT LogEntryId, EntryTypeId, PlayerId
FROM
    (SELECT LogEntryId, EntryTypeId, PlayerId,
            RANK() OVER (PARTITION BY PlayerId ORDER BY LogEntryId DESC) AS MostRecentRank
     FROM LogEntries
     -- 2 = NewConnection, 4 = LostConnection
     WHERE EntryTypeId IN (2, 4)
    ) SubQuery
WHERE MostRecentRank = 1
  -- keep only players whose most recent connection event is a NewConnection
  AND EntryTypeId = 2
How does that sound/look?
You don't need a second table, but you do need a date column, which I assume is part of your log data. I would normalize the data and avoid the temptation to optimize prematurely. Make sure you index the key columns, mainly the LogEntryDate and PlayerId columns in the case of your query.
Then, use a standard aggregate query to determine the newest log entry for each user, and then filter out the ones that are not connected. You could further optimize this by only selecting from log entries from the last 24 hours (or last week or whatever makes sense for your app).
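As a rough sketch of the aggregate approach described above: the index name is hypothetical, LogEntryDate is assumed to be a DATETIME column on LogEntries, and the EntryTypeId values 2 (NewConnection) and 4 (LostConnection) are taken from the question's query.

```sql
-- Hypothetical index covering the columns the query filters and groups on.
CREATE INDEX IX_LogEntries_PlayerId_LogEntryDate
    ON LogEntries (PlayerId, LogEntryDate)
    INCLUDE (EntryTypeId);

-- Newest connection-related entry per player, keeping only players whose
-- newest such entry is a NewConnection.
SELECT le.PlayerId, le.LogEntryId, le.LogEntryDate
FROM LogEntries AS le
JOIN (
    SELECT PlayerId, MAX(LogEntryDate) AS LastEventDate
    FROM LogEntries
    WHERE EntryTypeId IN (2, 4)                           -- connect / disconnect events only
      AND LogEntryDate >= DATEADD(HOUR, -24, GETDATE())   -- optional time window
    GROUP BY PlayerId
) AS latest
  ON latest.PlayerId = le.PlayerId
 AND latest.LastEventDate = le.LogEntryDate
WHERE le.EntryTypeId = 2;                                 -- still connected
```

Two caveats with this sketch: a player who has been connected for longer than the time window would be missed, and a NewConnection and LostConnection carrying exactly the same timestamp for one player cannot be ordered by date alone.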
If you find that you are still not getting the speed you want out of the query, then look at strategies for optimizing. You seem reluctant to cache in the application layer, so your proposal of an indexed view could work, though be aware that SQL Server restricts what an indexed view may contain (ranking functions, derived tables and MAX are not allowed, for example), so the query may need restructuring. You could use the query above as a basis for this to create a Player view that includes a boolean IsConnected column.
Note: if you do not receive a date with each log entry but the LogEntryId is generated by the game, that should work as a substitute for the date. If you are generating the LogEntryId on insert, though, I would caution against relying on it, as it would only take one out-of-order import to throw off all of your data.
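The player view with an IsConnected column mentioned above might look something like the following. This is a sketch only: the view name is hypothetical, LogEntryDate is an assumed column, and the EntryTypeId values 2 and 4 come from the question. It is written as an ordinary view, since the ROW_NUMBER() it relies on cannot be used in a SQL Server indexed view.

```sql
-- Hypothetical view: one row per player that has at least one
-- NewConnection (2) or LostConnection (4) entry.
CREATE VIEW PlayerConnectionStatus
AS
SELECT PlayerId,
       CAST(CASE WHEN EntryTypeId = 2 THEN 1 ELSE 0 END AS BIT) AS IsConnected
FROM (
    SELECT PlayerId,
           EntryTypeId,
           -- newest event first; LogEntryId breaks ties on equal dates
           ROW_NUMBER() OVER (PARTITION BY PlayerId
                              ORDER BY LogEntryDate DESC, LogEntryId DESC) AS rn
    FROM LogEntries
    WHERE EntryTypeId IN (2, 4)
) AS ranked
WHERE rn = 1;
```

Querying it with `SELECT PlayerId FROM PlayerConnectionStatus WHERE IsConnected = 1` would then list the players currently treated as connected. Players with no connection events at all do not appear in the view.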