Suppose I have an application A with a database. Now I want to add another application B, which should keep track of the database changes made by application A and do some calculations whenever data has changed. There is no direct communication between the two applications; both can only see the database.
The basic problem is: some data changes in the database. How can I trigger some C# code to do some work upon these changes?
To give some stimulus for answers, here are the approaches I am currently considering:
- Have application B poll for changes in the tables of interest. Advantage: Simple approach. Disadvantage: Lots of traffic, especially when many tables are involved.
- Introduce triggers, which will fire on certain events. When they fire they should write some entry into an “event table”. Application B only needs to poll that “event table”. Advantage: Less traffic. Disadvantage: Logic is placed into the database in the form of triggers. (It’s not a question of the “evilness” of triggers. It’s a design question, which makes it a disadvantage.)
- Get rid of the polling approach and use the SqlDependency class to get notified of changes. Advantage: (Maybe?) Less traffic than the polling approach. Disadvantage: Not database independent. (I am aware of OracleDependency in ODP.NET, but what about the other databases?)
Which approach is preferable? Maybe I have missed some major (dis)advantage in the approaches mentioned? Maybe there are other approaches I haven’t thought of?
Edit 1: Database independence is a factor for the … let’s call them … ‘sales people’. I can use SqlDependency or OracleDependency. For DB2 or other databases I can fall back to the polling approach. It’s just a question of cost and benefit, which I want to at least think about so I can discuss it.
I’d go with #1. It’s not actually as much traffic as you might think. If your data doesn’t change frequently, you can be pessimistic about it and only fetch something that gives you a yes or no about whether a table has changed.
If you design your schema with polling in mind you may not really incur that much of a hit per poll.
If you’re only adding records, not changing them, then checking the highest id might be enough on a particular table.
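As a rough sketch of the highest-id check, assuming an append-only table with an auto-incrementing `id` column: the function name `has_new_rows` and the table used to try it out are hypothetical, and Python with SQLite is used only to keep the example self-contained.

```python
import sqlite3


def has_new_rows(conn, table, last_seen_id):
    """Return (changed, current_max_id) for an append-only table.

    The table name is interpolated into the SQL, so it must come from a
    fixed, trusted list and never from user input.
    """
    (current_max,) = conn.execute(
        f"SELECT COALESCE(MAX(id), 0) FROM {table}"
    ).fetchone()
    return current_max > last_seen_id, current_max
```

Each poll is a single aggregate over the primary key, which most databases can answer from the index alone. Note that this check notices inserts only; deletes and updates go undetected.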
If you’re updating records as well, you can store a timestamp column, index it, and then look for the maximum timestamp.
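A sketch of the timestamp variant might look like this. It assumes that application A (or a column default or trigger) sets a `modified_at` value on every insert and update, stored as ISO-8601 text so that string comparison matches chronological order. The table, column and function names are illustrative only.

```python
import sqlite3

# Hypothetical schema: the index lets MAX(modified_at) and range scans stay cheap.
SCHEMA = """
CREATE TABLE customers (
    id          INTEGER PRIMARY KEY,
    name        TEXT,
    modified_at TEXT NOT NULL  -- ISO-8601, assumed to be set by application A
);
CREATE INDEX ix_customers_modified_at ON customers (modified_at);
"""


def latest_change(conn):
    """Cheap yes/no probe: the newest timestamp in the table (None if empty)."""
    (ts,) = conn.execute("SELECT MAX(modified_at) FROM customers").fetchone()
    return ts


def rows_changed_since(conn, last_seen):
    """Fetch the changed rows only after the probe reports something new."""
    return conn.execute(
        "SELECT id, name, modified_at FROM customers "
        "WHERE modified_at > ? ORDER BY modified_at",
        (last_seen,),
    ).fetchall()
```

One caveat with this design: if the timestamp resolution is coarse, rows written in the same tick as the stored watermark can be missed by a strict `>` comparison, so a monotonically increasing version number may be a safer watermark where the database offers one.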
And you can send one uber query that polls multiple tables (efficiently) and returns the list of changed tables.
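One possible shape for such a combined query is a `UNION ALL` of one `MAX(...)` probe per table, compared against watermarks that application B keeps. This is a sketch under the assumption that every watched table has an indexed `modified_at` column as described above; `build_poll_query`, `changed_tables` and the `watermarks` dictionary are hypothetical names.

```python
import sqlite3


def build_poll_query(tables):
    """Build one statement that returns (table_name, last_change) per table.

    Table names are interpolated, so they must come from a fixed, trusted list.
    """
    parts = [
        f"SELECT '{t}' AS table_name, MAX(modified_at) AS last_change FROM {t}"
        for t in tables
    ]
    return " UNION ALL ".join(parts)


def changed_tables(conn, tables, watermarks):
    """Run the combined query once and return the tables that changed.

    `watermarks` maps table name -> last timestamp seen; it is updated in place.
    """
    changed = []
    for name, last_change in conn.execute(build_poll_query(tables)):
        # MAX() yields NULL (None) for an empty table, which counts as unchanged.
        if last_change is not None and last_change > watermarks.get(name, ""):
            watermarks[name] = last_change
            changed.append(name)
    return changed
```

This costs one round trip per poll regardless of how many tables are watched, and application B then fetches details only for the tables that are reported as changed.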
Nothing in this answer is particularly clever; I’m just trying to show that #1 may not be quite as bad as it first seems.