I have run into a situation where there is ‘bad’ data in a number of tables. Data has been cross-contaminated from various sources and I need to clean it out.
Specifically, there are several hundred tables with identical definitions. They hold timed sensor data with an auto-increment column, a date/time stamp and other data. The ‘bad’ data can be identified by the date/time jumping backwards instead of increasing as expected.
Example (ID, date):
10 2010/01/05
11 2010/01/06
12 2010/01/07
13 2008/05/09
14 2008/05/10
15 2008/05/11
16 2010/01/08
17 2010/01/09
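The rule above can be stated as: a row is bad when its timestamp is earlier than the latest timestamp of any row before it (by ID). A minimal single-pass sketch in Python, assuming rows arrive ordered by ID and timestamps are zero-padded `YYYY/MM/DD` strings (so string order matches date order). The function name is hypothetical. It also assumes the contaminating rows carry *earlier* dates than the good data, as in the example; if foreign rows had later dates, the good rows after them would be flagged instead.

```python
from typing import Iterable, List, Tuple

def find_backward_rows(rows: Iterable[Tuple[int, str]]) -> List[int]:
    """Return IDs whose timestamp is earlier than the latest one seen so far.

    Assumes `rows` is ordered by the auto-increment ID and timestamps are
    zero-padded 'YYYY/MM/DD' strings, which sort the same way as the dates.
    """
    bad: List[int] = []
    latest = None  # high-water mark of timestamps seen on good rows
    for row_id, ts in rows:
        if latest is not None and ts < latest:
            bad.append(row_id)  # time went backwards: flag the row
        else:
            latest = ts  # equal or later timestamp: advance the mark
    return bad

EXAMPLE = [
    (10, "2010/01/05"), (11, "2010/01/06"), (12, "2010/01/07"),
    (13, "2008/05/09"), (14, "2008/05/10"), (15, "2008/05/11"),
    (16, "2010/01/08"), (17, "2010/01/09"),
]
print(find_backward_rows(EXAMPLE))  # [13, 14, 15]
```

Because the comparison is strict, many readings sharing the same timestamp are not flagged.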
I’m looking for the best way to find these regions.
Some things to note:
– the tables in question have hundreds of millions of records
– in my example the dates are sequential; in reality there may be 10 or 1000 entries for a given date (each with its own timestamp) and then nothing for a week.
I can imagine a Perl script walking through each table and looking for these jumps. I’m wondering if there is a faster, more SQL-based method.
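One SQL-side sketch, assuming the database supports window functions (SQLite 3.25+, run here through Python's built-in `sqlite3` module; MySQL 8.0+ and PostgreSQL offer the same construct): for each row, take the maximum timestamp over all rows with a smaller ID and flag the row if its own timestamp is earlier. The table name `readings` and columns `id`/`ts` are hypothetical stand-ins for the real schema. This keeps the scan inside the database instead of streaming every row to a client script.

```python
import sqlite3

# Hypothetical table/columns; the real tables have more data columns.
FIND_BAD_ROWS = """
SELECT id
FROM (
    SELECT id,
           ts,
           -- latest timestamp among all rows with a smaller id
           MAX(ts) OVER (
               ORDER BY id
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS prev_max
    FROM readings
)
WHERE ts < prev_max  -- prev_max is NULL for the first row, so it is never flagged
ORDER BY id
"""

def find_bad_ids(conn: sqlite3.Connection) -> list:
    return [row[0] for row in conn.execute(FIND_BAD_ROWS)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, ts TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO readings (id, ts) VALUES (?, ?)",
    [
        (10, "2010/01/05"), (11, "2010/01/06"), (12, "2010/01/07"),
        (13, "2008/05/09"), (14, "2008/05/10"), (15, "2008/05/11"),
        (16, "2010/01/08"), (17, "2010/01/09"),
    ],
)
print(find_bad_ids(conn))  # [13, 14, 15]
```

As with any running-maximum test, this assumes the contaminating rows have earlier dates than the surrounding good data. Using `<` rather than `<=` keeps rows that share a timestamp from being flagged.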
This is the fastest way I can think of.
NOTE: I’m assuming you expect to get the records with IDs 13, 14 and 15 in your example.
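If IDs 13, 14 and 15 are indeed the rows to remove, the same running-maximum test can drive a `DELETE`. A sketch against a hypothetical `readings(id, ts)` table in SQLite via Python's `sqlite3`; on tables with hundreds of millions of rows it would likely be worth reviewing the flagged IDs first and deleting in batches. MySQL may reject a subquery that reads the table being deleted from (error 1093), in which case the flagged IDs could be copied into a temporary table first.

```python
import sqlite3

# Hypothetical table/columns standing in for the real sensor tables.
DELETE_BAD_ROWS = """
DELETE FROM readings
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ts,
               MAX(ts) OVER (
                   ORDER BY id
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
               ) AS prev_max
        FROM readings
    )
    WHERE ts < prev_max
)
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, ts TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO readings (id, ts) VALUES (?, ?)",
    [
        (10, "2010/01/05"), (11, "2010/01/06"), (12, "2010/01/07"),
        (13, "2008/05/09"), (14, "2008/05/10"), (15, "2008/05/11"),
        (16, "2010/01/08"), (17, "2010/01/09"),
    ],
)

with conn:  # commits the transaction if the DELETE succeeds
    deleted = conn.execute(DELETE_BAD_ROWS).rowcount

remaining = [row[0] for row in conn.execute("SELECT id FROM readings ORDER BY id")]
print(deleted, remaining)  # 3 [10, 11, 12, 16, 17]
```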