We just noticed that the data we uploaded from CSV files around 09/27/2012 (using the Java API) has been duplicated. The logs showed no errors during the upload, but we have confirmed that a majority of the rows loaded that day were duplicated (every row carries a distinct microsecond timestamp). Were there any known glitches that day? We’re at a loss as to how to prevent this from happening again.
Thanks for any feedback.
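One common safeguard against a retried upload being loaded twice is to give each load job a deterministic job ID. BigQuery requires job IDs to be unique within a project, so resubmitting the same ID is rejected instead of creating a second job. The sketch below only shows how such an ID could be derived from the file name and contents using the JDK's SHA-256. The helper name `deterministicJobId` and the `csv_load_` prefix are hypothetical, and the code does not call the BigQuery API. Wiring the ID into your actual load request is assumed and not shown.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class JobIdSketch {
    // Hypothetical helper: derives a job ID from the file name and its bytes,
    // so uploading the same file twice yields the same ID.
    static String deterministicJobId(String fileName, byte[] contents) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            sha.update(fileName.getBytes(StandardCharsets.UTF_8));
            sha.update((byte) 0); // separator so "ab"+"c" hashes differently from "a"+"bc"
            sha.update(contents);
            StringBuilder id = new StringBuilder("csv_load_"); // hypothetical prefix
            for (byte b : sha.digest()) {
                id.append(String.format("%02x", b)); // hex keeps the ID to letters, digits, underscores
            }
            return id.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] csv = "1348704000000001,a\n".getBytes(StandardCharsets.UTF_8);
        String first = deterministicJobId("upload-2012-09-27.csv", csv);
        String retry = deterministicJobId("upload-2012-09-27.csv", csv);
        System.out.println(first.equals(retry)); // same file, same ID
    }
}
```

With this approach, a retry after an ambiguous failure (for example, a timeout where the first request actually succeeded) would collide with the existing job ID rather than silently loading the file again.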
Thanks for looking into this for us. It is hard (almost impossible) to believe that the data got duplicated on the BigQuery side. That said, nothing we can see indicates otherwise. As mentioned, every row has a microsecond timestamp value. For each of the two job IDs referenced, I picked a row at random and confirmed that its timestamp is unique across all of the data we’ve ever imported. When I query our BigQuery table for that timestamp, I get two identical rows back.
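The spot check above can be extended from one random row to every row: count how often each microsecond timestamp occurs and report any that appear more than once. This is a minimal client-side sketch, assuming the rows are available as CSV lines (for example, the original upload files or exported query results) with the timestamp in the first column. The class and method names are hypothetical. Inside BigQuery, the same idea is a GROUP BY on the timestamp column with HAVING COUNT(*) > 1.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DuplicateCheck {
    // Counts how many times each timestamp (assumed to be the first CSV column,
    // in microseconds) appears and keeps only the ones seen more than once.
    static Map<Long, Integer> duplicateTimestamps(List<String> csvRows) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (String row : csvRows) {
            long micros = Long.parseLong(row.split(",", 2)[0].trim());
            counts.merge(micros, 1, Integer::sum);
        }
        counts.values().removeIf(c -> c < 2); // drop timestamps that occur exactly once
        return counts;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "1348704000000001,a",
            "1348704000000002,b",
            "1348704000000001,a");
        System.out.println(duplicateTimestamps(rows)); // {1348704000000001=2}
    }
}
```

Running the check against the source files first rules out duplicates on the client side; if the files are clean but the table is not, the duplication happened during or after the load.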