I'm using Google App Engine, and occasionally when I send a query to BigQuery via the JSON API I get incorrect results. The problem is usually confined to a single table within BigQuery (I create a new table for every batch job). When I run into this issue in production, I log the query I submitted and run it via the BigQuery dashboard, where it takes longer than expected but returns the expected results.
Nothing in the response indicates a problem. jobComplete comes back as True, but I see no rows, just the jobReference, the schema, and totalRows = 0.
In such situations, is it appropriate to make a separate call to get the job results, even though I would expect the current call to return them?
Relevant Code:
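import httplib2
from apiclient.discovery import build
from google.appengine.api import memcache
from oauth2client.appengine import AppAssertionCredentials
# Authorize an httplib2 client (cached through memcache) with the app's service account.
http = httplib2.Http(memcache)
self.credentials = AppAssertionCredentials(scope='https://www.googleapis.com/auth/bigquery')
self.http = self.credentials.authorize(http=http)
self.service = build('bigquery', 'v2', http=self.http)
# Run the query synchronously through jobs().query().
jobs = self.service.jobs()
result = jobs.query(projectId=settings.GOOGLE_APIS_PROJECT_ID,
                    body={'query': query}).execute()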
Response:
{u'totalRows': u'0', u'kind': u'bigquery#queryResponse', u'jobComplete': True, u'jobReference': {u'projectId': u'<REMOVED>', u'jobId': u'<REMOVED>'}, u'schema': {u'fields': [<REMOVED>]}}
No matter how many times I re-run the query in production, the same results are returned. Could this be because the memcache-backed HTTP cache stored an incorrect response?
The issue was a combination of two things:
I was making multiple calls to BigQuery in parallel from task queues using a shared http object, and the queries were taking over 10 seconds to complete. As a result, responses got mixed up between calls and the results were not what I expected. For example, I sometimes received the discovery response in reply to my query request.
The Fix:
I rewrote my BigQuery client code so that the httplib2 object is no longer shared between calls, and I changed my process to submit BigQuery jobs and poll for their results instead of using the query() call. There is a lot more overhead in managing the calls, checking statuses, and retrieving results, but at least it works now and the responses make sense.
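A minimal sketch of that approach, not my exact production code. The helper names build_service and run_query_job, the polling interval, and the retry limit are all illustrative assumptions. The idea is to build a fresh authorized http object per task, submit the query with jobs().insert(), and then poll jobs().getQueryResults() until jobComplete is true.

```python
import time


def build_service():
    """Build a BigQuery service with its own http object (nothing shared between calls)."""
    # Imported lazily so the polling helper below can be used without App Engine libraries.
    import httplib2
    from apiclient.discovery import build
    from oauth2client.appengine import AppAssertionCredentials

    credentials = AppAssertionCredentials(
        scope='https://www.googleapis.com/auth/bigquery')
    # A new Http instance per call; assumption: no memcache-backed cache here.
    http = credentials.authorize(httplib2.Http())
    return build('bigquery', 'v2', http=http)


def run_query_job(service, project_id, query, poll_interval=1.0, max_polls=60):
    """Submit a query as an asynchronous job and poll until its results are ready."""
    jobs = service.jobs()
    job = jobs.insert(
        projectId=project_id,
        body={'configuration': {'query': {'query': query}}},
    ).execute()
    job_id = job['jobReference']['jobId']

    for _ in range(max_polls):
        reply = jobs.getQueryResults(projectId=project_id, jobId=job_id).execute()
        if reply.get('jobComplete'):
            # 'rows' is absent when the query legitimately matches nothing.
            return reply.get('rows', [])
        time.sleep(poll_interval)
    raise RuntimeError('query job %s did not complete in time' % job_id)
```

Each task would then call something like run_query_job(build_service(), settings.GOOGLE_APIS_PROJECT_ID, query), so no http object is ever reused across concurrent requests.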