I have a URL:
http://somewhere.com/relatedqueries?limit=2&query=seedterm
where changing the inputs, limit and query, generates the data I want. limit is the maximum number of terms to return and query is the seed term.
The URL returns a text result formatted like this:
oo.visualization.Query.setResponse({version:'0.5',reqId:'0',status:'ok',sig:'1303596067112929220',table:{cols:[{id:'score',label:'Score',type:'number',pattern:'#,##0.###'},{id:'query',label:'Query',type:'string',pattern:''}],rows:[{c:[{v:0.9894380670262618,f:'0.99'},{v:'newterm1'}]},{c:[{v:0.9894380670262618,f:'0.99'},{v:'newterm2'}]}],p:{'totalResultsCount':'7727'}}});
I’d like to write a Python script that takes two arguments (the limit and the seed query), fetches the data online, parses the result, and returns a list of the new terms, ['newterm1', 'newterm2'] in this case.
I’d love some help, especially with the URL fetching since I have never done this before.
It sounds like you can break this problem up into several subproblems.
Subproblems
There are a handful of problems that need to be solved before composing the completed script:
Forming the request URL
This is just simple string formatting.
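A minimal sketch of this step. Plain string formatting works for simple terms; this version uses urlencode instead so that spaces and special characters in the seed term are escaped. BASE_URL is the placeholder host from the question, and build_url is just an illustrative helper name.

```python
from urllib.parse import urlencode

# Placeholder host taken from the question; substitute the real service URL.
BASE_URL = "http://somewhere.com/relatedqueries"


def build_url(limit, query):
    """Return the request URL for the given limit and seed term."""
    # urlencode escapes the values and joins the pairs with "&"
    params = urlencode({"limit": limit, "query": query})
    return BASE_URL + "?" + params


print(build_url(2, "seedterm"))
```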
Retrieving data
You can use the built-in urllib.request module for this.
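A sketch of the download step with urllib.request.urlopen (Python 3). The fetch_text helper, the 10-second timeout, and the UTF-8 decoding are assumptions made for illustration; the question does not say which encoding the service uses.

```python
from urllib.request import urlopen


def fetch_text(url, timeout=10):
    """Download url and return the response body as a string."""
    # urlopen gives back a file-like response object, named data here;
    # the with-statement closes it even if read() raises.
    with urlopen(url, timeout=timeout) as data:
        raw = data.read()  # bytes
    # Assumption: the service answers in UTF-8.
    return raw.decode("utf-8")
```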
urlopen returns a file-like object, called data in this answer. You can also use it in a with-statement, which closes the response automatically.
Unwrapping JSONP
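The unwrapping step can be sketched like this, assuming the body always starts with the callback name exactly as pasted in the question and ends with `);`. The unwrap_jsonp helper and the PREFIX and SUFFIX constants are illustrative names, not part of any library.

```python
# Callback name exactly as it appears in the question's sample.
PREFIX = "oo.visualization.Query.setResponse("
SUFFIX = ");"


def unwrap_jsonp(text, prefix=PREFIX, suffix=SUFFIX):
    """Return the text between the JSONP callback's parentheses."""
    text = text.strip()
    if not (text.startswith(prefix) and text.endswith(suffix)):
        raise ValueError("response is not wrapped in the expected callback")
    # Slice off the leading call and the trailing ");"
    return text[len(prefix):-len(suffix)]
```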
The result you pasted looks like JSONP. Given that the wrapping function that is called (oo.visualization.Query.setResponse) doesn’t change, we can simply strip this method call out.
Parsing JSON
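A short sketch of the parsing step with json.loads. One caveat worth checking against the real service: json.loads only accepts strict JSON with double-quoted keys and strings, while the sample as pasted in the question shows bare keys and single quotes. The example assumes the live response is strict JSON; parse_payload is an illustrative name.

```python
import json


def parse_payload(result):
    """Parse the unwrapped response text into Python objects."""
    return json.loads(result)


# Strict JSON (double-quoted keys and strings) parses into a dict.
result_object = parse_payload('{"version": "0.5", "status": "ok"}')
print(result_object["status"])

# Text shaped like the pasted sample (bare keys, single quotes) is rejected.
try:
    parse_payload("{version:'0.5',status:'ok'}")
except json.JSONDecodeError as exc:
    print("not strict JSON:", exc.msg)
```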
The resulting result string is just JSON data. Parse it with the built-in json module.
Traversing the object graph
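A sketch of the traversal, assuming the parsed object has the same shape as the sample in the question: table, then rows, where each row holds a list c whose second cell carries the term under the key v. extract_terms is an illustrative name.

```python
def extract_terms(result_object):
    """Collect the query term from every row of the parsed response."""
    rows = result_object["table"]["rows"]
    # Each row is {"c": [score_cell, query_cell]}; the term is query_cell["v"].
    return [row["c"][1]["v"] for row in rows]


# Hand-built object mirroring the structure of the sample response.
sample = {
    "table": {
        "rows": [
            {"c": [{"v": 0.9894380670262618, "f": "0.99"}, {"v": "newterm1"}]},
            {"c": [{"v": 0.9894380670262618, "f": "0.99"}, {"v": "newterm2"}]},
        ]
    }
}
print(extract_terms(sample))
```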
Now you have a result_object that represents the JSON response. The object itself will be a dict with keys like version, reqId, and so on. Based on your question, the new terms are the v value of the second cell in each entry of table.rows, so collecting those values gives you your list.
Putting it all together
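One way the pieces could fit into a single Python 3 script. This is a sketch under the same assumptions as the individual steps: the host is the placeholder from the question, the callback name never changes, the payload inside the wrapper is strict JSON, and the response is UTF-8. The function names and the usage message are illustrative.

```python
import json
import sys
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://somewhere.com/relatedqueries"  # placeholder host from the question
PREFIX = "oo.visualization.Query.setResponse("     # callback name as pasted
SUFFIX = ");"


def parse_response(text):
    """Strip the JSONP wrapper, parse the JSON, and return the list of terms."""
    text = text.strip()
    if text.startswith(PREFIX) and text.endswith(SUFFIX):
        text = text[len(PREFIX):-len(SUFFIX)]
    result_object = json.loads(text)  # assumes strict JSON inside the wrapper
    return [row["c"][1]["v"] for row in result_object["table"]["rows"]]


def related_terms(limit, query):
    """Fetch and parse the related terms for one seed query."""
    url = BASE_URL + "?" + urlencode({"limit": limit, "query": query})
    with urlopen(url, timeout=10) as data:
        text = data.read().decode("utf-8")  # assumes a UTF-8 response
    return parse_response(text)


def main(argv):
    # Expect exactly two arguments: a numeric limit and the seed term.
    if len(argv) != 2 or not argv[0].isdigit():
        print("usage: related.py LIMIT QUERY")
        return
    print(related_terms(int(argv[0]), argv[1]))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Keeping parse_response separate from the network call makes it possible to check the parsing against a saved response without going online.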
Python 2.7 version
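Under Python 2.7, urlopen lives in urllib2 and urlencode in urllib, and the urllib2 response object does not support the with-statement on its own. The sketch below tries the Python 3 imports first and falls back to the 2.7 ones, and wraps the response in contextlib.closing so the same code runs on both. The unwrapping, parsing, and traversal steps are the same in either version. The helper names, the placeholder host, and the UTF-8 decoding are assumptions.

```python
from contextlib import closing

try:
    # Python 3 locations
    from urllib.parse import urlencode
    from urllib.request import urlopen
except ImportError:
    # Python 2.7 locations
    from urllib import urlencode
    from urllib2 import urlopen

BASE_URL = "http://somewhere.com/relatedqueries"  # placeholder host from the question


def build_url(limit, query):
    """Return the request URL for the given limit and seed term."""
    return BASE_URL + "?" + urlencode({"limit": limit, "query": query})


def fetch_text(url):
    """Return the response body as text on either Python version."""
    # closing() supplies the with-statement support that urllib2 responses lack.
    with closing(urlopen(url)) as data:
        return data.read().decode("utf-8")  # assumes a UTF-8 response
```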