I want to build a dataset of Android application information, including application name, package name, version, requested permissions, etc.
The official Android application market is Google Play, which hosts millions of applications. I want to collect information on at least tens of thousands of them and store it in a CSV file. For instance, here is one application's link:
https://play.google.com/store/apps/details?id=de.ralphsapps.snorecontrol
- How do I get the list of application URLs?
- How do I parse the information from each web page?
Is there a good web crawler suited to this kind of job? Or is there a scripting language, such as Python, that has this kind of crawling functionality?
Thanks.
The Google Play Store has its own format for displaying information in HTML. Write your own HTML parser to extract the information you need.
It's best to use JSoup (JSoup.org) for this job, or refer to my sample tutorial on using JSoup as a parser: Parsing HTML using JSoup
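
To illustrate the "write your own parser" idea for the first question (collecting application URLs), here is a minimal sketch. It uses Python's standard-library html.parser rather than JSoup, since the question mentions Python. It assumes that listing pages contain anchor tags whose href has the same shape as the example link in the question (/store/apps/details?id=...). Whether the real Google Play pages expose such links in static HTML is an assumption, and the SAMPLE markup and the com.example.hypothetical package are made up for illustration. No network access is involved; fetching pages and extracting version or permission fields are left out.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs

# Path taken from the example link in the question.
DETAILS_PATH = "/store/apps/details"
DETAILS_URL = "https://play.google.com/store/apps/details?id="


class AppLinkParser(HTMLParser):
    """Collects package names from <a> tags that point at app detail pages."""

    def __init__(self):
        super().__init__()
        self.packages = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        parsed = urlparse(href)
        if parsed.path != DETAILS_PATH:
            return
        ids = parse_qs(parsed.query).get("id", [])
        # Keep first-seen order and skip duplicates.
        if ids and ids[0] not in self.packages:
            self.packages.append(ids[0])


def extract_packages(html):
    """Return the package names linked from one HTML document."""
    parser = AppLinkParser()
    parser.feed(html)
    return parser.packages


def packages_to_csv(packages):
    """Render package names and their detail URLs as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["package", "url"])
    for package in packages:
        writer.writerow([package, DETAILS_URL + package])
    return buf.getvalue()


# Hypothetical listing markup, not copied from a real Google Play page.
SAMPLE = """
<a href="/store/apps/details?id=de.ralphsapps.snorecontrol">Snore Control</a>
<a href="https://play.google.com/store/apps/details?id=com.example.hypothetical">Other</a>
<a href="/store/apps/details?id=de.ralphsapps.snorecontrol">Duplicate link</a>
<a href="/store/books">Books</a>
"""

if __name__ == "__main__":
    print(packages_to_csv(extract_packages(SAMPLE)), end="")
```

Each detail page found this way can be fed back into the same function, because detail pages usually link to further applications. That gives a simple breadth-first crawl: keep a queue of unvisited package names and stop once the CSV is large enough.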