I am working on a website that displays all the apps from the App Store. I get the App Store data from their EPF data feeds through EPF Importer. In that database I get the price of each app for every store. There are dozens of rows in that data set, and the table structure looks like this:
application_price
The retail price of an application.
Name           | Key | Description
export_date    |     | The date this application was exported, in milliseconds since the UNIX Epoch.
application_id | Y   | Foreign key to the application table.
retail_price   |     | Retail price of the application, or null if the application is not available.
currency_code  |     | The ISO3A currency code.
storefront_id  | Y   | Foreign key to the storefront table.
This is the table I get. My problem is that I cannot find a way to calculate price reductions and newly free apps from this data set. Does anyone have an idea how I can calculate them?
Any idea or answer will be highly appreciated.
I tried storing the previous data alongside the current data and then matching the two. The problem is that the table itself is very large, and the comparison requires a JOIN, which pushes the query execution time to more than an hour, which I cannot afford. There are approximately 60,000,000 rows in the table.
With these fields alone you can’t directly determine price drops or new applications. You’ll have to insert the data into your own database and determine the differences from there. In a relational database like MySQL this isn’t too complex:
To determine which applications are new, you can add your own column “first_seen” and then query your database for all rows where the first_seen column is no more than a day old.
To calculate price drops, you’ll have to calculate the difference between the retail_price of the current import and that of the previous import.
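As a minimal sketch of the price comparison, assume the previous and current imports are kept in two tables. The names price_prev and price_curr are hypothetical, the sample rows are made up, and Python's built-in sqlite3 module stands in for MySQL so the example runs on its own:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE price_prev (application_id INTEGER, storefront_id INTEGER,
                             retail_price REAL, currency_code TEXT);
    CREATE TABLE price_curr (application_id INTEGER, storefront_id INTEGER,
                             retail_price REAL, currency_code TEXT);
""")

# Made-up rows: (application_id, storefront_id, retail_price, currency_code)
conn.executemany("INSERT INTO price_prev VALUES (?, ?, ?, ?)", [
    (1, 1, 2.99, "USD"),
    (2, 1, 0.99, "USD"),
    (3, 1, 4.99, "USD"),
])
conn.executemany("INSERT INTO price_curr VALUES (?, ?, ?, ?)", [
    (1, 1, 0.99, "USD"),  # cheaper than before
    (2, 1, 0.0, "USD"),   # became free
    (3, 1, 4.99, "USD"),  # unchanged
    (4, 1, 1.99, "USD"),  # not present in the previous import
])

# Match each current row to the previous row for the same app and storefront.
# Rows with a NULL price (app unavailable) drop out, because a comparison
# against NULL is never true.
price_drops = conn.execute("""
    SELECT c.application_id, c.storefront_id,
           p.retail_price AS old_price, c.retail_price AS new_price
    FROM price_curr AS c
    JOIN price_prev AS p
      ON p.application_id = c.application_id
     AND p.storefront_id = c.storefront_id
     AND p.currency_code = c.currency_code
    WHERE c.retail_price < p.retail_price
    ORDER BY c.application_id, c.storefront_id
""").fetchall()

# A newly free app is a price drop whose new price is zero.
newly_free = [row for row in price_drops if row[3] == 0]

print(price_drops)
print(newly_free)
```

The same SELECT should carry over to MySQL largely unchanged; only the connection code would differ.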
Since you’ve edited your question, my edited answer:
It seems like you’re having storage/performance issues, and you know what you want to achieve. To solve this you’ll have to start measuring and debugging: with datasets this large you’ll have to make sure you have the correct indexes. Profiling your queries should help you find out whether you do.
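One way to check whether a comparison query can use an index is to ask the database for its query plan. The sketch below uses Python's built-in sqlite3 module and its EXPLAIN QUERY PLAN statement purely for illustration (MySQL has its own EXPLAIN statement for the same purpose). The table and index names are hypothetical, and the plan text differs between database versions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE price_prev (application_id INTEGER, storefront_id INTEGER,
                             retail_price REAL);
    CREATE TABLE price_curr (application_id INTEGER, storefront_id INTEGER,
                             retail_price REAL);
    -- Composite indexes on the columns the comparison joins on.
    CREATE INDEX idx_price_prev_key ON price_prev (application_id, storefront_id);
    CREATE INDEX idx_price_curr_key ON price_curr (application_id, storefront_id);
""")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT c.application_id, c.storefront_id, p.retail_price, c.retail_price
    FROM price_curr AS c
    JOIN price_prev AS p
      ON p.application_id = c.application_id
     AND p.storefront_id = c.storefront_id
    WHERE c.retail_price < p.retail_price
""").fetchall()

# The last column of each plan row is a human-readable description of a step.
plan_steps = [row[-1] for row in plan]
for step in plan_steps:
    print(step)

# True when at least one step names one of the indexes created above.
uses_index = any("idx_price_" in step for step in plan_steps)
print(uses_index)
```

If no step mentions an index, the join is probably falling back to full scans, which would be consistent with the hour-long run time described in the question.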
Your environment is probably “write once a day, read many times a minute” (I’m guessing you’re creating a website). So you could speed up the frontend by processing the differences (price drops and new applications) on import, rather than when displaying them on the website.
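A rough sketch of what processing on import could look like, again using Python's built-in sqlite3 module as a stand-in for the real DBMS. The function run_import, the tables app_first_seen and price_drop, and the dates and rows are all hypothetical. The idea is that the expensive comparison runs once per import and the website only reads the small result tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE price_prev (application_id INTEGER, storefront_id INTEGER,
                             retail_price REAL);
    CREATE TABLE price_curr (application_id INTEGER, storefront_id INTEGER,
                             retail_price REAL);
    CREATE TABLE app_first_seen (application_id INTEGER PRIMARY KEY,
                                 first_seen TEXT);
    CREATE TABLE price_drop (import_date TEXT, application_id INTEGER,
                             storefront_id INTEGER, old_price REAL,
                             new_price REAL);
""")

def run_import(conn, rows, import_date):
    """Load one snapshot and precompute what the website will display."""
    with conn:  # commit everything together, or roll back on error
        conn.execute("DELETE FROM price_curr")
        conn.executemany("INSERT INTO price_curr VALUES (?, ?, ?)", rows)
        # Record the first import in which each application appeared;
        # applications that are already known are left untouched.
        conn.execute("""
            INSERT OR IGNORE INTO app_first_seen (application_id, first_seen)
            SELECT DISTINCT application_id, ? FROM price_curr
        """, (import_date,))
        # Store the price drops found by comparing with the previous import.
        conn.execute("""
            INSERT INTO price_drop
            SELECT ?, c.application_id, c.storefront_id,
                   p.retail_price, c.retail_price
            FROM price_curr AS c
            JOIN price_prev AS p
              ON p.application_id = c.application_id
             AND p.storefront_id = c.storefront_id
            WHERE c.retail_price < p.retail_price
        """, (import_date,))
        # The current snapshot becomes the baseline for the next import.
        conn.execute("DELETE FROM price_prev")
        conn.execute("INSERT INTO price_prev SELECT * FROM price_curr")

# Two made-up daily imports: (application_id, storefront_id, retail_price)
run_import(conn, [(1, 1, 2.99), (2, 1, 0.99)], "2024-01-01")
run_import(conn, [(1, 1, 0.99), (2, 1, 0.99), (3, 1, 0.0)], "2024-01-02")

# The website only needs cheap lookups like these.
new_apps = conn.execute(
    "SELECT application_id FROM app_first_seen WHERE first_seen = ? "
    "ORDER BY application_id", ("2024-01-02",)).fetchall()
drops = conn.execute(
    "SELECT application_id, old_price, new_price FROM price_drop "
    "WHERE import_date = ?", ("2024-01-02",)).fetchall()
print(new_apps)
print(drops)
```

Note that on the very first import every application counts as new and no price drops are recorded, because there is nothing to compare against yet.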
If you are still unable to solve this, I suggest you open a more specific question detailing your DBMS, queries, etc., so the real database administrators will be able to help you. 60 million rows is a lot, but with the correct indexes it should be no real trouble for a normal database system.