I’m currently designing and developing a web application that has the potential to grow very large at a fast rate. I will give some general information and move on to my question(s). I would say I am a mid-level web programmer.
Here are some specifications:
MySQL – Database Backend
PHP – Used in front/backend. Also used for SOAP Client
HTML, CSS, JS, jQuery – Front end widgets (highcharts, datatables, jquery-ui, etc.)
I can’t get into too many fine details as it is a company project, but the main objective is to construct a dashboard that thousands of users will be accessing from various devices.
The data for this project is projected to grow by 50,000 items per year ( ~1000 items per week ).
1 item = 1 row in database
An item will also record a daily history starting on the day it was inserted.
1 day of history per item = 1 record
365 records per year per item
365 * 50,000 = 18,250,000 [first year]
Multiply 18,250,000 records by x for each year after.
(My formula is a bit off, since items will be added periodically throughout the year.)
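The caveat about items arriving throughout the year can be folded into the estimate. The sketch below assumes items arrive at a constant rate, so an item added in a given year averages half a year of history in that year and a full year of history in each later year. The function name estimateHistoryRows is hypothetical, and the figures are only as good as the 50,000 items per year projection.

```php
<?php
// Hypothetical helper: estimated history rows at the end of year $years,
// assuming items arrive at a constant rate and each item logs one row per day.
function estimateHistoryRows(int $itemsPerYear, int $years, int $daysPerYear = 365): float
{
    $rows = 0.0;
    for ($k = 1; $k <= $years; $k++) {
        // Items added in year $k average half a year of history in that year,
        // plus one full year of history for each year after it.
        $rows += $itemsPerYear * $daysPerYear * (0.5 + ($years - $k));
    }
    return $rows;
}

foreach ([1, 2, 3, 4] as $n) {
    printf("year %d: %s rows\n", $n, number_format(estimateHistoryRows(50000, $n)));
}
```

Under these assumptions the history table holds about 9.1 million rows after the first year, 36.5 million after the second, and passes 100 million during the fourth year.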
All items and history are accessed through a SOAP Client that connects to an API service, then writes the record to the database.
The majority of this data will be read-only and remain static, but some item data may be updated or changed. New history records will also be written for every item each day.
Questions:
1) Is MySQL a good solution to handle these data requirements? ~100 million records at some point.
2) I am limited to synchronous calls with my PHP Soap Client (as far as I know). This is becoming time consuming as more items are being extracted. Is there a better option for writing a SOAP Client so that I can send asynchronous requests without waiting for a response?
3) Are there any other requirements I should be thinking about?
1) Is MySQL a good solution to handle these data requirements? ~100 million records at some point.
Absolutely. Make sure you’ve got everything indexed properly, and if you hit a storage or queries-per-second limit, you’ve got plenty of options that apply to most or all DBMSs: you can get beefier hardware, shard data across servers, set up clustering, etc.
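As a rough illustration of "indexed properly" for the daily history data, here is a minimal sketch. The table name, column names, and the payload column are all hypothetical, since the real schema isn't shown. It assumes lookups are mostly "history for one item over a date range", which a composite primary key on (item_id, recorded_on) serves directly, plus a secondary index for "all items on a given day".

```php
<?php
// Hypothetical schema: table and column names are illustrative only.
const ITEM_HISTORY_DDL = <<<SQL
CREATE TABLE IF NOT EXISTS item_history (
    item_id     INT UNSIGNED NOT NULL,
    recorded_on DATE         NOT NULL,
    payload     TEXT         NULL,
    PRIMARY KEY (item_id, recorded_on),
    KEY idx_recorded_on (recorded_on)
) ENGINE=InnoDB
SQL;

// Idempotent daily write: re-running the import for the same item and day
// updates the existing row instead of inserting a duplicate.
const ITEM_HISTORY_UPSERT = <<<SQL
INSERT INTO item_history (item_id, recorded_on, payload)
VALUES (:item_id, :recorded_on, :payload)
ON DUPLICATE KEY UPDATE payload = VALUES(payload)
SQL;

// Writes one history row through an existing PDO connection to MySQL.
function saveHistoryRow(PDO $pdo, int $itemId, string $date, ?string $payload): void
{
    $stmt = $pdo->prepare(ITEM_HISTORY_UPSERT);
    $stmt->execute([
        ':item_id'     => $itemId,
        ':recorded_on' => $date,
        ':payload'     => $payload,
    ]);
}
```

The primary key also keeps the one-record-per-item-per-day rule enforced by the database rather than by application code.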
2) I am limited to synchronous calls with my PHP Soap Client (as far as I know). This is becoming time consuming as more items are being extracted. Is there a better option for writing a SOAP Client so that I can send asynchronous requests without waiting for a response?
PHP 5+ allows you to execute multiple requests in parallel with cURL. See the curl_multi_* functions, such as curl_multi_exec(). As far as I know, this means you have to build the SOAP requests and parse the XML responses yourself, separately from sending the requests.
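A minimal sketch of that approach, assuming a SOAP 1.1 service over HTTP. The GetItem operation, the http://example.com/api namespace, and the SOAPAction value are made up for illustration; the real ones come from the service's WSDL. The function names buildEnvelope and fetchItemsInParallel are likewise hypothetical.

```php
<?php
// Build a SOAP 1.1 envelope by hand. "GetItem" and the namespace are
// hypothetical placeholders for whatever the real WSDL defines.
function buildEnvelope(int $itemId): string
{
    return '<?xml version="1.0" encoding="UTF-8"?>'
        . '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        . '<soap:Body><GetItem xmlns="http://example.com/api">'
        . '<id>' . $itemId . '</id>'
        . '</GetItem></soap:Body></soap:Envelope>';
}

// Send one request per item ID in parallel; returns raw response XML keyed by
// item ID. A failed transfer yields null or an empty string for that ID.
function fetchItemsInParallel(string $endpoint, array $itemIds): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($itemIds as $id) {
        $ch = curl_init($endpoint);
        curl_setopt_array($ch, [
            CURLOPT_POST           => true,
            CURLOPT_POSTFIELDS     => buildEnvelope($id),
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_TIMEOUT        => 30,
            CURLOPT_HTTPHEADER     => [
                'Content-Type: text/xml; charset=utf-8',
                'SOAPAction: "http://example.com/api/GetItem"', // hypothetical
            ],
        ]);
        curl_multi_add_handle($multi, $ch);
        $handles[$id] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for socket activity
        }
    } while ($running > 0 && $status === CURLM_OK);

    $responses = [];
    foreach ($handles as $id => $ch) {
        $responses[$id] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
    }
    curl_multi_close($multi);
    return $responses;
}
```

A call such as fetchItemsInParallel($endpoint, [101, 102, 103]) would then return three XML strings to parse with SimpleXML or DOM. Sending the IDs in modest batches rather than all at once is probably kinder to the remote API.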
3) Are there any other requirements I should be thinking about?
Probably. But you’re usually on the right track if you start with a properly indexed, normalized database whose entities you’ve modeled at least mostly correctly. Start denormalizing if and when you find cases where denormalization solves an existing or obvious near-future efficiency problem. But don’t optimize for things that could only become problems if the moons of Saturn align. Only optimize for problems that users will notice somewhat regularly.