I'd like your advice on the best design approach for the following Python project.
I am building a web service system that is split into 2 parts:
1) This part grabs realtime data from a 3rd party API and puts the data in a DB.
2) This part exposes a JSON API to access data from the DB mentioned in 1).
Some background info: 2) runs on Django and exposes the API via view methods. It uses SQLAlchemy instead of the Django ORM.
My questions are:
– Should 1) and 2) run on the same machine, considering that they both access the same MySQL DB?
– What should 1) run on? I was thinking about just running cron jobs with Python scripts that also use SQLAlchemy. This is because I don’t see a need for an entire web framework here, especially because this needs to work super fast. Is this the best approach?
– Data size – 1) fetches about 60,000 entries and puts them in the DB every minute (an entry consists of about 12 Float values and a few Dates and Integers). What is the best way to deal with the ever-growing amount of data here? Would you split the DB? If so, into what?
Thanks!
I would say, run the two on the same machine to start with, and see how the performance goes. Why spend money on a second machine if you don’t have to?
As for “dealing with the ever growing amount of data”—do you need to keep old data around? If not, your second task can simply delete old data when it’s done with it. Provided all the records are properly time-stamped, you don’t need to worry about race conditions between the two tasks.
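For scale: 60,000 entries a minute is 60,000 × 60 × 24 = 86,400,000 rows a day, so some retention policy is probably worth having early. Here is a rough sketch of the "delete old data by timestamp" idea. It is only an illustration under assumptions: it uses the standard-library sqlite3 module as a stand-in for your MySQL/SQLAlchemy setup, and the table name `entries`, the column `fetched_at`, and the one-day retention window are all hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def prune_old_entries(conn, now, keep=timedelta(days=1)):
    """Delete rows older than `now - keep` and return how many were removed.

    Assumes a hypothetical `entries` table whose `fetched_at` column holds
    UTC timestamps as ISO 8601 strings in one consistent format, so that
    plain string comparison matches chronological order.
    """
    cutoff = (now - keep).isoformat()
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM entries WHERE fetched_at < ?", (cutoff,)
        )
    return cur.rowcount


def demo():
    # In-memory SQLite stands in for the real MySQL database.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries ("
        "id INTEGER PRIMARY KEY, fetched_at TEXT NOT NULL, value REAL)"
    )
    # An index on the timestamp keeps the range delete from scanning every row.
    conn.execute("CREATE INDEX ix_entries_fetched_at ON entries (fetched_at)")

    now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
    # Four sample rows aged 0, 1, 30 and 50 hours.
    rows = [
        ((now - timedelta(hours=h)).isoformat(), float(h))
        for h in (0, 1, 30, 50)
    ]
    with conn:
        conn.executemany(
            "INSERT INTO entries (fetched_at, value) VALUES (?, ?)", rows
        )

    deleted = prune_old_entries(conn, now)
    remaining = conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
    return deleted, remaining


if __name__ == "__main__":
    print(demo())
```

Because the delete only touches rows older than the cutoff and the fetcher only inserts rows with fresh timestamps, the two jobs never work on the same rows, which is why the time-stamping matters. The same cron setup that runs the fetcher could run a job like this on a slower schedule.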