Essentially I have a large database of transactions and I am writing a script that will take some personal information and match a person to all of their past transactions.
So I feed the script a name and it returns all of the transactions that it has decided belong to that customer.
The issue is that I have to do this for almost 30k people and the database has over 6 million transaction records.
Running this on one computer would obviously take a long time, I am willing to admit that the code could be optimized but I do not have time for that and I instead want to split the work over several computers. Enter Celery:
My understanding of celery is that I will have a boss computer sending names to a worker computer which runs the script and puts the customer id in a column for each transaction it matches.
Would there be a problem with multiple worker computers searching and writing to the same database?
Also, have I missed anything and/or is this totally the wrong approach?
Thanks for the help.
No, there wouldn’t be any problem multiple worker computers searching and writing to the same database since MySQL is designed to be able to handle this. Your approach is good.