This is a fairly abstract question; I hope it is within bounds.
I’m about 5 months into my coding career in web development. I’ve found that there’s often a tension between CPU and storage resources. Put simply, you can use less of one and more of the other, or vice versa (then throw in the speed consideration). I’m now getting to the point of deploying my first app for production, so this balance is now a matter of real dollars and cents. The thing is this: I really don’t have any idea what kind of balance I should be looking for.
Here are some salient examples that might illuminate the balance to strike in different scenarios.
Background
I am working on an app that does a lot of diffs between texts. Users will request pages that display those diffs as HTML. A lot.
First Case
Should I run a diff each time a page is displayed, or should I run the diff once, store the result, and retrieve it each time the page is displayed?
Second Case
I have coded up an algorithm that summarises diffs. It’s about 110 lines of code and uses 4 or 5 loops, some of them nested. Again, should I run it once and store the results so they can be retrieved later, or should I just run the algorithm each time a page is displayed?
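Summaries are often good candidates for storing, because the output is usually tiny compared with the work needed to produce it. The sketch below assumes Python; `summarise_diff` is a hypothetical stand-in that merely counts added and removed lines with the standard `difflib.ndiff`, not a reconstruction of the 110-line algorithm described above:

```python
import difflib
import json


def summarise_diff(old: str, new: str) -> dict[str, int]:
    """Hypothetical summary: count lines added and removed between two texts."""
    added = removed = 0
    for line in difflib.ndiff(old.splitlines(), new.splitlines()):
        if line.startswith("+ "):
            added += 1
        elif line.startswith("- "):
            removed += 1
        # "  " (unchanged) and "? " (intraline hint) lines are ignored
    return {"added": added, "removed": removed}


old_text = "alpha\nbeta\ngamma\n"
new_text = "alpha\nBETA\ngamma\ndelta\n"
summary = summarise_diff(old_text, new_text)
stored = json.dumps(summary)  # a short string, cheap to keep in a DB column
print(stored)
```

Because a stored summary like this is only a few dozen bytes, recomputing it on every page view mostly buys nothing unless the underlying texts change more often than the page is viewed.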
I would also love to hear your views on the best tools for quantifying the balance.
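On the CPU side, a simple timer is often enough to start. A minimal sketch, assuming Python’s standard `timeit` module and `difflib.HtmlDiff` as the diff being measured; `make_text` and `time_diff` are hypothetical helpers working on made-up sample data. Watching how the time grows with input size shows whether diffing on every view is affordable:

```python
import difflib
import timeit


def make_text(n_lines: int, changed_every: int = 0) -> str:
    """Hypothetical sample-text generator; marks every k-th line as edited."""
    lines = []
    for i in range(n_lines):
        if changed_every and i % changed_every == 0:
            lines.append(f"line {i} (edited)")
        else:
            lines.append(f"line {i}")
    return "\n".join(lines)


def time_diff(n_lines: int, repeats: int = 5) -> float:
    """Best-of-`repeats` wall-clock seconds for one HTML diff of n_lines."""
    old = make_text(n_lines).splitlines()
    new = make_text(n_lines, changed_every=10).splitlines()
    differ = difflib.HtmlDiff()
    timer = timeit.Timer(lambda: differ.make_table(old, new))
    # Taking the minimum filters out noise from other processes.
    return min(timer.repeat(repeat=repeats, number=1))


if __name__ == "__main__":
    for size in (100, 1_000, 2_000):
        print(f"{size:>5} lines: {time_diff(size) * 1000:.1f} ms per diff")
```

Timing a read of an already-stored diff from your database or cache the same way gives the other side of the comparison.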
It’s difficult to answer without testing, but you might want to ask yourself these questions:
1) How expensive is the diff operation? Run a test or work out its complexity. If the diff runs on very large files or on files that change rapidly, you might want to modify the algorithm. Storing diffs doesn’t seem like a great solution if the files are large, change little, or change rapidly over time.
2) How many times would you need to generate the same diff from the same files, and is there a time bound associated with this?
– If the same diff is generated over and over again within a short span of time, you might want to cache it rather than write it to a database. If the diff is accessed sporadically over a longer period (days or months), you might want to store it, but only after working through question 1 above.
You might benchmark against the costs on Amazon Web Services (AWS). Again, you have choices there: you could use a single EC2 instance for everything, or split the workflow across RDS, EC2 and S3, and then compare the costs. It depends on the level of scale you need.
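A rough break-even calculation makes that cost comparison concrete. The sketch below is back-of-envelope arithmetic in Python; every price and volume in it is a hypothetical placeholder, not a real AWS rate, so substitute current figures from the AWS pricing pages for whichever services (EC2, RDS, S3) you actually use:

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Monthly figures; every value used below is a hypothetical placeholder."""

    views_per_month: int  # page views that display a diff
    distinct_diffs: int  # unique (old, new) text pairs diffed per month
    cpu_seconds_per_diff: float  # e.g. measured with timeit
    bytes_per_stored_diff: int  # size of one stored HTML diff
    price_per_cpu_second: float  # $ per CPU-second (placeholder)
    price_per_gb_month: float  # $ per GB stored per month (placeholder)


def cost_recompute(w: Workload) -> float:
    """Diff on every page view and store nothing."""
    return w.views_per_month * w.cpu_seconds_per_diff * w.price_per_cpu_second


def cost_store(w: Workload) -> float:
    """Diff once per distinct pair, then pay to keep the result.

    Simplified: ignores per-request read costs and the way stored diffs
    accumulate from month to month.
    """
    compute = w.distinct_diffs * w.cpu_seconds_per_diff * w.price_per_cpu_second
    stored_gb = w.distinct_diffs * w.bytes_per_stored_diff / 1e9
    return compute + stored_gb * w.price_per_gb_month


w = Workload(
    views_per_month=1_000_000,
    distinct_diffs=10_000,
    cpu_seconds_per_diff=0.05,
    bytes_per_stored_diff=50_000,
    price_per_cpu_second=0.00001,
    price_per_gb_month=0.02,
)
print(f"recompute: ${cost_recompute(w):.3f}  store: ${cost_store(w):.3f}")
```

With these made-up numbers storing comes out ahead because each diff is viewed about 100 times; if most diffs were viewed only once, recomputing would win. The ratio of page views to distinct diffs is the figure to watch.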