Edit:
The reason I am asking whether caching large amounts of data (the
entire database, or many tables) is possible is that the database
columns are encrypted with a symmetric Rijndael key, using a different
IV for each row. SQL filtering is therefore not an option, and
indexing is meaningless. Also, the application, which is actually a
framework for cloud and business applications, is being designed to be
as database-independent as possible. You might suggest encrypting only
the columns that hold really sensitive information, such as email
address or SSN, but that would make the framework non-standard, and
each application would need separate code for encrypted and
unencrypted columns. If there is no problem with caching, then I can
do all operations on objects: dictionaries, LINQ, etc. Of course, I
would have to keep the database and the cache in sync.
I plan to cache all or most database tables (encrypted) in memory.
I’m working on a cloud-based application that will have a 100 MB SQL Server / MySQL limit, shared by different clients. (So I can group the data by client when caching, and even create smaller cache groups depending on the business model.)
I had no idea how long a SELECT * FROM would take to fetch 100,000 rows, or 10 MB or 20 MB of data.
I did a quick search but could not find any benchmark that shows, even roughly, how long it takes to retrieve a large number of rows.
My company uses a business application of a kind that is common in small and medium-sized companies. It is said to receive new records every day, yet it has accumulated only 20 MB of MySQL data in 4.5 years.
I checked in MySQL Administrator and saw that the largest table is inventory_movements, with 7 MB of data in 45,000 rows.
I used MySQL Query Browser to select all records from this table.
The tool reported that the query took 0.4971 seconds. Now I think I have an idea.
Fetching all rows (a plain SELECT * FROM, no filters or joins) of the same 7 MB, 45,000-row data from a SQL Server database in C# .NET should take a similar time, right? I am still OK if it takes 2 or 3 seconds.
That at least gives me an idea: if I cache 100 MB of data, the load would probably take 5 to 30 seconds.
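The 5 to 30 second guess can be checked with a simple linear extrapolation from the one measurement above. This is only a rough sketch: it assumes fetch time grows linearly with data size, and the `slowdown` factor is a made-up safety margin, not something that was measured.

```python
# Measured in MySQL Query Browser: 7 MB (45,000 rows) fetched in 0.4971 s.
measured_mb = 7.0
measured_seconds = 0.4971


def estimate_seconds(target_mb, slowdown=1.0):
    """Scale the measured time linearly; slowdown is an assumed fudge factor."""
    return target_mb / measured_mb * measured_seconds * slowdown


linear = estimate_seconds(100)                     # about 7.1 s
pessimistic = estimate_seconds(100, slowdown=4.0)  # about 28.4 s
print(f"linear: {linear:.1f} s, pessimistic: {pessimistic:.1f} s")
```

So the linear estimate for 100 MB lands near the low end of the 5 to 30 second range, and the upper end corresponds to everything being about four times slower than the test machine.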
(Data will not be decrypted during the fetch.) (It will be decrypted later in RAM, when required.) (I am aware that I lose most database features; queries will run against the objects in the cache.) (I just started thinking about this while writing: if I succeed, I could even use XML as a free database source, because I’m designing OR/M-like functions for this application.)
My questions are:
There is no problem with caching 100 MB of data as long as I have enough resources, right?
In other words, it is not weird to cache 100 MB, or even 500 MB or 1 GB, as long as I have the memory?
Secondly, do you think my time estimates for fetching records with SELECT are optimistic?
At application start I can load the cache, and then apply modified, added, and deleted data to both the cache and the database without frequently reloading the cache.
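To make the load-once-then-sync idea concrete, here is a minimal write-through sketch. It is in Python with the standard-library sqlite3 module purely as a stand-in for SQL Server / MySQL and C#; the `records` table, its columns, and the `WriteThroughCache` class are all hypothetical names invented for illustration. The payload stays as opaque (encrypted) bytes in the cache.

```python
import sqlite3


class WriteThroughCache:
    """Holds every row of one table in a dict and mirrors each change to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.rows = {}  # id -> (client_id, payload)

    def load(self):
        # One full, unfiltered fetch at application start.
        query = "SELECT id, client_id, payload FROM records"
        self.rows = {
            row_id: (client_id, payload)
            for row_id, client_id, payload in self.conn.execute(query)
        }

    def upsert(self, row_id, client_id, payload):
        # Write to the database first; touch the cache only after the commit succeeds.
        with self.conn:  # commits on success, rolls back on an exception
            self.conn.execute(
                "INSERT OR REPLACE INTO records (id, client_id, payload) VALUES (?, ?, ?)",
                (row_id, client_id, payload),
            )
        self.rows[row_id] = (client_id, payload)

    def delete(self, row_id):
        with self.conn:
            self.conn.execute("DELETE FROM records WHERE id = ?", (row_id,))
        self.rows.pop(row_id, None)

    def for_client(self, client_id):
        # Filtering happens on cached objects instead of in SQL.
        return {k: v for k, v in self.rows.items() if v[0] == client_id}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, client_id TEXT, payload BLOB)")
cache = WriteThroughCache(conn)
cache.load()
cache.upsert(1, "acme", b"ciphertext-1")
cache.upsert(2, "globex", b"ciphertext-2")
cache.delete(2)
print(cache.for_client("acme"))
```

Writing to the database before the cache means a failed write leaves the cache unchanged. This sketch assumes a single process owns the data; if several application instances write to the same database, each one's cache would go stale without some extra invalidation mechanism.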
And you never will. The rate at which a database responds depends on so many variables that it would be impossible to answer that for somebody else. What are the tech specs of the servers? How many processors are you allowing the server to have? How did you index the table for reading?
As you can see, it can’t be answered by somebody outside the organization.
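What you can do is measure it on your own hardware. The following is a rough timing sketch, not a benchmark: it uses Python's built-in sqlite3 with an in-memory database as a stand-in, and the table contents (45,000 rows of about 160 bytes, roughly 7 MB) are synthetic filler chosen to resemble the table from the question. Numbers from a real SQL Server or MySQL instance over a network will differ.

```python
import sqlite3
import time


def time_full_fetch(conn, table):
    """Return (row_count, seconds) for an unfiltered SELECT * on one table."""
    start = time.perf_counter()
    # The table name is interpolated directly, so pass only trusted names.
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    return len(rows), time.perf_counter() - start


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory_movements (id INTEGER PRIMARY KEY, payload BLOB)")
conn.executemany(
    "INSERT INTO inventory_movements (payload) VALUES (?)",
    ((b"x" * 160,) for _ in range(45000)),  # synthetic filler rows
)
conn.commit()

row_count, seconds = time_full_fetch(conn, "inventory_movements")
print(f"{row_count} rows in {seconds:.4f} s")
```

Pointing the same kind of harness at the production-like server, with realistic row sizes, gives a far better answer than any published figure.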
In short, before I begin, you’re looking at caching from the wrong perspective. Let’s think for a minute about the cache on the processor. What’s it used for? It’s used to ensure that frequent operations occur faster, right? Well, that’s what data caching is used for, but that’s only one side of the coin.
Let’s talk about the second reason data caching exists. Let’s say that you have an application that performs upward of 3M operations daily. Seems like a lot, but realistic in Fortune 500 companies, yes? Well, caching is then used to ensure that access to frequently used data, even transaction-driven data, has no bottleneck visible to the user.
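Keeping only the frequently used data in memory is commonly done with a least-recently-used (LRU) policy. That policy is my choice for illustration here, not something the discussion above prescribes, and `slow_lookup` is a hypothetical stand-in for a real database read.

```python
from collections import OrderedDict


class LRUCache:
    """Keeps at most `capacity` entries, evicting the least recently used one."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader        # called on a cache miss
        self.items = OrderedDict()  # oldest entry first
        self.misses = 0

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            return self.items[key]
        self.misses += 1
        value = self.loader(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used entry
        return value


def slow_lookup(key):
    # Hypothetical stand-in for a database read.
    return f"row-{key}"


cache = LRUCache(capacity=2, loader=slow_lookup)
for key in [1, 2, 1, 3, 1, 2]:
    cache.get(key)
print(cache.misses, list(cache.items))  # prints: 4 [1, 2]
```

With six lookups and room for two entries, only four reach the loader; the frequently requested key 1 is served from memory after its first read.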
See, generally speaking, the bottleneck will not be the database engine, the processor, RAM, cache, or even the network. Generally speaking, the bottleneck is I/O. Well, reading from and writing to a database 3M+ times a day is too much to expect from even the largest and most capable SANs running 15K RPM drives.
So what do we do? We spread the data across multiple machines (for load balancing, and in case one goes down) and we store it in RAM. Why? Because it’s the fastest I/O possible, simple.
So, I said all that to say this: unless you’re performing millions of operations a day, it’s unlikely that you’ll need to be caching 500MB or 1GB of data. In fact, it’s not clear from your question what exactly you’re trying to do, because there’s no “here is what my application does” in there, but it’s possible you don’t need caching at all.
Keep all of this in mind. Data caching is no trivial matter.