I have a spreadsheet, approximately 1500 rows x 1500 columns. The labels along the top and side are the same, and the data in the cells is a quantified similarity score for the two inputs. I’d like to make a Rails app allowing a user to enter the row and column values and retrieve the similarity score. The similarity scores were derived empirically, and can’t be mathematically produced by the controller.
Some considerations: with every cell full, over half of the data is redundant; e.g., (row 34, column 985) holds the same value as (row 985, column 34). And row x will always be perfectly similar to column x. The data is static, and won’t change for years.
Can this be done with one db table? Is there a better way? Can I skip the relational db entirely and somehow query the file directly?
All assistance and advice is much appreciated!
A database is always a safe place to store it, and a relational database is a straightforward choice. However, there are alternatives to consider. How often will this data be accessed: rarely, or very frequently? If it’s accessed rarely, just put it in the database and let your code take care of searching and presenting. You can then speed up lookups with database indexes.
A flat file is a good idea, but reading and searching it at run time on every request is going to be too slow.
You could read all the data (from the db or the file) at server startup, keep it in memory, and ensure that your servers don’t restart too often. This means each of your servers will hold the entire grid in memory, but lookups will be really fast. If you use REE (Ruby Enterprise Edition) and tune its garbage collection settings, you can also reduce the server’s startup time considerably.
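As a rough illustration of the in-memory approach, here is a minimal sketch in plain Ruby. The class name `SimilarityGrid`, its methods, and the sample labels are all made up for this example, and it assumes you have already parsed the spreadsheet into an array of labels plus an array of rows. It stores each unordered pair only once, so the redundant half of the grid is never kept.

```ruby
# Hypothetical helper, not part of Rails: an in-memory lookup for a
# square, symmetric grid of similarity scores.
class SimilarityGrid
  # labels: array of names, in the same order on both axes
  # matrix: array of rows, each an array of scores (matrix[r][c])
  def initialize(labels, matrix)
    @scores = {}
    labels.each_with_index do |row_label, r|
      # Walk only the diagonal and the upper triangle; the lower
      # triangle holds the same values mirrored.
      (r...labels.size).each do |c|
        @scores[pair_key(row_label, labels[c])] = matrix[r][c]
      end
    end
    @scores.freeze
  end

  # Returns the score for the two labels in either order,
  # or nil if the pair is unknown.
  def score(a, b)
    @scores[pair_key(a, b)]
  end

  # Number of stored entries: n * (n + 1) / 2 for n labels.
  def size
    @scores.size
  end

  private

  # Sort the two labels so (a, b) and (b, a) share one key.
  def pair_key(a, b)
    a <= b ? [a, b] : [b, a]
  end
end

# Made-up sample data standing in for the real spreadsheet.
labels = %w[apple banana cherry]
matrix = [
  [1.0, 0.2, 0.7],
  [0.2, 1.0, 0.4],
  [0.7, 0.4, 1.0]
]
grid = SimilarityGrid.new(labels, matrix)
puts grid.score("cherry", "apple")
```

In a Rails app, an object like this could be built once at boot (for example from an initializer) and then queried from the controller. Whether that is worth the memory depends on how often the data is actually requested.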
Here’s my final suggestion: just build your app in the simplest way you know. Once you know how often and how heavily it is going to be used, you can start optimizing. You are fundamentally working with about 1,125,000 distinct cells (half of 1500 x 1500 = 2,250,000, thanks to the symmetry). This is not an unreasonably large dataset for a database to handle. And since your dataset will not change, you can go far with conventional caching techniques.
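If you want the exact figures rather than the round number, the arithmetic is quick to check. This is just a sketch; the method name `grid_counts` is made up for the example.

```ruby
# Counts for an n x n symmetric grid.
def grid_counts(n)
  {
    total_cells: n * n,                        # every cell in the sheet
    distinct_pairs: n * (n - 1) / 2,           # unordered pairs, diagonal excluded
    distinct_with_diagonal: n * (n + 1) / 2    # same, plus the n diagonal cells
  }
end

grid_counts(1500).each do |name, count|
  puts "#{name}: #{count}"
end
```

For 1500 labels that works out to 2,250,000 cells in total, 1,124,250 distinct pairs, or 1,125,750 if you also store the diagonal.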