I have a data processing system that generates very large reports on the data it processes. By “large” I mean that a “small” run of this system produces about 30 MB of reporting data when dumped to a CSV file, and a large dataset is about 130-150 MB. (I’m sure someone out there has a bigger idea of “large”, but that’s not the point… 😉)
Excel has the ideal interface for the report consumers in the form of its data lists: users can filter and segment the data on the fly to see the specific details they are interested in (they’re not really interested in the many thousands of rows as a whole, and they know how to apply multiple filters to get the data they want). They can also add notes and markup to the reports, create charts, graphs, etc. They know how to do all this, and it’s much easier to let them do it if we just give them the data.
Excel was great for the small test datasets, but it cannot handle these large ones. Does anyone know of a tool that provides an interface similar to Excel data lists – the ability to dynamically create and change filters on multiple fields – but can handle much larger files?
The next tool I tried was MS Access, and I found that the Access file bloats hugely: a 30 MB input file leads to about a 70 MB Access file, and after I open the file, run a report and close it, the file is at 120-150 MB! The import process is also slow and very manual (currently, the CSV files are created by the same PL/SQL script that runs the main process, so there’s next to no intervention on my part). I also tried an Access database with tables linked to the database tables that store the report data, and that was many times slower: for some reason, SQL*Plus could query the data and generate the report file in a minute or so, while Access would take anywhere from 2-5 minutes for the same data.
(If it helps, the data processing system is written in PL/SQL and runs on Oracle 10g.)
Access would be a good tool to use in this case, as it has no practical row limit, unlike Excel. The hard part is weaning people off Excel when they are used to the power of custom filters. It is very possible in Access to get something that approximates this, but it’s never going to be exactly the same unless you embed an Excel control into your forms.
As for the manual part, you can script the database to import the files using VBA. For example, let’s say the main task dumps a new file into a folder each night. You could make a “watchdog” Access database that keeps a form open with an OnTimer event that looks at that folder every few minutes; when it finds a new file, it starts the import. When your users get to work in the morning, the data is already loaded.
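A minimal sketch of that watchdog idea, as VBA behind the always-open form (the folder paths, file pattern and table name here are placeholders I made up – adjust them to your setup, and set the form’s TimerInterval property to how often you want it to check, e.g. 300000 ms for five minutes):

```vba
' Runs every TimerInterval milliseconds while the form is open.
Private Sub Form_Timer()
    Dim strFile As String
    ' Look for CSV files waiting in the drop folder (placeholder path)
    strFile = Dir("C:\Reports\Incoming\*.csv")
    Do While Len(strFile) > 0
        ' Import the CSV into a table; the final True means the
        ' file's first row contains field names
        DoCmd.TransferText acImportDelim, , "tblReportData", _
                           "C:\Reports\Incoming\" & strFile, True
        ' Move the file to a "done" folder so it isn't imported twice
        Name "C:\Reports\Incoming\" & strFile As _
             "C:\Reports\Done\" & strFile
        strFile = Dir   ' next matching file, if any
    Loop
End Sub
```

If your CSV layout is fixed, it’s worth saving an import specification in Access and passing its name as the second argument to TransferText, so column types don’t get guessed differently from one night’s file to the next.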
As for the bloating: yes, it can be a problem, but all you need to do is a quick compact and repair on the file and it will shrink back down.
EDIT:
You can set an Access database to be compacted on close through the options. I can’t remember exactly where the setting is, and at work we only have Access 97 (but, oddly enough, Office 2003). The other option is to compact through code. Here is a link explaining how:
http://forums.devarticles.com/microsoft-access-development-49/compact-database-via-vba-24958.html
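For reference, the compact-through-code approach boils down to DAO’s DBEngine.CompactDatabase, which must be run against a database that is not open – so you would call it from the watchdog (or another "maintenance") database, not from the report database itself. The paths below are placeholders:

```vba
' Compacts the report database from a separate database.
' CompactDatabase writes a compacted copy to a new file, so the
' original is then replaced with that copy.
Sub CompactReportDb()
    Dim strSrc As String, strTmp As String
    strSrc = "C:\Reports\ReportData.mdb"          ' placeholder path
    strTmp = "C:\Reports\ReportData_compact.mdb"  ' temporary output

    DBEngine.CompactDatabase strSrc, strTmp  ' write compacted copy
    Kill strSrc                              ' delete the bloated original
    Name strTmp As strSrc                    ' swap the compact copy in
End Sub
```

Scheduling this right after the nightly import would keep the file size under control without anyone having to remember to compact it by hand.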