I’ve been given a task to build a prototype for an app. I don’t have any code yet, as the solution concepts that I’ve come up with seem stinky at best…
The problem:
The solution consists of various Azure projects which do stuff to lots of data stored in Azure SQL databases. Almost every action that happens creates a gzipped log file in blob storage. So that’s one .gz file per log entry.
We should also have a small desktop (WPF) app that should be able to read, filter and sort these log files.
I have absolutely no influence on how the logging is done, so that cannot be changed to solve this problem.
Possible solutions that I’ve come up with (conceptually):
1:
- connect to the blob storage
- open the container
- read/download blobs (with applied filter)
- decompress the .gz files
- read and display
The problem with this is that, depending on the filter, this could mean a whole lot of data to download (which is slow), and process (which will also not be very snappy). I really can’t see this as a usable application.
2:
- create a web role which will run a WCF or REST service
- the service will take the filter params and other stuff and return a single XML/JSON file with the data; the processing will be done in the cloud
With this approach, will I run into problems decompressing these files if there are a lot of them (for example, will it take up extra space on the storage/compute instance where the service is running)?
EDIT: what I mean by filter is limit the results by date and severity (info, warning, error). The .gz files are saved in a structure that makes this quite easy, and I will not be filtering by looking into the files themselves.
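To illustrate what I mean by filtering on the structure rather than on file contents, here is a rough Python sketch (just for illustration, not code I have). The `logs/<yyyy>/<mm>/<dd>/<severity>/` layout, the severity names as path segments and the `blob_prefixes` function are all made up for the example; the real layout differs, but the idea is the same: turn a date range plus severities into a list of blob-name prefixes and only list blobs under those.

```python
from datetime import date, timedelta

# Severity levels mentioned in the question.
SEVERITIES = ("info", "warning", "error")


def blob_prefixes(start: date, end: date, severities=SEVERITIES):
    """Return one blob-name prefix per (day, severity) in the inclusive range."""
    if end < start:
        raise ValueError("end must not be before start")
    prefixes = []
    day = start
    while day <= end:
        for severity in severities:
            # Hypothetical layout: logs/<yyyy>/<mm>/<dd>/<severity>/
            prefixes.append(f"logs/{day:%Y/%m/%d}/{severity}/")
        day += timedelta(days=1)
    return prefixes
```

Each prefix would then be handed to whatever "list blobs with prefix" call the storage client offers, so no blob has to be opened just to decide whether it matches.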
3:
- some other elegant and simple solution that I don’t know of
I’d also need some way of making the app update the displayed logs in real time, which I suppose would need to be done with repeated requests to the blob storage/service.
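The repeated-request idea could look roughly like the Python sketch below (again only an illustration). It assumes that blob names sort chronologically, which I have not verified, and `LogPoller` and the `list_names` callback are hypothetical names standing in for whatever actually lists the blobs or calls the service.

```python
from typing import Callable, Iterable, List, Optional


class LogPoller:
    """Remembers the last blob name seen and returns only newer ones."""

    def __init__(self, list_names: Callable[[], Iterable[str]]):
        # list_names is a stand-in for a real "list blobs" or service call.
        self._list_names = list_names
        self._last_seen: Optional[str] = None

    def poll(self) -> List[str]:
        names = sorted(self._list_names())
        if self._last_seen is not None:
            # Assumption: a lexicographically larger name means a newer entry.
            names = [n for n in names if n > self._last_seen]
        if names:
            self._last_seen = names[-1]
        return names
```

The app would call `poll()` on a timer and append whatever comes back to the displayed list.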
This is not one of those “give me code” questions. I am looking for advice on best practices, or similar solutions that worked for similar problems. I also know this could be one of those “no one right answer” questions, as people have different approaches to problems, but I have some time to build a prototype, so I will be trying out different things, and I will select the right answer, which will be the one that showed a solution that worked, or the one that steered me in the right direction, even if it does take some time before I actually build something and test it out.
As I understand it, you have a set of log files in Azure Blob storage that are stored in a particular format (gzip) and you want to display them.
How big are these files? Are you displaying every single piece of information in the log file?
I am assuming that, being log files, they are static and historical: once a log/gzip file is created it is never changed (you are not updating the gzip file once it is out in Blob storage), and only new files are created.
One Solution
Why not create a worker role/job process that periodically scans the blob storage and builds a persisted “database” for you to display? The nice thing about this is that the unzipping/business logic for extracting the log files does not live in the WPF app or UI.
1) I would have the worker role scan the log files in Azure Blob storage
2) Have some kind of mechanism to track which ones were processed and a current “state”, maybe the UTC date of the last gzip file
3) Do all the unzipping/extracting of the log files in the worker role
4) Have the worker role place the content in a SQL database, Azure Table Storage or Distributed Cache for access
5) Access can be done by a REST service (ASP.NET Web API/Node.js etc)
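A minimal Python sketch of steps 1 to 4 follows, only to show the shape of the job. Several things here are assumptions: a plain dict of name to bytes stands in for Blob storage, SQLite stands in for the SQL database, the entries are assumed to be UTF-8 text, and the "state" is the last blob name processed (assuming names sort chronologically) rather than a UTC date. The table and function names are made up.

```python
import gzip
import sqlite3
from typing import Dict


def init_db(conn: sqlite3.Connection) -> None:
    """Create the entries table and a single-row table holding the state."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log_entries "
        "(blob_name TEXT PRIMARY KEY, content TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS worker_state "
        "(id INTEGER PRIMARY KEY CHECK (id = 1), last_blob TEXT)"
    )
    conn.commit()


def ingest_new_blobs(conn: sqlite3.Connection, blobs: Dict[str, bytes]) -> int:
    """Decompress and store blobs newer than the saved marker; return count."""
    row = conn.execute(
        "SELECT last_blob FROM worker_state WHERE id = 1"
    ).fetchone()
    last_blob = row[0] if row else None

    # Assumption: blob names sort chronologically, so "newer" means "greater".
    new_names = sorted(n for n in blobs if last_blob is None or n > last_blob)
    for name in new_names:
        # Decompression happens in memory; nothing is written to local disk.
        text = gzip.decompress(blobs[name]).decode("utf-8")
        conn.execute(
            "INSERT OR IGNORE INTO log_entries (blob_name, content) VALUES (?, ?)",
            (name, text),
        )
    if new_names:
        conn.execute(
            "INSERT OR REPLACE INTO worker_state (id, last_blob) VALUES (1, ?)",
            (new_names[-1],),
        )
    conn.commit()
    return len(new_names)
```

Running `ingest_new_blobs` on a schedule gives you the periodic scan; the REST service from step 5 would then only ever query `log_entries`.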
You can add more things if you need to scale this out, for example run this as a job to re-do all of the log files from a given time (refresh all). I don’t know the size of your data so I am not sure if that is feasible.
Nice thing about this is that if you need to scale your job (overnight), you can spin up 2, 3, 6 worker roles…extract the content, pass the result to a Service Bus or Storage Queue that would insert into SQL, Cache etc for access.
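To make the fan-out idea concrete, here is a small Python sketch in which threads stand in for the extra worker roles and in-process queues stand in for a Service Bus or Storage Queue. It is not Azure code; `extract_in_parallel`, the dict of blobs and the UTF-8 assumption are all hypothetical.

```python
import gzip
import queue
import threading
from typing import Dict, List, Tuple


def extract_in_parallel(
    blobs: Dict[str, bytes], worker_count: int = 3
) -> List[Tuple[str, str]]:
    """Decompress blobs on several workers and collect results via a queue."""
    work: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    for item in blobs.items():
        work.put(item)

    def extractor() -> None:
        # Each worker pulls blobs until the work queue is empty.
        while True:
            try:
                name, data = work.get_nowait()
            except queue.Empty:
                return
            results.put((name, gzip.decompress(data).decode("utf-8")))

    threads = [threading.Thread(target=extractor) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # A single consumer drains the results; this is where the insert
    # into SQL or a cache would happen in a real system.
    extracted = []
    while not results.empty():
        extracted.append(results.get())
    return sorted(extracted)
```

Keeping a single consumer on the result side means the database only ever sees one writer, no matter how many extractors you spin up.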