I’m starting to build a record-keeping database for the documents we manage on our system. Each document goes through a series of specific processing tasks, which I will call normalization, conversion and extraction.
The document processing may fail at any of these steps, so I’m looking for a solution where I can quickly store this information for archiving, but I should also be able to query the information (and possibly summarize it). If I defined my data structure in JSON, it might look like this:
{ 10123 : [
{ queue : 'converter',
startedAt : 'date-here',
finishedAt: 'date-here',
error : { message : 'error message', stackTrace : 'stack trace here' },
machine : '192.168.0.1'
} ,
{ queue : 'extractor',
startedAt : 'date-here',
finishedAt: 'date-here',
error : { message : 'error message', stackTrace : 'stack trace here' },
machine : '192.168.0.1'
},
{ queue : 'extractor',
startedAt : 'date-here',
finishedAt: 'date-here',
error : { message : 'error message', stackTrace : 'stack trace here' },
machine : '192.168.0.1'
}
] }
In an ideal world I would have the full processing history of a specific document, and should also be able to detect which ones have failed and the average time each process takes.
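To make those two queries concrete, here is a minimal in-memory Java sketch of what I mean, independent of any particular database. All names in it (`ProcessingLog`, `TaskRun`, `failedDocuments`, `averageMillisByQueue`) are made up for illustration, and the error object is reduced to a nullable message, so treat it as a description of the desired behaviour rather than an implementation.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class ProcessingLog {

    /** One processing attempt for a document (hypothetical shape, mirrors the JSON above). */
    public static final class TaskRun {
        final long documentId;
        final String queue;        // e.g. "converter", "extractor"
        final Instant startedAt;
        final Instant finishedAt;
        final String errorMessage; // null when the task succeeded; stack trace omitted here
        final String machine;

        public TaskRun(long documentId, String queue, Instant startedAt,
                       Instant finishedAt, String errorMessage, String machine) {
            this.documentId = documentId;
            this.queue = queue;
            this.startedAt = startedAt;
            this.finishedAt = finishedAt;
            this.errorMessage = errorMessage;
            this.machine = machine;
        }
    }

    /** Ids of all documents that have at least one run with an error. */
    public static Set<Long> failedDocuments(List<TaskRun> runs) {
        return runs.stream()
                .filter(r -> r.errorMessage != null)
                .map(r -> r.documentId)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    /** Average duration in milliseconds of all runs, grouped by queue name. */
    public static Map<String, Double> averageMillisByQueue(List<TaskRun> runs) {
        return runs.stream()
                .collect(Collectors.groupingBy(
                        r -> r.queue,
                        TreeMap::new,
                        Collectors.averagingLong(
                                r -> Duration.between(r.startedAt, r.finishedAt).toMillis())));
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z"); // sample data only
        List<TaskRun> runs = List.of(
                new TaskRun(10123L, "converter", t0, t0.plusMillis(200), null, "192.168.0.1"),
                new TaskRun(10123L, "extractor", t0, t0.plusMillis(100), "error message", "192.168.0.1"),
                new TaskRun(10123L, "extractor", t0, t0.plusMillis(300), null, "192.168.0.1"));

        System.out.println("Failed documents: " + failedDocuments(runs));
        System.out.println("Average ms per queue: " + averageMillisByQueue(runs));
    }
}
```

Whatever database I end up with should be able to answer these two questions directly, without loading every record into the application first.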
Any hints on an ideal database solution to handle this? This would probably amount to a couple of thousand writes a day.
The main solution is written in Java, so the DB should have a Java driver.
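MongoDB is a good choice for this, since it supports all of the features you are asking for out of the box: it stores JSON-like documents such as the one above, lets you query and aggregate them, and has an official Java driver.
Check out the MongoDB use cases for more info.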