Currently we are saving all SOAP requests to a log file (rolling, one per day). When I get a request about who did what, when, how often etc – then I need to resort to heavy duty text manipulation in order to extract that information. My idea was to save those SOAP requests to a database where I could perform some queries.
A SOAP request consists of a header and an envelope. Within the header is the username of the one invoking the action. Within the body of the envelope, the first element is the WS method name, followed by all the parameters to the method. Each WS method obviously has different parameters:
- SOAP request
- Envelope
- Header
- Username = XXX
- Body
- WS name
- WS Parameters (different between methods, hierarchical)
- WS name
- Header
- Envelope
I’d like to construct queries like these without too much difficulties:
- Which WS methods has user XXX invoked within certain period
- How often was a each WS method called
- Show me all requests made to WS XXX where a particular parameter to the WS was YYY (e.g. every call the “changeParty” where the partyId = 123)
Is this something that would be ideally suited to a “Document store type of database” / NoSQL, such as CouchDB/MongoDB? Would this be hard to set up? Any alternative suggestions / recommendations / thoughts?
Thanks!
EDIT:
- In most cases I think I’d be fine with an index on: user, WS method, requestDate. Any other field would probably not be indexed (although I would do some filtering on them). As it is for auditing purposes only, it wouldn’t have to be lightning fast.
- I was thinking about putting NoSql server on the monster app-servers that we have and give the “real underlying” database (DB2) a break from any kind of logging as it is in pretty heavy use.
Well, specifically with regards to Couch and Mongo, the first issue you will have to deal with is converting the SOAP XML payload in some relevant JSON payload, since that’s what these DBs understand.
Next, most of these DBs require you to have a good understanding of what queries you want to make from them up front, during initial design, and ideally before you start loading data in to them. Adding new indexes can be expensive for existing data sets, particularly large ones.
It’s all very doable, but you want to get out of your head any concept of an “ad hoc” query like in an RDBMS. They simply don’t typically perform well in NoSQL systems.
Finally, since you have to convert the data from XML to JSON already, you may well be better off extracting the “interesting fields” from your SOAP requests, the ones you want to query on, and simply populate your own index fields and rows and store the data in your (I assume) already existing SQL database.
Edit for comment:
It’s not transparent to you, the users, because the burden of conversion to JSON falls upon you, since the two DBs mentioned require a JSON document to be passed to them.
That said, you could simply extract the few bits you care about rather than marshall the entire document. You would have to marshall everything you’re interested in, including for filtering, in that case.
So, you can create a document:
Then you could build views or indexes off of anything exposed within the JSON document. You won’t be able to (easily) query anything from within the raw SOAP request. Only elements you specifically extract and put in to the JSON payload will you be able to access readily through the DB.
My basic point is, tho, that if you have to go through any marshalling process to convert your data and make it ready for the database, then you likely may as well use a database you already know and is supported at your company (i.e. likely your existing RDBMS) rather than introduce another brick in to your stack by adding a new DB.
If you were already native JSON, or, say, found a NoSQL-esque DB that supported native XML vs JSON, then that might be worthwhile since you can skip that part of the process.
But you can easily create a table (or tables) to handle this data in most any modern RDBMS.