I have to parse a document containing groups of variable-value-pairs which is serialized to

Asked: May 24, 2026 (2026-05-24T14:06:49+00:00)

I have to parse a document containing groups of variable-value-pairs which is serialized to a string e.g. like this:

4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^

Here are the different elements:

Group IDs: 4 and 1

Length of the string representation of each group: 26 and 14

One of the groups, e.g. the content of group 1: VAR1^6^VALUE1^

Variables: VAR1, VAR2 (group 4) and VAR1 (group 1)

Length of the string representation of the values: 6, 4 and 6

The values themselves: VALUE1, VAL2 and VALUE1

Variables consist only of alphanumeric characters.
No assumption is made about the values, i.e. they may contain any character, including ^.

Is there a name for this kind of grammar? Is there a parsing library that can handle this mess?

So far I am using my own parser, but because I need to detect and handle corrupt serializations, the code looks rather messy. Hence my question about a parser library that could lift the burden.

1 Answer

Editorial Team · Answer 1 · 2026-05-24T14:06:49+00:00

The simplest way to approach it is to note that there are two nested levels that work the same way. The pattern is extremely simple:

id^length^content^

At the outer level, this produces a set of groups. Within each group, the content follows exactly the same pattern, only here the id is the variable name, and the content is the variable value.

So you only need to write that logic once and you can use it to parse both levels. Just write a function that breaks a string up into a list of id/content pairs. Call it once to get the groups, and then loop through them calling it again for each content to get the variables in that group.

Breaking it down into these steps, first we need a way to get “tokens” from the string. This function returns an object with three methods: one to find out if we’re at “end of file”, and two to grab the next delimited or counted substring:

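var tokens = function(str) {
    var pos = 0; // index of the next unread character
    return {
        // True once the whole string has been consumed.
        eof: function() {
            return pos === str.length;
        },
        // Read up to the next occurrence of delimiter d and skip past it.
        delimited: function(d) {
            var end = str.indexOf(d, pos);
            if (end === -1) {
                throw new Error('Expected delimiter ' + d + ' after position ' + pos);
            }
            var result = str.slice(pos, end);
            pos = end + d.length;
            return result;
        },
        // Read exactly c characters; an invalid count or truncated input is an error.
        counted: function(c) {
            if (!(c >= 0) || pos + c > str.length) {
                throw new Error('Expected ' + c + ' characters at position ' + pos);
            }
            var result = str.slice(pos, pos + c);
            pos += c;
            return result;
        }
    };
};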

Now we can conveniently write the reusable parse function:

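var parse = function(str) {
    var parts = {};
    var t = tokens(str);
    while (!t.eof()) {
        var id = t.delimited('^');
        var len = t.delimited('^');
        // Reject a non-numeric length instead of letting parseInt return NaN.
        if (!/^\d+$/.test(len)) {
            throw new Error('Expected numeric length for ' + id + ', instead found: ' + len);
        }
        var content = t.counted(parseInt(len, 10));
        var end = t.counted(1);
        if (end !== '^') {
            throw new Error('Expected ^ after counted string, instead found: ' + end);
        }
        parts[id] = content;
    }
    return parts;
};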

It builds an object where the keys are the IDs (or variable names). I’m assuming that, since they have names, the order isn’t significant. Note that a repeated ID within one level overwrites the earlier entry.
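If order or repeated IDs do matter, one alternative is to collect an array of pairs instead of an object. The following is only a minimal, self-contained sketch under that assumption. `parsePairs` and `readField` are hypothetical names that do not appear in the code above, and the error messages are illustrative:

```javascript
// Hypothetical helper: parse one level of id^length^content^ records
// into an ordered array of [id, content] pairs.
function parsePairs(str) {
    var pairs = [];
    var pos = 0;

    // Read up to the next '^' and advance past it.
    function readField(what) {
        var end = str.indexOf('^', pos);
        if (end === -1) {
            throw new Error('Missing ^ after ' + what + ' at position ' + pos);
        }
        var field = str.slice(pos, end);
        pos = end + 1;
        return field;
    }

    while (pos < str.length) {
        var id = readField('id');
        var len = readField('length');
        if (!/^\d+$/.test(len)) {
            throw new Error('Invalid length "' + len + '" for id ' + id);
        }
        var n = parseInt(len, 10);
        // The character right after the counted content must be '^';
        // charAt returns '' past the end, so truncated input fails here too.
        if (str.charAt(pos + n) !== '^') {
            throw new Error('Content of ' + id + ' is not terminated by ^');
        }
        pairs.push([id, str.slice(pos, pos + n)]);
        pos += n + 1;
    }
    return pairs;
}

// Apply the same function at both levels, keeping the original order.
var groups = parsePairs('4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^')
    .map(function (group) {
        return [group[0], parsePairs(group[1])];
    });
console.log(JSON.stringify(groups));
// [["4",[["VAR1","VALUE1"],["VAR2","VAL2"]]],["1",[["VAR1","VALUE1"]]]]
```

Because the content is read by count rather than by searching for a delimiter, a value such as `x^y` in `A^3^x^y^` comes back intact.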

Then we can use that at both levels to create the function to do the whole job:

var parseGroups = function(str) {
    var groups = parse(str);
    Object.keys(groups).forEach(function(id) {
        groups[id] = parse(groups[id]);
    });
    return groups;
};

For your example, it produces this object:

{
  '1': { 
    VAR1: 'VALUE1' 
  },
  '4': {
    VAR1: 'VALUE1',
    VAR2: 'VAL2'
  } 
}
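As a cross-check on the format, it can help to write the opposite direction as well. This is a minimal sketch, not part of the original code. `encodePairs` and `serializeGroups` are hypothetical names, groups are passed as arrays of pairs so the output order is predictable, and lengths are assumed to be JavaScript string lengths (UTF-16 code units), the same unit the parser counts in:

```javascript
// Encode one level: each [id, content] pair becomes id^length^content^
function encodePairs(pairs) {
    return pairs.map(function (pair) {
        var id = pair[0];
        var content = String(pair[1]);
        return id + '^' + content.length + '^' + content + '^';
    }).join('');
}

// Encode the variables of each group first, then wrap the groups
// themselves using exactly the same pattern.
function serializeGroups(groups) {
    return encodePairs(groups.map(function (group) {
        return [group[0], encodePairs(group[1])];
    }));
}

var example = serializeGroups([
    ['4', [['VAR1', 'VALUE1'], ['VAR2', 'VAL2']]],
    ['1', [['VAR1', 'VALUE1']]]
]);
console.log(example);
// 4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^
```

Feeding the parser's result back through a function like this and comparing it with the input is one cheap way to spot a corrupt serialization that happens to parse without errors.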
