I have an XML file and I need to extract testname from all the instances of <con:testSuite name="testname" within the XML file.
I am not quite sure how to approach this, or whether this is possible in batch.
Here is what I have thought so far:
1) Use FINDSTR and store every line that has <con:testSuite name= in a variable or a temporary file, like this:
FINDSTR /C:"<con:testSuite name=" file.xml > tests.txt
2) Somehow use that file or variable to extract the strings
Note that there might be more than one instance of the matching string in the same line.
I am a novice at batch and any help is appreciated.
Parsing XML is very painful with batch. Batch is not a good text processor to begin with. However, with some amount of effort you can usually extract the data you want from a given XML file. But the input file could easily be rearranged into an equivalent valid XML form that will break your parser.
With that disclaimer out of the way…
Here is a native batch solution
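A minimal sketch of such a script, consistent with the explanation that follows. The input name file.xml is taken from the question; the output file names.txt and the variable names ln and ln2 are assumptions made for illustration.

```batch
@echo off
setlocal disableDelayedExpansion
rem Assumed file names: file.xml comes from the question, names.txt is hypothetical.
set input="file.xml"
set output="names.txt"

if exist %output% del %output%
rem FINDSTR /N prefixes each matching line with its line number and a colon.
for /f "delims=" %%A in ('findstr /n /c:"<con:testSuite name=" %input%') do (
  set "ln=%%A"
  setlocal enableDelayedExpansion
  call :parseLine
  endlocal
)
if exist %output% type %output%
exit /b

:parseLine
rem Strip everything up to and including the next occurrence of the tag text.
set "ln2=!ln:*<con:testSuite name=!"
rem If nothing was stripped, there are no more matches on this line.
if "!ln2!"=="!ln!" exit /b
rem The remainder starts with an equals sign and a quote, so token 2 is the name.
for /f tokens^=2^ delims^=^" %%B in ("!ln2!") do (
  setlocal disableDelayedExpansion
  echo(%%B >>%output%
  endlocal
)
set "ln=!ln2!"
goto :parseLine
```

Each pass through :parseLine removes the text up to the next <con:testSuite name, writes the value found between the first pair of double quotes, and repeats until the line holds no further match.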
The FINDSTR /N option is only there to guarantee that no line begins with a ; so that we don’t have to worry about the pesky default FOR “EOL” option.
The toggling of delayed expansion on and off is to protect any ! characters that may be in the input file. If you know that ! never appears in the input, then you can simply setlocal enableDelayedExpansion at the top and remove all other setlocal and endlocal commands.
The last FOR /F uses special escape sequences to enable the specification of a double quote as a DELIM character.
Answer to additional question in comment
You cannot simply put the additional constraint in the existing FINDSTR command because it will return the entire line that has a match. Remember you said yourself, “there might be more than one instance of the matching string in the same line”. The first name might start with the correct prefix, and the 2nd name on the same line might not. You only want to keep the one that starts appropriately.
One solution is to simply change the echo(%%B >>%output% line so that it pipes each name through FINDSTR before appending it to the output file. The FINDSTR uses the regular expression meta-character ^ to specify that the string must start with lp_. The quotes have already been removed at this point, so we don’t have to worry about them.
However, you may run into a situation in the future where you must include a double quote (") in your search string. Plus it might be marginally faster to include the lp_ screen in the initial FINDSTR so that :parseLine is not called unnecessarily.
FINDSTR requires that double quotes in the search string are escaped with a backslash. But the Windows CMD processor also has its own rules for escaping. Special characters like > need to be either quoted or escaped. The original code used quotes, but you want to include a quote in the string, and that creates unbalanced quotes in your command. Windows batch generally likes quotes in pairs. At least one of the quotes must be escaped for CMD as ^". If the quote needs to be escaped for both CMD and FINDSTR, then it looks like \^".
But any special characters within the string that are no longer functionally quoted from a CMD perspective must be escaped using ^ as well.
Here is one solution that escapes all special characters. It looks awful and is very confusing.
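A hedged sketch of what such a fully escaped command could look like, written as a small stand-alone script that only prints the matching lines. The file name file.xml is taken from the question; the script wrapper around the FINDSTR call is an assumption for illustration.

```batch
@echo off
setlocal disableDelayedExpansion
rem Assumed input file name, taken from the question.
set input="file.xml"
rem Every quote and special character in the search string is caret-escaped for CMD.
rem The embedded quote is additionally backslash-escaped for FINDSTR.
for /f "delims=" %%A in ('findstr /n /c:^"^<con:testSuite^ name^=\^"lp_^" %input%') do echo(%%A
```

Because the opening quote is escaped, CMD treats nothing in the search string as quoted, which is why the less-than sign, the space and the equals sign each need a caret of their own.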
Here is another solution that looks much better, but it is still confusing to keep track of what is escaped for CMD and what is escaped for FINDSTR.
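One way this could be written, again as a stand-alone sketch that only prints the matching lines, with file.xml assumed from the question. Here the opening quote is left unescaped so that most of the search string stays quoted for CMD, and only the final quote is caret-escaped.

```batch
@echo off
setlocal disableDelayedExpansion
rem Assumed input file name, taken from the question.
set input="file.xml"
rem The first quote opens a quoted span for CMD, and the backslash-quote pair closes it.
rem Only the last quote is caret-escaped, which keeps the quote count balanced for CMD.
for /f "delims=" %%A in ('findstr /n /c:"<con:testSuite name=\"lp_^" %input%') do echo(%%A
```

From the point of view of CMD, the text lp_ sits outside the quotes, which is harmless here only because it contains no special characters.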
One way to keep things a bit simpler is to convert the search into a regular expression. A single double quote can be searched using [\"\"]. It is a character class expression that matches either a quote or a quote, silly I know. But it keeps quotes paired so that CMD is happy. Now you don’t have to worry about escaping any characters for CMD, and you can concentrate on the regex search string.
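A minimal sketch of the regular expression form, as a stand-alone script that only prints the matching lines. The file name file.xml is assumed from the question, and the /r switch is added so that FINDSTR treats the /c: string as a regular expression.

```batch
@echo off
setlocal disableDelayedExpansion
rem Assumed input file name, taken from the question.
set input="file.xml"
rem The character class holds two backslash-escaped quotes, so CMD sees paired quotes
rem and no caret escapes are needed anywhere in the search string.
for /f "delims=" %%A in ('findstr /n /r /c:"<con:testSuite name=[\"\"]lp_" %input%') do echo(%%A
```

The rest of the search string contains no regular expression meta-characters, so switching from a literal search to /r does not change what it matches.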